Scsplit

scsplit is a genotype-free demultiplexing methode of pooled single-cell RNA-seq, using a hidden state model for identifying genetically distinct samples within a mixed population.

Info

ID: scsplit
Namespace: genetic_demux

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 1.0.2 -latest \
  -main-script target/nextflow/genetic_demux/scsplit/main.nf \
  --help

Run command

Example of params.yaml
# Input
# vcf: "path/to/file"
# bam: "path/to/file"
# bar: "path/to/file"
tag: "CB"
# com: "path/to/file"
# num: 123
sub: 10
ems: 30
# dbl: 123.0
# vcf_known: "path/to/file"
geno: false

# Output
# output: "$id.$key.output.output"
# ref: "foo"
# alt: "foo"
# psc: "foo"

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
nextflow run openpipelines-bio/openpipeline \
  -r 1.0.2 -latest \
  -profile docker \
  -main-script target/nextflow/genetic_demux/scsplit/main.nf \
  -params-file params.yaml
Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Input

Name Description Attributes
--vcf VCF from mixed BAM file
--bam mixed sample BAM file
--bar barcodes whitelist file
--tag tag for barcode string, default: "CB"
--com common SNVs file
--num expected number of mixed samples integer
--sub maximum number of subpopulations in autodetect mode integer, default: 10
--ems number of EM repeats to avoid local maximum integer, default: 30
--dbl correction for doublets. There will be no refinement on the results if this optional parameter is not specified or specified percentage is less than doublet rates detected during the run. double
--vcf_known known individual genotypes to limit distinguishing variants to available variants, so that users do not need to redo genotyping on selected variants, otherwise any variants could be selected as distinguishing variants. file
--geno generate sample genotypes based on the split result. boolean_true

Output

Name Description Attributes
--output Output directory file, example: "scSplit_out"
--ref output Ref count matrix string
--alt output Alt count matrix string
--psc generated P(S|C) string

Authors

  • Xichen Wu (author)