Cellsnp
cellSNP aims to pileup the expressed alleles in single-cell or bulk RNA-seq data.
Info
ID: cellsnp
Namespace: genetic_demux
Links
It can be directly used for donor deconvolution in multiplexed single-cell RNA-seq data, particularly with vireo.
Example commands
You can run the pipeline using nextflow run
.
View help
You can use --help
as a parameter to get an overview of the possible parameters.
nextflow run openpipelines-bio/openpipeline \
-r 1.0.2 -latest \
-main-script target/nextflow/genetic_demux/cellsnp/main.nf \
--help
Run command
Example of params.yaml
# Input
# sam_file: "path/to/file"
# sam_index_file: "path/to/file"
# sam_fileList: "path/to/file"
# regions_vcf: "path/to/file"
# targets_vcf: "path/to/file"
# barcode_file: "path/to/file"
# sample_list: "path/to/file"
# sample_ids: "foo"
genotype: false
gzip: false
print_skip_snps: false
# chrom: "foo"
cell_tag: "CB"
umi_tag: "Auto"
min_count: 20
min_maf: 0.0
doublet_gl: false
# incl_flag: "foo"
# excl_flag: "foo"
count_orphan: false
min_mapq: 20
min_len: 30
# Output
# output: "$id.$key.output.output"
# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
nextflow run openpipelines-bio/openpipeline \
-r 1.0.2 -latest \
-profile docker \
-main-script target/nextflow/genetic_demux/cellsnp/main.nf \
-params-file params.yaml
Note
Replace -profile docker
with -profile podman
or -profile singularity
depending on the desired backend.
Argument groups
Input
Name | Description | Attributes |
---|---|---|
--sam_file |
Indexed sam/bam file(s), comma separated multiple samples. Mode 1a & 2a: one sam/bam file with single cell. Mode 1b & 2b: one or multiple bulk sam/bam files, no barcodes needed, but sample ids and regionsVCF. | file |
--sam_index_file |
Input SAM/BAM Index file, problem with samFileList. | file |
--sam_fileList |
A list file containing bam files, each per line, for Mode 1b & 2b. | file |
--regions_vcf |
A vcf file listing all candidate SNPs, for fetch each variants. If None, pileup the genome. Needed for bulk samples. | file |
--targets_vcf |
Similar as –regions_vcf, but the next position is accessed by streaming rather than indexing/jumping (like -T in samtools/bcftools mpileup). | file |
--barcode_file |
A plain file listing all effective cell barcode. | file |
--sample_list |
A list file containing sample IDs, each per line. | file |
--sample_ids |
Comma separated sample ids. | string |
--genotype |
If use, do genotyping in addition to counting. | boolean_true |
--gzip |
If use, the output files will be zipped into BGZF format. | boolean_true |
--print_skip_snps |
If use, the SNPs skipped when loading VCF will be printed. | boolean_true |
--chrom |
The chromosomes to use in integer format 1-22, comma separated | string |
--cell_tag |
Tag for cell barcodes, turn off with None. | string , default: "CB" |
--umi_tag |
Tag for UMI: UR, Auto, None. For Auto mode, use UR if barcodes is inputted, otherwise use None. None mode means no UMI but read counts. | string , default: "Auto" |
--min_count |
Minimum aggragated count. | integer , default: 20 |
--min_maf |
Minimum minor allele frequency. | double , default: 0 |
--doublet_gl |
If use, keep doublet GT likelihood, i.e., GT=0.5 and GT=1.5. | boolean_true |
--incl_flag |
Required flags: skip reads with all mask bits unset. | string |
--excl_flag |
Filter flags: skip reads with any mask bits set [UNMAP,SECONDARY,QCFAIL (when use UMI) or UNMAP,SECONDARY,QCFAIL,DUP (otherwise)] | string |
--count_orphan |
If use, do not skip anomalous read pairs. | boolean_true |
--min_mapq |
Minimum MAPQ for read filtering. | integer , default: 20 |
--min_len |
Minimum mapped length for read filtering. | integer , default: 30 |
Output
Name | Description | Attributes |
---|---|---|
--output |
Output directory for VCF and sparse matrices. | file , example: "cellsnp_out" |