Cellranger multi

Align fastq files using Cell Ranger multi.

Info

ID: cellranger_multi
Namespace: mapping

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.0 -latest \
  -main-script target/nextflow/mapping/cellranger_multi/main.nf \
  --help

Run command

Example of params.yaml

# Input files
# input: ["mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz"]

# Feature type-specific input files
# gex_input: ["mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz"]
# abc_input: ["mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz"]
# cgc_input: ["mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz"]
# mux_input: ["mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz"]
# vdj_input: ["mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz"]
# vdj_t_input: ["mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz"]
# vdj_t_gd_input: ["mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz"]
# vdj_b_input: ["mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz"]
# agc_input: ["mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz"]

# Library arguments
# library_id: ["mysample1"]
# library_type: ["Gene Expression"]
# library_subsample: ["0.5"]
# library_lanes: ["1-4"]
# library_chemistry: "foo"

# Sample parameters
# sample_ids: ["foo"]
# sample_description: ["foo"]
# sample_expect_cells: [3000]
# sample_force_cells: [3000]

# Feature Barcode library specific arguments
# feature_reference: "feature_reference.csv"
# feature_r1_length: 123
# feature_r2_length: 123
# min_crispr_umi: 123

# Gene expression arguments
gex_reference: # please fill in - example: "reference_genome.tar.gz"
gex_secondary_analysis: false
gex_generate_bam: false
# gex_expect_cells: 3000
# gex_force_cells: 3000
gex_include_introns: true
# gex_r1_length: 123
# gex_r2_length: 123
gex_chemistry: "auto"

# VDJ related parameters
# vdj_reference: "reference_vdj.tar.gz"
# vdj_inner_enrichment_primers: "enrichment_primers.txt"
# vdj_r1_length: 123
# vdj_r2_length: 123

# Cell multiplexing parameters
# cell_multiplex_oligo_ids: ["foo"]
# min_assignment_confidence: 123.0
# cmo_set: "path/to/file"
# barcode_sample_assignment: "path/to/file"

# Fixed RNA profiling parameters
# probe_set: "path/to/file"
# filter_probes: true
# probe_barcode_ids: ["foo"]

# Antigen Capture (BEAM) library arguments
# control_id: ["foo"]
# mhc_allele: ["foo"]

# General arguments
check_library_compatibility: true

# Outputs
# output: "$id.$key.output"

# Executor arguments
dryrun: false

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

# Arguments

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.0 -latest \
  -profile docker \
  -main-script target/nextflow/mapping/cellranger_multi/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
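
A parameter file like the example above can also be generated from a script. The following is a minimal sketch, not part of the pipeline: the helper render_params is hypothetical, the file paths are placeholders, and it relies on JSON scalars and lists being valid YAML flow syntax, which is sufficient for the flat value types shown in params.yaml.

```python
import json


def render_params(params):
    """Render a flat dict of parameters as YAML text.

    json.dumps produces values such as "text", false, 3000 and ["a", "b"],
    all of which are also valid YAML.
    """
    lines = [f"{key}: {json.dumps(value)}" for key, value in params.items()]
    return "\n".join(lines) + "\n"


# Parameter names are taken from the example params.yaml; the paths are placeholders.
params = {
    "input": [
        "mysample_S1_L001_R1_001.fastq.gz",
        "mysample_S1_L001_R2_001.fastq.gz",
    ],
    "gex_reference": "reference_genome.tar.gz",
    "gex_generate_bam": False,
    "publish_dir": "output/",
}

print(render_params(params), end="")
```

Writing the returned text to params.yaml gives a file that can be passed to nextflow run with -params-file.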

Argument groups

Input files

Name Description Attributes
--input The FASTQ files to be analyzed. FASTQ files should conform to the naming conventions of bcl2fastq and mkfastq: [Sample Name]_S[Sample Index]_L00[Lane Number]_[Read Type]_001.fastq.gz List of file, example: "mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz", multiple_sep: ";"
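
To check file names against the naming convention above before starting a run, a small pattern match may help. This is an illustrative sketch only: the function parse_fastq_name is hypothetical, and the pattern assumes a single-digit lane after L00 and a read type of R1, R2, I1 or I2.

```python
import re

# [Sample Name]_S[Sample Index]_L00[Lane Number]_[Read Type]_001.fastq.gz
# Assumption: read types are limited to R1, R2, I1 and I2.
FASTQ_NAME = re.compile(
    r"(?P<sample>.+)_S(?P<index>\d+)_L00(?P<lane>\d)_(?P<read>[RI][12])_001\.fastq\.gz"
)


def parse_fastq_name(filename):
    """Return the parts of a conforming file name as a dict, else None."""
    match = FASTQ_NAME.fullmatch(filename)
    return match.groupdict() if match else None


for name in ["mysample_S1_L001_R1_001.fastq.gz", "mysample_R1.fastq.gz"]:
    parts = parse_fastq_name(name)
    print(name, "->", parts if parts else "does not conform")
```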

Feature type-specific input files

Helper functionality to allow feature type-specific input files, without the need to specify library_type or library_id. The library_id will be inferred from the input paths.

Name Description Attributes
--gex_input The FASTQ files to be analyzed for Gene Expression. FASTQ files should conform to the naming conventions of bcl2fastq and mkfastq: [Sample Name]_S[Sample Index]_L00[Lane Number]_[Read Type]_001.fastq.gz List of file, example: "mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz", multiple_sep: ";"
--abc_input The FASTQ files to be analyzed for Antibody Capture. FASTQ files should conform to the naming conventions of bcl2fastq and mkfastq: [Sample Name]_S[Sample Index]_L00[Lane Number]_[Read Type]_001.fastq.gz List of file, example: "mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz", multiple_sep: ";"
--cgc_input The FASTQ files to be analyzed for CRISPR Guide Capture. FASTQ files should conform to the naming conventions of bcl2fastq and mkfastq: [Sample Name]_S[Sample Index]_L00[Lane Number]_[Read Type]_001.fastq.gz List of file, example: "mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz", multiple_sep: ";"
--mux_input The FASTQ files to be analyzed for Multiplexing Capture. FASTQ files should conform to the naming conventions of bcl2fastq and mkfastq: [Sample Name]_S[Sample Index]_L00[Lane Number]_[Read Type]_001.fastq.gz List of file, example: "mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz", multiple_sep: ";"
--vdj_input The FASTQ files to be analyzed for VDJ. FASTQ files should conform to the naming conventions of bcl2fastq and mkfastq: [Sample Name]_S[Sample Index]_L00[Lane Number]_[Read Type]_001.fastq.gz List of file, example: "mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz", multiple_sep: ";"
--vdj_t_input The FASTQ files to be analyzed for VDJ-T. FASTQ files should conform to the naming conventions of bcl2fastq and mkfastq: [Sample Name]_S[Sample Index]_L00[Lane Number]_[Read Type]_001.fastq.gz List of file, example: "mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz", multiple_sep: ";"
--vdj_t_gd_input The FASTQ files to be analyzed for VDJ-T-GD. FASTQ files should conform to the naming conventions of bcl2fastq and mkfastq: [Sample Name]_S[Sample Index]_L00[Lane Number]_[Read Type]_001.fastq.gz List of file, example: "mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz", multiple_sep: ";"
--vdj_b_input The FASTQ files to be analyzed for VDJ-B. FASTQ files should conform to the naming conventions of bcl2fastq and mkfastq: [Sample Name]_S[Sample Index]_L00[Lane Number]_[Read Type]_001.fastq.gz List of file, example: "mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz", multiple_sep: ";"
--agc_input The FASTQ files to be analyzed for Antigen Capture. FASTQ files should conform to the naming conventions of bcl2fastq and mkfastq: [Sample Name]_S[Sample Index]_L00[Lane Number]_[Read Type]_001.fastq.gz List of file, example: "mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz", multiple_sep: ";"
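
The idea of inferring library_id from the input paths can be sketched as follows. This is not the pipeline's actual implementation: it assumes the library_id is the [Sample Name] part of each file name, and the function group_by_library_id and the example paths are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Everything after the sample name in a bcl2fastq/mkfastq style file name.
# Assumption: read types are limited to R1, R2, I1 and I2.
SUFFIX = re.compile(r"_S\d+_L00\d_[RI][12]_001\.fastq\.gz$")


def group_by_library_id(paths):
    """Group FASTQ paths by the sample-name part of their file names."""
    groups = defaultdict(list)
    for path in paths:
        name = PurePosixPath(path).name
        library_id, n_subs = SUFFIX.subn("", name)
        if n_subs != 1 or not library_id:
            raise ValueError(f"unexpected FASTQ file name: {name}")
        groups[library_id].append(path)
    return dict(groups)


gex_input = [
    "data/gex_S1_L001_R1_001.fastq.gz",
    "data/gex_S1_L001_R2_001.fastq.gz",
]
print(group_by_library_id(gex_input))
```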

Library arguments

Name Description Attributes
--library_id The Illumina sample name to analyze. This must exactly match the ‘Sample Name’ part of the FASTQ files specified in the --input argument. List of string, example: "mysample1", multiple_sep: ";"
--library_type The underlying feature type of the library. List of string, example: "Gene Expression", multiple_sep: ";"
--library_subsample The rate at which reads from the provided FASTQ files are sampled. Must be strictly greater than 0 and less than or equal to 1. List of string, example: "0.5", multiple_sep: ";"
--library_lanes Lanes associated with this sample. Defaults to using all lanes. List of string, example: "1-4", multiple_sep: ";"
--library_chemistry Only applicable to Fixed RNA Profiling (FRP). Library-specific assay configuration. By default, the assay configuration is detected automatically. Typically, users will not need to specify a chemistry. string
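
The list-valued library arguments can be checked before a run. The sketch below assumes that the lists are matched by position (one entry per library), which the table above does not state explicitly; the function build_library_rows is hypothetical. The subsample range check follows the description of --library_subsample.

```python
def build_library_rows(library_id, library_type, library_subsample=None):
    """Pair up per-library values and validate the subsample rates.

    Assumption: entry i of each list describes library i.
    """
    n = len(library_id)
    if len(library_type) != n:
        raise ValueError("library_type must have one entry per library_id")
    if library_subsample is None:
        library_subsample = [None] * n
    elif len(library_subsample) != n:
        raise ValueError("library_subsample must have one entry per library_id")

    rows = []
    for lib_id, lib_type, rate in zip(library_id, library_type, library_subsample):
        # The rate is passed as a string and must satisfy 0 < rate <= 1.
        if rate is not None and not 0 < float(rate) <= 1:
            raise ValueError(f"subsample rate out of range for {lib_id}: {rate}")
        rows.append(
            {"library_id": lib_id, "library_type": lib_type, "library_subsample": rate}
        )
    return rows


for row in build_library_rows(["mysample1"], ["Gene Expression"], ["0.5"]):
    print(row)
```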

Sample parameters

Name Description Attributes
--sample_ids A name to identify a multiplexed sample. Must be alphanumeric with hyphens and/or underscores, and less than 64 characters. Required for Cell Multiplexing libraries. List of string, multiple_sep: ";"
--sample_description A description for the sample. List of string, multiple_sep: ";"
--sample_expect_cells Expected number of recovered cells, used as input to cell calling algorithm. List of integer, example: 3000, multiple_sep: ";"
--sample_force_cells Force pipeline to use this number of cells, bypassing cell detection. List of integer, example: 3000, multiple_sep: ";"
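
The constraint on --sample_ids (alphanumeric with hyphens and/or underscores, less than 64 characters) can be expressed as a pattern. A minimal sketch, with a hypothetical helper name and an assumption that empty names are not allowed:

```python
import re

# 1 to 63 characters: letters, digits, hyphens and underscores only.
SAMPLE_ID = re.compile(r"[A-Za-z0-9_-]{1,63}")


def is_valid_sample_id(sample_id):
    """Return True if the name satisfies the documented constraints."""
    return SAMPLE_ID.fullmatch(sample_id) is not None


for candidate in ["sample_1-a", "bad id", "x" * 64]:
    print(repr(candidate[:20]), is_valid_sample_id(candidate))
```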

Feature Barcode library specific arguments

Name Description Attributes
--feature_reference Path to the Feature reference CSV file, declaring Feature Barcode constructs and associated barcodes. Required only for Antibody Capture or CRISPR Guide Capture libraries. See https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/feature-bc-analysis#feature-ref for more information. file, example: "feature_reference.csv"
--feature_r1_length Limit the length of the input Read 1 sequence of Feature Barcode libraries to the first N bases, where N is the user-supplied value. Note that the length includes the Barcode and UMI sequences so do not set this below 26. integer
--feature_r2_length Limit the length of the input Read 2 sequence of Feature Barcode libraries to the first N bases, where N is a user-supplied value. Trimming occurs before sequencing metrics are computed and therefore, limiting the length of Read 2 may affect Q30 scores. integer
--min_crispr_umi Set the minimum number of CRISPR guide RNA UMIs required for protospacer detection. If a lower or higher sensitivity is desired for detection, this value can be customized according to specific experimental needs. Applicable only to datasets that include a CRISPR Guide Capture library. integer

Gene expression arguments

Arguments relevant to the analysis of gene expression data.

Name Description Attributes
--gex_reference Genome reference index built by Cell Ranger mkref. file, required, example: "reference_genome.tar.gz"
--gex_secondary_analysis Whether or not to run the secondary analysis e.g. clustering. boolean, default: FALSE
--gex_generate_bam Whether to generate a BAM file. boolean, default: FALSE
--gex_expect_cells Expected number of recovered cells, used as input to cell calling algorithm. integer, example: 3000
--gex_force_cells Force pipeline to use this number of cells, bypassing cell detection. integer, example: 3000
--gex_include_introns Whether or not to include intronic reads in counts. This option does not apply to Fixed RNA Profiling analysis. boolean, default: TRUE
--gex_r1_length Limit the length of the input Read 1 sequence of Gene Expression libraries to the first N bases, where N is the user-supplied value. Note that the length includes the Barcode and UMI sequences so do not set this below 26. integer
--gex_r2_length Limit the length of the input Read 2 sequence of Gene Expression libraries to the first N bases, where N is a user-supplied value. Trimming occurs before sequencing metrics are computed and therefore, limiting the length of Read 2 may affect Q30 scores. integer
--gex_chemistry Assay configuration. Either specify a single value, which is applied to all libraries, or one value per library. The latter is only applicable to Fixed RNA Profiling. Accepted values are listed below. string, default: "auto"
- auto: Chemistry autodetection (default)
- threeprime: Single Cell 3’
- SC3Pv1, SC3Pv2, SC3Pv3, SC3Pv4: Single Cell 3’ v1, v2, v3, or v4
- SC3Pv3HT: Single Cell 3’ v3.1 HT
- SC-FB: Single Cell Antibody-only 3’ v2 or 5’
- fiveprime: Single Cell 5’
- SC5P-PE: Paired-end Single Cell 5’
- SC5P-R2: R2-only Single Cell 5’
- SC5P-R2-v3: R2-only Single Cell 5’ v3
- SCP5-PE-v3: Single Cell 5’ paired-end v3 (GEM-X)
- SC5PHT: Single Cell 5’ v2 HT
- SFRP: Fixed RNA Profiling (Singleplex)
- MFRP: Fixed RNA Profiling (Multiplex, Probe Barcode on R2)
- MFRP-R1: Fixed RNA Profiling (Multiplex, Probe Barcode on R1)
- MFRP-RNA: Fixed RNA Profiling (Multiplex, RNA, Probe Barcode on R2)
- MFRP-Ab: Fixed RNA Profiling (Multiplex, Antibody, Probe Barcode at R2:69)
- MFRP-Ab-R2pos50: Fixed RNA Profiling (Multiplex, Antibody, Probe Barcode at R2:50)
- MFRP-RNA-R1: Fixed RNA Profiling (Multiplex, RNA, Probe Barcode on R1)
- MFRP-Ab-R1: Fixed RNA Profiling (Multiplex, Antibody, Probe Barcode on R1)
- ARC-v1: for analyzing the Gene Expression portion of Multiome data. If Cell Ranger auto-detects ARC-v1 chemistry, an error is triggered.
See https://kb.10xgenomics.com/hc/en-us/articles/115003764132-How-does-Cell-Ranger-auto-detect-chemistry- for more information.

Cell multiplexing parameters

Name Description Attributes
--cell_multiplex_oligo_ids The Cell Multiplexing oligo IDs used to multiplex this sample. If multiple CMOs were used for a sample, separate IDs with a pipe (e.g., CMO301|CMO302). Required for Cell Multiplexing libraries. List of string, multiple_sep: ";"
--min_assignment_confidence The minimum estimated likelihood to call a sample as tagged with a Cell Multiplexing Oligo (CMO) instead of “Unassigned”. Users may wish to tolerate a higher rate of mis-assignment in order to obtain more singlets to include in their analysis, or a lower rate of mis-assignment at the cost of obtaining fewer singlets. double
--cmo_set Path to a custom CMO set CSV file, declaring CMO constructs and associated barcodes. If the default CMO reference IDs that are built into the Cell Ranger software are required, this option does not need to be used. file
--barcode_sample_assignment Path to a barcode-sample assignment CSV file that specifies the barcodes that belong to each sample. file
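
When a sample was tagged with more than one CMO, the IDs for that sample are joined with a pipe. The sketch below builds the two lists from a mapping; it assumes that --sample_ids and --cell_multiplex_oligo_ids are matched by position, and the helper name and sample names are hypothetical.

```python
def cmo_arguments(sample_to_cmos):
    """Return (sample_ids, cell_multiplex_oligo_ids) as parallel lists.

    Multiple CMOs for one sample are joined with "|", as described for
    --cell_multiplex_oligo_ids.
    """
    sample_ids = []
    cmo_ids = []
    for sample_id, cmos in sample_to_cmos.items():
        if not cmos:
            raise ValueError(f"no CMO IDs given for sample {sample_id}")
        sample_ids.append(sample_id)
        cmo_ids.append("|".join(cmos))
    return sample_ids, cmo_ids


sample_ids, cmo_ids = cmo_arguments(
    {"sampleA": ["CMO301", "CMO302"], "sampleB": ["CMO303"]}
)
print("sample_ids:", sample_ids)
print("cell_multiplex_oligo_ids:", cmo_ids)
```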

Fixed RNA profiling parameters

Name Description Attributes
--probe_set A probe set reference CSV file. It specifies the sequences used as a reference for probe alignment and the gene ID associated with each probe. It must include 4 columns (probe file format 1.0.0): gene_id,probe_seq,probe_id,included, and may include an optional 5th column, region (probe file format 1.0.1). The columns and the required header metadata are listed below. file
Columns:
- gene_id: The Ensembl gene identifier targeted by the probe.
- probe_seq: The nucleotide sequence of the probe, which is complementary to the transcript sequence.
- probe_id: The probe identifier, whose format is described in Probe identifiers.
- included: A TRUE or FALSE flag specifying whether the probe is included in the filtered counts matrix output or excluded by the probe filter. See filter-probes option of cellranger multi. All probes of a gene must be marked TRUE in the included column for that gene to be included.
- region: Present only in v1.0.1 probe set reference CSV. The gene boundary targeted by the probe. Accepted values are spliced or unspliced.
The file also contains a number of required metadata fields in the header in the format #key=value:
- panel_name: The name of the probe set.
- panel_type: Always predesigned for predesigned probe sets.
- reference_genome: The reference genome build used for probe design.
- reference_version: The version of the Cell Ranger reference transcriptome used for probe design.
- probe_set_file_format: The version of the probe set file format specification that this file conforms to.
--filter_probes If ‘false’, include all non-deprecated probes listed in the probe set reference CSV file. If ‘true’ or not set, probes that are predicted to have off-target activity to homologous genes are excluded from analysis. Not filtering will result in UMI counts from all non-deprecated probes, including those with predicted off-target activity, to be used in the analysis. Probes whose ID is prefixed with DEPRECATED are always excluded from the analysis. boolean
--probe_barcode_ids The Fixed RNA Probe Barcode ID used for this sample, and for multiplex GEX + Antibody Capture libraries, the corresponding Antibody Multiplexing Barcode IDs. 10x recommends specifying both barcodes (e.g., BC001+AB001) when an Antibody Capture library is present. The barcode pair order is BC+AB and they are separated with a “+” (no spaces). Alternatively, you can specify the Probe Barcode ID alone and Cell Ranger’s barcode pairing auto-detection algorithm will automatically match to the corresponding Antibody Multiplexing Barcode. List of string, multiple_sep: ";"
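
The BC+AB pairing format for --probe_barcode_ids can be illustrated with a small parser. This is a sketch for checking values before a run; the function name is hypothetical and it does not verify that the barcode IDs themselves exist.

```python
def split_probe_barcode_ids(value):
    """Split "BC001+AB001" into (probe_barcode, antibody_barcode).

    The antibody barcode is None when only the Probe Barcode ID is given.
    """
    if any(ch.isspace() for ch in value):
        raise ValueError("barcode IDs must be joined with '+' and no spaces")
    parts = value.split("+")
    if len(parts) > 2 or not all(parts):
        raise ValueError(f"expected 'BC' or 'BC+AB', got {value!r}")
    probe_barcode = parts[0]
    antibody_barcode = parts[1] if len(parts) == 2 else None
    return probe_barcode, antibody_barcode


for value in ["BC001+AB001", "BC002"]:
    print(value, "->", split_probe_barcode_ids(value))
```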

Antigen Capture (BEAM) library arguments

These arguments are recommended if an Antigen Capture (BEAM) library is present. They are needed to calculate the antigen specificity score.

Name Description Attributes
--control_id A user-defined ID for any negative controls used in the T/BCR Antigen Capture assay. Must match id specified in the feature reference CSV. May only include ASCII characters and must not use whitespace, slash, quote, or comma characters. Each ID must be unique and must not collide with a gene identifier from the transcriptome. List of string, multiple_sep: ";"
--mhc_allele The MHC allele for TCR Antigen Capture libraries. Must match mhc_allele name specified in the Feature Reference CSV. List of string, multiple_sep: ";"
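
The character rules for --control_id can be checked up front. The following sketch covers only the ASCII, forbidden-character and uniqueness rules; it does not check the feature reference CSV or gene identifiers, it treats "slash" as the forward slash only (an assumption), and the function name is hypothetical.

```python
# Characters ruled out by the description: slash, quote and comma.
# Assumption: "slash" means "/" and "quote" covers both ' and ".
FORBIDDEN = set("/\"',")


def check_control_ids(control_ids):
    """Raise ValueError if any ID breaks the documented character rules."""
    seen = set()
    for control_id in control_ids:
        if not control_id or not control_id.isascii():
            raise ValueError(f"control ID must be non-empty ASCII: {control_id!r}")
        if any(ch.isspace() or ch in FORBIDDEN for ch in control_id):
            raise ValueError(f"forbidden character in control ID: {control_id!r}")
        if control_id in seen:
            raise ValueError(f"duplicate control ID: {control_id!r}")
        seen.add(control_id)
    return list(control_ids)


print(check_control_ids(["negative_control_1", "negative_control_2"]))
```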

General arguments

These arguments are applicable to all library types.

Name Description Attributes
--check_library_compatibility Optional. This option allows users to disable the check that evaluates 10x Barcode overlap between libraries when multiple libraries are specified (e.g., Gene Expression + Antibody Capture). Setting this option to false will disable the check across all library combinations. We recommend running this check (default); however, if the pipeline errors out, users can bypass the check to generate outputs for troubleshooting. boolean, default: TRUE

Outputs

Name Description Attributes
--output The folder to store the alignment results. file, required, example: "/path/to/output"

Executor arguments

Name Description Attributes
--dryrun If true, the output directory will only contain the CWL input files, but the pipeline itself will not be executed. boolean_true

Authors

  • Angela Oliveira Pisco (author)

  • Robrecht Cannoodt (author, maintainer)

  • Dries De Maeyer (author)

  • Weiwei Schultz (contributor)

  • Dorien Roosen (author)