The supported sequencing libraries are those generated by the BD Rhapsody assay kits, including Whole Transcriptome mRNA, Targeted mRNA, AbSeq Antibody-Oligonucleotides, Single-Cell Multiplexing, TCR/BCR, and ATAC-Seq.
The CWL pipeline file is obtained by cloning ‘https://bitbucket.org/CRSwDev/cwl’ and removing all objects with class ‘DockerRequirement’ from the YAML.
Example commands
You can run the pipeline using nextflow run.
View help
You can use --help as a parameter to get an overview of the possible parameters.
nextflow run openpipelines-bio/openpipeline \
  -r 2.1.0 -latest \
  -main-script target/nextflow/mapping/bd_rhapsody/main.nf \
  --help
Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
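As a rough illustration of how the pieces of the command fit together, the following Python sketch assembles a `nextflow run` invocation from the revision, main script, and profile shown on this page. The helper name `build_command` and the idea of passing parameters as a dictionary are assumptions made for this example, not part of the pipeline; the parameter values are the placeholder examples from the argument tables.

```python
import shlex

PIPELINE = "openpipelines-bio/openpipeline"
MAIN_SCRIPT = "target/nextflow/mapping/bd_rhapsody/main.nf"
# Backends named on this page for the -profile option.
BACKENDS = {"docker", "podman", "singularity"}


def build_command(params, profile="docker", revision="2.1.0"):
    """Return the argv list for running the workflow (hypothetical helper)."""
    if profile not in BACKENDS:
        raise ValueError(f"unsupported profile: {profile!r}")
    argv = [
        "nextflow", "run", PIPELINE,
        "-r", revision, "-latest",
        "-main-script", MAIN_SCRIPT,
        "-profile", profile,
    ]
    # Workflow parameters use a double dash, Nextflow options a single dash.
    for name, value in params.items():
        argv += [f"--{name}", str(value)]
    return argv


cmd = build_command(
    {
        "reads": "WTALibrary_S1_L001_R1_001.fastq.gz;WTALibrary_S1_L001_R2_001.fastq.gz",
        "reference_archive": "RhapRef_Human_WTA_2023-02.tar.gz",
        "output_dir": "output_dir",
    },
    profile="singularity",
)
# shlex.join quotes values such as the ';'-separated read list for the shell.
print(shlex.join(cmd))
```

Building the command as a list and quoting it with `shlex.join` keeps the `;` separator from being interpreted by the shell.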
Argument groups
Inputs
Name
Description
Attributes
--reads
Reads (optional). Path to your FASTQ.GZ formatted read files from libraries that may include WTA mRNA, Targeted mRNA, AbSeq, Sample Multiplexing, and VDJ. You may specify as many R1/R2 read pairs as you want.
List of file, example: "WTALibrary_S1_L001_R1_001.fastq.gz", "WTALibrary_S1_L001_R2_001.fastq.gz", multiple_sep: ";"
--reads_atac
Path to your FASTQ.GZ formatted read files from ATAC-Seq libraries. You may specify as many R1/R2/I2 files as you want.
List of file, example: "ATACLibrary_S2_L001_R1_001.fastq.gz", "ATACLibrary_S2_L001_R2_001.fastq.gz", "ATACLibrary_S2_L001_I2_001.fastq.gz", multiple_sep: ";"
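Because list arguments such as --reads take several files separated by ";", it can help to check the R1/R2 pairing before building the value. The sketch below is one way to do that in Python. It assumes the `_R1_` / `_R2_` naming convention seen in the example file names; the function name `join_read_pairs` is hypothetical.

```python
import os
import re


def join_read_pairs(paths):
    """Check that every R1 has an R2 and return a ';'-separated string."""
    mates = {}
    for path in paths:
        name = os.path.basename(path)
        match = re.search(r"_(R[12])_", name)
        if match is None:
            raise ValueError(f"cannot tell R1 from R2: {path}")
        # Key: directory plus the file name with the mate tag removed,
        # so both mates of a pair map to the same key.
        key = (os.path.dirname(path), name[:match.start()] + name[match.end():])
        mates.setdefault(key, set()).add(match.group(1))
    unpaired = [key for key, found in mates.items() if found != {"R1", "R2"}]
    if unpaired:
        raise ValueError(f"incomplete read pairs: {unpaired}")
    # Sorting places each R1 directly before its R2.
    return ";".join(sorted(paths))


reads_value = join_read_pairs([
    "WTALibrary_S1_L001_R2_001.fastq.gz",
    "WTALibrary_S1_L001_R1_001.fastq.gz",
])
print(reads_value)
```

ATAC-Seq libraries also carry an I2 file, so a check for --reads_atac would need to expect three files per library instead of two.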
References
Assay type will be inferred from the provided reference(s). Do not provide both reference_archive and targeted_reference at the same time.
Valid reference input combinations:
- reference_archive: WTA only
- reference_archive & abseq_reference: WTA + AbSeq
- reference_archive & supplemental_reference: WTA + extra transgenes
- reference_archive & abseq_reference & supplemental_reference: WTA + AbSeq + extra transgenes
- reference_archive: WTA + ATAC or ATAC only
- reference_archive & supplemental_reference: WTA + ATAC + extra transgenes
- targeted_reference: Targeted only
- targeted_reference & abseq_reference: Targeted + AbSeq
- abseq_reference: AbSeq only
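The combinations listed above can be expressed as a small pre-flight check. The following Python sketch is only an approximation of that rule set: the function name `infer_assay` is hypothetical, and the two extra checks (at least one reference, supplemental_reference only alongside reference_archive) are assumptions read off the list rather than documented pipeline behaviour. It also cannot tell WTA from WTA + ATAC or ATAC only, since that depends on the archive and the reads provided.

```python
def infer_assay(reference_archive=False, targeted_reference=False,
                abseq_reference=False, supplemental_reference=False):
    """Return a label for the assay implied by the provided references."""
    if reference_archive and targeted_reference:
        raise ValueError(
            "provide either reference_archive or targeted_reference, not both")
    # Assumption: the list only shows supplemental_reference with an archive.
    if supplemental_reference and not reference_archive:
        raise ValueError(
            "supplemental_reference is only listed with reference_archive")
    # Assumption: every listed combination contains at least one reference.
    if not (reference_archive or targeted_reference or abseq_reference):
        raise ValueError("at least one reference is needed")

    parts = []
    if reference_archive:
        # May also stand for WTA + ATAC or ATAC only, depending on the archive.
        parts.append("WTA")
    if targeted_reference:
        parts.append("Targeted")
    if abseq_reference:
        parts.append("AbSeq")
    if supplemental_reference:
        parts.append("extra transgenes")
    return " + ".join(parts)


print(infer_assay(reference_archive=True, abseq_reference=True))
```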
The reference_archive can be generated with the bd_rhapsody_make_reference component. Alternatively, BD also provides standard references which can be downloaded from these locations:
--reference_archive
Path to Rhapsody WTA Reference in the tar.gz format. Structure of the reference archive: - BD_Rhapsody_Reference_Files/: top level folder - star_index/: sub-folder containing STAR index, that is files created with STAR --runMode genomeGenerate - GTF for gene-transcript-annotation e.g. “gencode.v43.primary_assembly.annotation.gtf”
file, example: "RhapRef_Human_WTA_2023-02.tar.gz"
--targeted_reference
Path to the targeted reference file in FASTA format.
List of file, example: "BD_Rhapsody_Immune_Response_Panel_Hs.fasta", multiple_sep: ";"
--abseq_reference
Path to the AbSeq reference file in FASTA format. Only needed if BD AbSeq Ab-Oligos are used.
List of file, example: "AbSeq_reference.fasta", multiple_sep: ";"
--supplemental_reference
Path to the supplemental reference file in FASTA format. Only needed if there are additional transgene sequences to be aligned against in a WTA assay experiment.
List of file, example: "supplemental_reference.fasta", multiple_sep: ";"
Outputs
Outputs for all pipeline runs
Name
Description
Attributes
--output_dir
The unprocessed output directory containing all the outputs from the pipeline.
file, required, example: "output_dir"
--output_seurat
Single-cell analysis tool inputs. Seurat (.rds) input file containing RSEC molecules data table and all cell annotation metadata.
file, example: "output_seurat.rds"
--output_mudata
file, example: "output_mudata.h5mu"
--metrics_summary
Metrics Summary. Report containing sequencing, molecules, and cell metrics.
file, example: "metrics_summary.csv"
--pipeline_report
Pipeline Report. Summary report containing the results from the sequencing analysis pipeline run.
file, example: "pipeline_report.html"
--rsec_mols_per_cell
Molecules per bioproduct per cell based on RSEC.
file, example: "RSEC_MolsPerCell_MEX.zip"
--dbec_mols_per_cell
Molecules per bioproduct per cell based on DBEC. The DBEC data table is only output if the experiment includes targeted mRNA or AbSeq bioproducts.
file, example: "DBEC_MolsPerCell_MEX.zip"
--rsec_mols_per_cell_unfiltered
Unfiltered tables containing all cell labels with at least 10 reads.
Alignment file of R2 with associated R1 annotations for Bioproduct.
file, example: "BioProduct.bam"
--bam_index
Index file for the alignment file.
file, example: "BioProduct.bam.bai"
--bioproduct_stats
Bioproduct Stats. Metrics from RSEC and DBEC Unique Molecular Identifier adjustment algorithms on a per-bioproduct basis.
file, example: "Bioproduct_Stats.csv"
--dimred_tsne
t-SNE dimensionality reduction coordinates per cell index
file, example: "tSNE_coordinates.csv"
--dimred_umap
UMAP dimensionality reduction coordinates per cell index
file, example: "UMAP_coordinates.csv"
--immune_cell_classification
Immune Cell Classification. Cell type classification based on the expression of immune cell markers.
file, example: "Immune_Cell_Classification.csv"
Multiplex outputs
Outputs when multiplex option is selected
Name
Description
Attributes
--sample_tag_metrics
Sample Tag Metrics. Metrics from the sample determination algorithm.
file, example: "Sample_Tag_Metrics.csv"
--sample_tag_calls
Sample Tag Calls. Assigned Sample Tag for each putative cell
file, example: "Sample_Tag_Calls.csv"
--sample_tag_counts
Sample Tag Counts. Separate data tables and metric summary for cells assigned to each sample tag. Note: For putative cells that could not be assigned a specific Sample Tag, a Multiplet_and_Undetermined.zip file is also output.
List of file, example: "Sample_Tag1.zip", multiple_sep: ";"
--sample_tag_counts_unassigned
Sample Tag Counts Unassigned. Data table and metric summary for cells that could not be assigned a specific Sample Tag.
file, example: "Multiplet_and_Undetermined.zip"
VDJ Outputs
Outputs when VDJ option selected
Name
Description
Attributes
--vdj_metrics
VDJ Metrics. Overall metrics from the VDJ analysis.
file, example: "VDJ_Metrics.csv"
--vdj_per_cell
VDJ Per Cell. Cell specific read and molecule counts, VDJ gene segments, CDR3 sequences, paired chains, and cell type.
file, example: "VDJ_perCell.csv"
--vdj_per_cell_uncorrected
VDJ Per Cell Uncorrected. Cell specific read and molecule counts, VDJ gene segments, CDR3 sequences, paired chains, and cell type.
file, example: "VDJ_perCell_uncorrected.csv"
--vdj_dominant_contigs
VDJ Dominant Contigs. Dominant contig for each cell label chain type combination (putative cells only).
file, example: "VDJ_Dominant_Contigs_AIRR.csv"
--vdj_unfiltered_contigs
VDJ Unfiltered Contigs. All contigs that were assembled and annotated successfully (all cells).
file, example: "VDJ_Unfiltered_Contigs_AIRR.csv"
ATAC-Seq outputs
Outputs when ATAC-Seq option selected
Name
Description
Attributes
--atac_metrics
ATAC Metrics. Overall metrics from the ATAC-Seq analysis.
file, example: "ATAC_Metrics.csv"
--atac_metrics_json
ATAC Metrics JSON. Overall metrics from the ATAC-Seq analysis in JSON format.
file, example: "ATAC_Metrics.json"
--atac_fragments
ATAC Fragments. Chromosomal location, cell index, and read support for each fragment detected
file, example: "ATAC_Fragments.bed.gz"
--atac_fragments_index
Index of ATAC Fragments.
file, example: "ATAC_Fragments.bed.gz.tbi"
--atac_transposase_sites
ATAC Transposase Sites. Chromosomal location, cell index, and read support for each transposase site detected
Specify the dataset to be used for putative cell calling: mRNA, AbSeq, ATAC, or mRNA_and_ATAC. For putative cell calling using an AbSeq dataset, provide an AbSeq_Reference FASTA file above. For putative cell calling using an ATAC dataset, provide a WTA+ATAC-Seq Reference_Archive file above. The default dataset for putative cell calling is determined as follows: - If mRNA Reads and ATAC Reads exist: mRNA_and_ATAC - If only ATAC Reads exist: ATAC - Otherwise: mRNA
string, example: "mRNA"
--cell_calling_bioproduct_algorithm
Specify the bioproduct algorithm to be used for putative cell calling: Basic or Refined. By default, the Basic algorithm is used.
string, example: "Basic"
--cell_calling_atac_algorithm
Specify the ATAC-seq algorithm to be used for putative cell calling: Basic or Refined. By default, the Basic algorithm is used.
string, example: "Basic"
--exact_cell_count
Set a specific number of cells as putative, based on those with the highest error-corrected read count
integer, example: 10000
--expected_cell_count
Guide the basic putative cell calling algorithm by providing an estimate of the number of cells expected. Usually this can be the number of cells loaded into the Rhapsody cartridge. If there are multiple inflection points on the second derivative cumulative curve, this will ensure the one selected is near the expected.
integer, example: 20000
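The default choice of the cell calling dataset described in the table above reduces to a three-way decision. A minimal Python sketch of that rule, with a hypothetical function name and boolean inputs standing in for "mRNA Reads exist" and "ATAC Reads exist":

```python
def default_cell_calling_data(has_mrna_reads, has_atac_reads):
    """Mirror the documented default for the putative cell calling dataset."""
    if has_mrna_reads and has_atac_reads:
        return "mRNA_and_ATAC"
    if has_atac_reads:
        return "ATAC"
    # Covers mRNA only, and any case without ATAC reads.
    return "mRNA"


for mrna, atac in [(True, True), (False, True), (True, False)]:
    print(mrna, atac, default_cell_calling_data(mrna, atac))
```

Note that AbSeq is never selected by default; according to the description it has to be requested explicitly together with an AbSeq reference.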
Intronic Reads Settings
Name
Description
Attributes
--exclude_intronic_reads
By default, the flag is false, and reads aligned to exons and introns are considered and represented in molecule counts. When the flag is set to true, intronic reads will be excluded. The value can be true or false.
boolean, example: FALSE
Multiplex Settings
Name
Description
Attributes
--sample_tags_version
Specify the version of the Sample Tags used in the run: * If Sample Tag Multiplexing was done, specify the appropriate version: human, mouse, flex, nuclei_includes_mrna, nuclei_atac_only * If this is an SMK + Nuclei mRNA run or an SMK + Multiomic ATAC-Seq (WTA+ATAC-Seq) run (and not an SMK + ATAC-Seq only run), choose the “nuclei_includes_mrna” option. * If this is an SMK + ATAC-Seq only run (and not SMK + Multiomic ATAC-Seq (WTA+ATAC-Seq)), choose the “nuclei_atac_only” option.
string, example: "human"
--tag_names
Specify the tag number followed by ‘-’ and the desired sample name to appear in Sample_Tag_Metrics.csv. Do not use special characters.
List of string, example: "4-mySample", "9-myOtherSample", "6-alsoThisSample", multiple_sep: ";"
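The --tag_names value follows a "number-name" pattern with entries separated by ";". The sketch below builds such a value in Python. The helper name `format_tag_names` is hypothetical, and the set of allowed characters (letters, digits, underscore, hyphen) is an assumption, since this page does not define which characters count as special.

```python
import re

# Assumption: only these characters are treated as safe in a sample name.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def format_tag_names(names):
    """Turn {tag_number: sample_name} into a ';'-separated --tag_names value."""
    entries = []
    for number, name in sorted(names.items()):
        if SAFE_NAME.fullmatch(name) is None:
            raise ValueError(f"sample name contains special characters: {name!r}")
        entries.append(f"{number}-{name}")
    return ";".join(entries)


tag_names = format_tag_names({4: "mySample", 9: "myOtherSample", 6: "alsoThisSample"})
print(tag_names)
```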
VDJ arguments
Name
Description
Attributes
--vdj_version
If VDJ was done, specify the appropriate option: human, mouse, humanBCR, humanTCR, mouseBCR, mouseTCR
string, example: "human"
ATAC options
Name
Description
Attributes
--predefined_atac_peaks
An optional BED file containing pre-established chromatin accessibility peak regions for generating the ATAC cell-by-peak matrix.
file, example: "predefined_peaks.bed"
Additional options
Name
Description
Attributes
--run_name
Specify a run name to use as the output file base name. Use only letters, numbers, or hyphens. Do not use special characters or spaces.
string, default: "sample"
--generate_bam
Specify whether to create the BAM file output
boolean, default: FALSE
--long_reads
Use STARlong (default: undefined - i.e. autodetects based on read lengths) - Specify if the STARlong aligner should be used instead of STAR. Set to true if the reads are longer than 650bp.
boolean
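Two of the options above come with simple rules that can be checked before launching a run: the allowed characters in --run_name and the 650 bp threshold for --long_reads. A small Python sketch of both checks follows; the function names are hypothetical, and leaving --long_reads unset remains an option because the description states that the aligner is then chosen automatically from the read lengths.

```python
import re


def is_valid_run_name(name):
    """True if the name uses only letters, numbers, or hyphens."""
    return re.fullmatch(r"[A-Za-z0-9-]+", name) is not None


def needs_starlong(max_read_length_bp):
    """True if reads are longer than 650 bp, the documented STARlong threshold."""
    return max_read_length_bp > 650


print(is_valid_run_name("sample"))
print(is_valid_run_name("my run"))
print(needs_starlong(150))
```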
Advanced options
NOTE: Only change these if you are really sure about what you are doing
Name
Description
Attributes
--custom_star_params
Modify STAR alignment parameters. Set this parameter to fully override the default STAR mapping parameters used in the pipeline. For reference, these are the defaults. Short Reads: --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMultimapScoreRange 0 --clip3pAdapterSeq AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA --seedSearchStartLmax 50 --outFilterMatchNmin 25 --limitOutSJcollapsed 2000000. Long Reads: same as Short Reads plus --seedPerReadNmax 10000. This applies to FASTQs provided in the Reads user input. Do NOT set any non-mapping-related parameters such as --genomeDir, --outSAMtype, --outSAMunmapped, --readFilesIn, --runThreadN, etc. The pipeline uses STAR version 2.7.10b.
Modify bwa-mem2 alignment parameters. Set this parameter to fully override the bwa-mem2 mapping parameters used in the pipeline. The pipeline does not pass any custom mapping parameters to bwa-mem2, so the program's default values are used. This applies to FASTQs provided in the Reads_ATAC user input. Do NOT set any non-mapping-related parameters such as -C, -t, etc. The pipeline uses bwa-mem2 version 2.2.1.
string, example: "-k 16 -w 200 -r"
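Since a custom STAR parameter string fully replaces the defaults, one cautious approach is to start from the documented defaults and change only what is needed. The Python sketch below does this; the helper name `custom_star_params` and the changed value in the usage line are purely illustrative, and the list of forbidden options is only the subset named above (the description ends with "etc.", so it is not exhaustive).

```python
# Defaults for short reads as listed in the STAR parameter description.
SHORT_READ_DEFAULTS = {
    "--outFilterScoreMinOverLread": "0",
    "--outFilterMatchNminOverLread": "0",
    "--outFilterMultimapScoreRange": "0",
    "--clip3pAdapterSeq": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
    "--seedSearchStartLmax": "50",
    "--outFilterMatchNmin": "25",
    "--limitOutSJcollapsed": "2000000",
}
# Long reads use the short-read defaults plus this option.
LONG_READ_EXTRA = {"--seedPerReadNmax": "10000"}
# Non-mapping options named in the description (not an exhaustive list).
FORBIDDEN = {"--genomeDir", "--outSAMtype", "--outSAMunmapped",
             "--readFilesIn", "--runThreadN"}


def custom_star_params(changes, long_reads=False):
    """Return a full parameter string: documented defaults plus `changes`."""
    bad = FORBIDDEN & set(changes)
    if bad:
        raise ValueError(f"non-mapping parameters are not allowed: {sorted(bad)}")
    params = dict(SHORT_READ_DEFAULTS)
    if long_reads:
        params.update(LONG_READ_EXTRA)
    params.update(changes)
    return " ".join(f"{key} {value}" for key, value in params.items())


# Illustrative change only; 30 is not a recommended value.
print(custom_star_params({"--outFilterMatchNmin": "30"}))
```

Starting from the full default set avoids silently dropping options such as the adapter clipping sequence when only one value is meant to change.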
CWL-runner arguments
Name
Description
Attributes
--parallel
Run jobs in parallel.
boolean, default: TRUE
--timestamps
Add timestamps to the errors, warnings, and notifications.
boolean_true
Undocumented arguments
Name
Description
Attributes
--abseq_umi
integer
--target_analysis
boolean
--vdj_jgene_evalue
e-value threshold for J gene. The e-value threshold for J gene call by IgBlast/PyIR, default is set as 0.001
double
--vdj_vgene_evalue
e-value threshold for V gene. The e-value threshold for V gene call by IgBlast/PyIR, default is set as 0.001