BD Rhapsody

BD Rhapsody Sequence Analysis CWL pipeline v2.2.1. This pipeline performs analysis of single-cell multiomic sequence read (FASTQ) data.

Info

ID: bd_rhapsody
Namespace: mapping

The supported sequencing libraries are those generated by the BD Rhapsody assay kits, including Whole Transcriptome mRNA, Targeted mRNA, AbSeq Antibody-Oligonucleotides, Single-Cell Multiplexing, TCR/BCR, and ATAC-Seq.

The CWL pipeline file is obtained by cloning ‘https://bitbucket.org/CRSwDev/cwl’ and removing all objects with class ‘DockerRequirement’ from the YAML.

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.0 -latest \
  -main-script target/nextflow/mapping/bd_rhapsody/main.nf \
  --help

Run command

Example of params.yaml
# Inputs
# reads: ["WTALibrary_S1_L001_R1_001.fastq.gz", "WTALibrary_S1_L001_R2_001.fastq.gz"]
# reads_atac: ["ATACLibrary_S2_L001_R1_001.fastq.gz", "ATACLibrary_S2_L001_R2_001.fastq.gz", "ATACLibrary_S2_L001_I2_001.fastq.gz"]

# References
# reference_archive: "RhapRef_Human_WTA_2023-02.tar.gz"
# targeted_reference: ["BD_Rhapsody_Immune_Response_Panel_Hs.fasta"]
# abseq_reference: ["AbSeq_reference.fasta"]
# supplemental_reference: ["supplemental_reference.fasta"]

# Outputs
# output_dir: "$id.$key.output_dir"
# output_seurat: "$id.$key.output_seurat.rds"
# output_mudata: "$id.$key.output_mudata.h5mu"
# metrics_summary: "$id.$key.metrics_summary.csv"
# pipeline_report: "$id.$key.pipeline_report.html"
# rsec_mols_per_cell: "$id.$key.rsec_mols_per_cell.zip"
# dbec_mols_per_cell: "$id.$key.dbec_mols_per_cell.zip"
# rsec_mols_per_cell_unfiltered: "$id.$key.rsec_mols_per_cell_unfiltered.zip"
# bam: "$id.$key.bam.bam"
# bam_index: "$id.$key.bam_index.bai"
# bioproduct_stats: "$id.$key.bioproduct_stats.csv"
# dimred_tsne: "$id.$key.dimred_tsne.csv"
# dimred_umap: "$id.$key.dimred_umap.csv"
# immune_cell_classification: "$id.$key.immune_cell_classification.csv"

# Multiplex outputs
# sample_tag_metrics: "$id.$key.sample_tag_metrics.csv"
# sample_tag_calls: "$id.$key.sample_tag_calls.csv"
# sample_tag_counts: ["$id.$key.sample_tag_counts_*.zip"]
# sample_tag_counts_unassigned: "$id.$key.sample_tag_counts_unassigned.zip"

# VDJ Outputs
# vdj_metrics: "$id.$key.vdj_metrics.csv"
# vdj_per_cell: "$id.$key.vdj_per_cell.csv"
# vdj_per_cell_uncorrected: "$id.$key.vdj_per_cell_uncorrected.csv"
# vdj_dominant_contigs: "$id.$key.vdj_dominant_contigs.csv"
# vdj_unfiltered_contigs: "$id.$key.vdj_unfiltered_contigs.csv"

# ATAC-Seq outputs
# atac_metrics: "$id.$key.atac_metrics.csv"
# atac_metrics_json: "$id.$key.atac_metrics_json.json"
# atac_fragments: "$id.$key.atac_fragments.gz"
# atac_fragments_index: "$id.$key.atac_fragments_index.tbi"
# atac_transposase_sites: "$id.$key.atac_transposase_sites.gz"
# atac_transposase_sites_index: "$id.$key.atac_transposase_sites_index.tbi"
# atac_peaks: "$id.$key.atac_peaks.gz"
# atac_peaks_index: "$id.$key.atac_peaks_index.tbi"
# atac_peak_annotation: "$id.$key.atac_peak_annotation.gz"
# atac_cell_by_peak: "$id.$key.atac_cell_by_peak.zip"
# atac_cell_by_peak_unfiltered: "$id.$key.atac_cell_by_peak_unfiltered.zip"
# atac_bam: "$id.$key.atac_bam.bam"
# atac_bam_index: "$id.$key.atac_bam_index.bai"

# AbSeq Cell Calling outputs
# protein_aggregates_experimental: "$id.$key.protein_aggregates_experimental.csv"

# Putative Cell Calling Settings
# cell_calling_data: "mRNA"
# cell_calling_bioproduct_algorithm: "Basic"
# cell_calling_atac_algorithm: "Basic"
# exact_cell_count: 10000
# expected_cell_count: 20000

# Intronic Reads Settings
# exclude_intronic_reads: false

# Multiplex Settings
# sample_tags_version: "human"
# tag_names: ["4-mySample", "9-myOtherSample", "6-alsoThisSample"]

# VDJ arguments
# vdj_version: "human"

# ATAC options
# predefined_atac_peaks: "predefined_peaks.bed"

# Additional options
run_name: "sample"
generate_bam: false
# long_reads: true

# Advanced options
# custom_star_params: "--alignIntronMax 6000 --outFilterScoreMinOverLread 0.1 --limitOutSJcollapsed 2000000"
# custom_bwa_mem2_params: "-k 16 -w 200 -r"

# CWL-runner arguments
parallel: true
timestamps: false

# Undocumented arguments
# abseq_umi: 123
# target_analysis: true
# vdj_jgene_evalue: 123.0
# vdj_vgene_evalue: 123.0
# write_filtered_reads: true

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

# Arguments
nextflow run openpipelines-bio/openpipeline \
  -r 2.1.0 -latest \
  -profile docker \
  -main-script target/nextflow/mapping/bd_rhapsody/main.nf \
  -params-file params.yaml
Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
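As a starting point, the snippet below writes a minimal params.yaml for a WTA-only run, using only keys from the example above. The FASTQ and reference file names are placeholders borrowed from that example, and the PROFILE variable is a hypothetical convenience for switching between docker, podman, and singularity. It prints the nextflow command instead of running it, so the file can be reviewed first.

```shell
# Write a minimal params.yaml for a WTA-only run (placeholder file names).
cat > params.yaml <<'EOF'
reads: ["WTALibrary_S1_L001_R1_001.fastq.gz", "WTALibrary_S1_L001_R2_001.fastq.gz"]
reference_archive: "RhapRef_Human_WTA_2023-02.tar.gz"
run_name: "sample"
generate_bam: false
publish_dir: "output/"
EOF

# Container backend: docker (default), podman or singularity.
PROFILE="${PROFILE:-docker}"

# Print the launch command; remove "echo" to actually start the run.
echo nextflow run openpipelines-bio/openpipeline \
  -r 2.1.0 -latest \
  -profile "$PROFILE" \
  -main-script target/nextflow/mapping/bd_rhapsody/main.nf \
  -params-file params.yaml
```

Running it with PROFILE=singularity set in the environment prints the same command with -profile singularity.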

Argument groups

Inputs

Name Description Attributes
--reads Reads (optional). Path to your FASTQ.GZ-formatted read files from libraries that may include WTA mRNA, Targeted mRNA, AbSeq, Sample Multiplexing, and VDJ. You may specify as many R1/R2 read pairs as you want. List of file, example: "WTALibrary_S1_L001_R1_001.fastq.gz", "WTALibrary_S1_L001_R2_001.fastq.gz", multiple_sep: ";"
--reads_atac Path to your FASTQ.GZ formatted read files from ATAC-Seq libraries. You may specify as many R1/R2/I2 files as you want. List of file, example: "ATACLibrary_S2_L001_R1_001.fastq.gz", "ATACLibrary_S2_L001_R2_001.fastq.gz", "ATACLibrary_S2_L001_I2_001.fastq.gz", multiple_sep: ";"
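When passing many files on the command line instead of in params.yaml, the list has to be joined with the ";" separator shown above. The following is a minimal sketch of a hypothetical helper, join_fastqs, that collects the files in a directory matching a glob and joins them; the directory layout and glob are assumptions about how your FASTQs are stored.

```shell
# join_fastqs DIR GLOB: print all files in DIR matching GLOB, joined by ";"
# (the multiple_sep of --reads and --reads_atac). Hypothetical helper.
join_fastqs() {
  dir=$1
  pattern=$2
  out=""
  # The glob part is left unquoted so the shell expands it (sorted order).
  for f in "$dir"/$pattern; do
    [ -e "$f" ] || continue   # no match: the literal pattern is skipped
    if [ -z "$out" ]; then out=$f; else out="$out;$f"; fi
  done
  printf '%s\n' "$out"
}
```

For example, --reads "$(join_fastqs fastq 'WTALibrary_*.fastq.gz')" would pass every WTA FASTQ in a fastq/ directory. The shell sorts glob results, so R1 files are listed before their R2 partners.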

References

Assay type will be inferred from the provided reference(s). Do not provide both reference_archive and targeted_reference at the same time.

Valid reference input combinations:

  • reference_archive: WTA only
  • reference_archive & abseq_reference: WTA + AbSeq
  • reference_archive & supplemental_reference: WTA + extra transgenes
  • reference_archive & abseq_reference & supplemental_reference: WTA + AbSeq + extra transgenes
  • reference_archive: WTA + ATAC or ATAC only
  • reference_archive & supplemental_reference: WTA + ATAC + extra transgenes
  • targeted_reference: Targeted only
  • targeted_reference & abseq_reference: Targeted + AbSeq
  • abseq_reference: AbSeq only
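The list of valid combinations can be encoded as a small pre-flight check. This is a sketch with a hypothetical check_refs function that takes the four reference values you intend to pass (an empty string means "not provided") and reports the matching assay, or fails for any combination not in the list, such as reference_archive together with targeted_reference.

```shell
# check_refs ARCHIVE TARGETED ABSEQ SUPPLEMENTAL (empty string = not set).
# Hypothetical helper encoding the valid combinations listed above.
check_refs() {
  key=""
  if [ -n "$1" ]; then key="${key}A"; fi   # reference_archive
  if [ -n "$2" ]; then key="${key}T"; fi   # targeted_reference
  if [ -n "$3" ]; then key="${key}B"; fi   # abseq_reference
  if [ -n "$4" ]; then key="${key}S"; fi   # supplemental_reference
  case $key in
    A)   echo "WTA, WTA + ATAC, or ATAC only" ;;
    AB)  echo "WTA + AbSeq" ;;
    AS)  echo "WTA (+ ATAC) + extra transgenes" ;;
    ABS) echo "WTA + AbSeq + extra transgenes" ;;
    T)   echo "Targeted only" ;;
    TB)  echo "Targeted + AbSeq" ;;
    B)   echo "AbSeq only" ;;
    *)   echo "invalid reference combination" >&2; return 1 ;;
  esac
}
```

Whether a reference_archive run is WTA, WTA + ATAC, or ATAC only depends on the reads provided, not on the references, which is why those cases share one branch.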

The reference_archive can be generated with the bd_rhapsody_make_reference component. Alternatively, BD provides standard references that can be downloaded from these locations:

  • Human: https://bd-rhapsody-public.s3.amazonaws.com/Rhapsody-WTA/Pipeline-version2.x_WTA_references/RhapRef_Human_WTA_2023-02.tar.gz
  • Mouse: https://bd-rhapsody-public.s3.amazonaws.com/Rhapsody-WTA/Pipeline-version2.x_WTA_references/RhapRef_Mouse_WTA_2023-02.tar.gz
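A small sketch for fetching one of the two standard references listed above. The function names rhapsody_ref_url and fetch_rhapsody_ref are hypothetical; the URLs are the ones given in the list, and curl is assumed to be installed.

```shell
# Base location of the BD standard WTA references listed above.
BASE_URL="https://bd-rhapsody-public.s3.amazonaws.com/Rhapsody-WTA/Pipeline-version2.x_WTA_references"

# rhapsody_ref_url human|mouse: print the download URL (hypothetical helper).
rhapsody_ref_url() {
  case $1 in
    human) echo "$BASE_URL/RhapRef_Human_WTA_2023-02.tar.gz" ;;
    mouse) echo "$BASE_URL/RhapRef_Mouse_WTA_2023-02.tar.gz" ;;
    *) echo "unknown species: $1" >&2; return 1 ;;
  esac
}

# fetch_rhapsody_ref human|mouse: download the archive into the current dir.
fetch_rhapsody_ref() {
  url=$(rhapsody_ref_url "$1") || return 1
  # -f: fail on HTTP errors, -L: follow redirects, -O: keep the remote name
  curl -fLO "$url"
}
```

After fetch_rhapsody_ref human, the downloaded file can be referenced as reference_archive: "RhapRef_Human_WTA_2023-02.tar.gz".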
Name Description Attributes
--reference_archive Path to the Rhapsody WTA reference in tar.gz format. Structure of the reference archive: a top-level folder BD_Rhapsody_Reference_Files/ containing star_index/ (a sub-folder with the STAR index, i.e. the files created with STAR --runMode genomeGenerate) and a GTF file for gene-transcript annotation, e.g. “gencode.v43.primary_assembly.annotation.gtf”. file, example: "RhapRef_Human_WTA_2023-02.tar.gz"
--targeted_reference Path to the targeted reference file in FASTA format. List of file, example: "BD_Rhapsody_Immune_Response_Panel_Hs.fasta", multiple_sep: ";"
--abseq_reference Path to the AbSeq reference file in FASTA format. Only needed if BD AbSeq Ab-Oligos are used. List of file, example: "AbSeq_reference.fasta", multiple_sep: ";"
--supplemental_reference Path to the supplemental reference file in FASTA format. Only needed if there are additional transgene sequences to be aligned against in a WTA assay experiment. List of file, example: "supplemental_reference.fasta", multiple_sep: ";"
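Before a long run it can help to confirm that a custom reference archive has the structure described for --reference_archive. This is a minimal sketch with a hypothetical check_reference_archive function; it assumes the GTF file sits directly inside BD_Rhapsody_Reference_Files/ and accepts it either plain or gzipped, which is an assumption rather than a documented requirement.

```shell
# check_reference_archive ARCHIVE.tar.gz: verify the expected layout
# (hypothetical helper; GTF location and .gtf.gz support are assumptions).
check_reference_archive() {
  listing=$(tar -tzf "$1") || return 1
  # A star_index/ sub-folder inside the top-level folder (optional "./").
  if ! printf '%s\n' "$listing" | grep -Eq '^(\./)?BD_Rhapsody_Reference_Files/star_index/'; then
    echo "missing BD_Rhapsody_Reference_Files/star_index/" >&2
    return 1
  fi
  # A GTF file directly inside the top-level folder.
  if ! printf '%s\n' "$listing" | grep -Eq '^(\./)?BD_Rhapsody_Reference_Files/[^/]+\.gtf(\.gz)?$'; then
    echo "no GTF file in BD_Rhapsody_Reference_Files/" >&2
    return 1
  fi
  echo "ok"
}
```

This only inspects the file listing; it does not check that the STAR index is complete or matches the GTF.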

Outputs

Outputs for all pipeline runs

Name Description Attributes
--output_dir The unprocessed output directory containing all the outputs from the pipeline. file, required, example: "output_dir"
--output_seurat Single-cell analysis tool inputs. Seurat (.rds) input file containing RSEC molecules data table and all cell annotation metadata. file, example: "output_seurat.rds"
--output_mudata MuData (.h5mu) output file. file, example: "output_mudata.h5mu"
--metrics_summary Metrics Summary. Report containing sequencing, molecules, and cell metrics. file, example: "metrics_summary.csv"
--pipeline_report Pipeline Report. Summary report containing the results from the sequencing analysis pipeline run. file, example: "pipeline_report.html"
--rsec_mols_per_cell Molecules per bioproduct per cell based on RSEC. file, example: "RSEC_MolsPerCell_MEX.zip"
--dbec_mols_per_cell Molecules per bioproduct per cell based on DBEC. The DBEC data table is only output if the experiment includes targeted mRNA or AbSeq bioproducts. file, example: "DBEC_MolsPerCell_MEX.zip"
--rsec_mols_per_cell_unfiltered Unfiltered tables containing all cell labels with >=10 reads. file, example: "RSEC_MolsPerCell_Unfiltered_MEX.zip"
--bam Alignment file of R2 with associated R1 annotations for Bioproduct. file, example: "BioProduct.bam"
--bam_index Index file for the alignment file. file, example: "BioProduct.bam.bai"
--bioproduct_stats Bioproduct Stats. Metrics from RSEC and DBEC Unique Molecular Identifier adjustment algorithms on a per-bioproduct basis. file, example: "Bioproduct_Stats.csv"
--dimred_tsne t-SNE dimensionality reduction coordinates per cell index. file, example: "tSNE_coordinates.csv"
--dimred_umap UMAP dimensionality reduction coordinates per cell index. file, example: "UMAP_coordinates.csv"
--immune_cell_classification Immune Cell Classification. Cell type classification based on the expression of immune cell markers. file, example: "Immune_Cell_Classification.csv"

Multiplex outputs

Outputs when the multiplex option is selected

Name Description Attributes
--sample_tag_metrics Sample Tag Metrics. Metrics from the sample determination algorithm. file, example: "Sample_Tag_Metrics.csv"
--sample_tag_calls Sample Tag Calls. Assigned Sample Tag for each putative cell. file, example: "Sample_Tag_Calls.csv"
--sample_tag_counts Sample Tag Counts. Separate data tables and metric summary for cells assigned to each sample tag. Note: For putative cells that could not be assigned a specific Sample Tag, a Multiplet_and_Undetermined.zip file is also output. List of file, example: "Sample_Tag1.zip", multiple_sep: ";"
--sample_tag_counts_unassigned Sample Tag Counts Unassigned. Data table and metric summary for cells that could not be assigned a specific Sample Tag. file, example: "Multiplet_and_Undetermined.zip"

VDJ Outputs

Outputs when the VDJ option is selected

Name Description Attributes
--vdj_metrics VDJ Metrics. Overall metrics from the VDJ analysis. file, example: "VDJ_Metrics.csv"
--vdj_per_cell VDJ Per Cell. Cell specific read and molecule counts, VDJ gene segments, CDR3 sequences, paired chains, and cell type. file, example: "VDJ_perCell.csv"
--vdj_per_cell_uncorrected VDJ Per Cell Uncorrected. Cell specific read and molecule counts, VDJ gene segments, CDR3 sequences, paired chains, and cell type. file, example: "VDJ_perCell_uncorrected.csv"
--vdj_dominant_contigs VDJ Dominant Contigs. Dominant contig for each cell label chain type combination (putative cells only). file, example: "VDJ_Dominant_Contigs_AIRR.csv"
--vdj_unfiltered_contigs VDJ Unfiltered Contigs. All contigs that were assembled and annotated successfully (all cells). file, example: "VDJ_Unfiltered_Contigs_AIRR.csv"

ATAC-Seq outputs

Outputs when the ATAC-Seq option is selected

Name Description Attributes
--atac_metrics ATAC Metrics. Overall metrics from the ATAC-Seq analysis. file, example: "ATAC_Metrics.csv"
--atac_metrics_json ATAC Metrics JSON. Overall metrics from the ATAC-Seq analysis in JSON format. file, example: "ATAC_Metrics.json"
--atac_fragments ATAC Fragments. Chromosomal location, cell index, and read support for each fragment detected. file, example: "ATAC_Fragments.bed.gz"
--atac_fragments_index Index of ATAC Fragments. file, example: "ATAC_Fragments.bed.gz.tbi"
--atac_transposase_sites ATAC Transposase Sites. Chromosomal location, cell index, and read support for each transposase site detected. file, example: "ATAC_Transposase_Sites.bed.gz"
--atac_transposase_sites_index Index of ATAC Transposase Sites. file, example: "ATAC_Transposase_Sites.bed.gz.tbi"
--atac_peaks ATAC Peaks. Peak regions of transposase activity. file, example: "ATAC_Peaks.bed.gz"
--atac_peaks_index Index of ATAC Peaks. file, example: "ATAC_Peaks.bed.gz.tbi"
--atac_peak_annotation ATAC Peak Annotation. Estimated annotation of peak-to-gene connections. file, example: "peak_annotation.tsv.gz"
--atac_cell_by_peak ATAC Cell by Peak. Peak regions of transposase activity per cell. file, example: "ATAC_Cell_by_Peak_MEX.zip"
--atac_cell_by_peak_unfiltered ATAC Cell by Peak Unfiltered. Unfiltered file containing all cell labels with >=1 transposase site in peaks. file, example: "ATAC_Cell_by_Peak_Unfiltered_MEX.zip"
--atac_bam ATAC BAM. Alignment file for R1 and R2 with associated I2 annotations for ATAC-Seq. Only output if the BAM generation flag is set to true. file, example: "ATAC.bam"
--atac_bam_index Index of ATAC BAM. file, example: "ATAC.bam.bai"

AbSeq Cell Calling outputs

Outputs when AbSeq cell calling is selected

Name Description Attributes
--protein_aggregates_experimental Protein Aggregates Experimental. file, example: "Protein_Aggregates_Experimental.csv"

Putative Cell Calling Settings

Name Description Attributes
--cell_calling_data Specify the dataset to be used for putative cell calling: mRNA, AbSeq, ATAC, or mRNA_and_ATAC. For putative cell calling using an AbSeq dataset, provide an AbSeq reference FASTA file (abseq_reference). For putative cell calling using an ATAC dataset, provide a WTA+ATAC-Seq reference archive (reference_archive). By default, the dataset is determined as follows: if both mRNA reads and ATAC reads exist, mRNA_and_ATAC; if only ATAC reads exist, ATAC; otherwise, mRNA. string, example: "mRNA"
--cell_calling_bioproduct_algorithm Specify the bioproduct algorithm to be used for putative cell calling: Basic or Refined. By default, the Basic algorithm is used. string, example: "Basic"
--cell_calling_atac_algorithm Specify the ATAC-Seq algorithm to be used for putative cell calling: Basic or Refined. By default, the Basic algorithm is used. string, example: "Basic"
--exact_cell_count Set a specific number of cells as putative, based on those with the highest error-corrected read count. integer, example: 10000
--expected_cell_count Guide the Basic putative cell calling algorithm by providing an estimate of the number of cells expected. Usually this can be the number of cells loaded into the Rhapsody cartridge. If there are multiple inflection points on the second-derivative cumulative curve, this ensures that the selected one is near the expected cell count. integer, example: 20000
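The default rule for --cell_calling_data can be mirrored in a few lines, which is handy for predicting which dataset a run will use when the argument is left unset. This sketch uses a hypothetical default_cell_calling_data function and simplifies "mRNA reads" to "any --reads value is set", which is an assumption: --reads can also carry AbSeq, Sample Tag, or VDJ libraries.

```shell
# default_cell_calling_data READS READS_ATAC (empty string = not provided).
# Hypothetical helper mirroring the documented default.
default_cell_calling_data() {
  if [ -n "$1" ] && [ -n "$2" ]; then
    echo "mRNA_and_ATAC"   # both mRNA and ATAC reads
  elif [ -n "$2" ]; then
    echo "ATAC"            # only ATAC reads
  else
    echo "mRNA"            # everything else
  fi
}
```

Setting cell_calling_data explicitly in params.yaml always overrides this default.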

Intronic Reads Settings

Name Description Attributes
--exclude_intronic_reads By default, the flag is false, and reads aligned to exons and introns are considered and represented in molecule counts. When the flag is set to true, intronic reads will be excluded. The value can be true or false. boolean, example: FALSE

Multiplex Settings

Name Description Attributes
--sample_tags_version Specify the version of the Sample Tags used in the run. If Sample Tag Multiplexing was done, specify the appropriate version: human, mouse, flex, nuclei_includes_mrna, or nuclei_atac_only. For an SMK + Nuclei mRNA run or an SMK + Multiomic ATAC-Seq (WTA+ATAC-Seq) run (but not an SMK + ATAC-Seq only run), choose “nuclei_includes_mrna”. For an SMK + ATAC-Seq only run (but not SMK + Multiomic ATAC-Seq (WTA+ATAC-Seq)), choose “nuclei_atac_only”. string, example: "human"
--tag_names Specify the tag number, followed by ‘-’ and the desired sample name to appear in Sample_Tag_Metrics.csv. Do not use special characters. List of string, example: "4-mySample", "9-myOtherSample", "6-alsoThisSample", multiple_sep: ";"
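A sketch of a pre-flight check for --tag_names, using hypothetical functions check_tag_name and check_tag_names. The docs only say "no special characters"; this sketch assumes that means letters, digits, and hyphens in the sample name (the same rule as --run_name), which is an assumption. check_tag_names splits a ";"-separated value, matching the multiple_sep above.

```shell
# check_tag_name "4-mySample": <tag number>-<sample name> (hypothetical).
check_tag_name() {
  case $1 in
    [0-9]*-*) ;;
    *) echo "expected <tag number>-<sample name>: $1" >&2; return 1 ;;
  esac
  num=${1%%-*}    # text before the first "-"
  name=${1#*-}    # text after the first "-"
  case $num in
    *[!0-9]*) echo "tag number must be numeric: $1" >&2; return 1 ;;
  esac
  # Assumed allowed characters: letters, digits and hyphens.
  case $name in
    ""|*[!A-Za-z0-9-]*) echo "invalid sample name: $1" >&2; return 1 ;;
  esac
}

# check_tag_names "4-a;9-b": validate every ";"-separated entry.
# The ( ) body runs in a subshell, so IFS and set -f stay local.
check_tag_names() (
  IFS=';'
  set -f
  for t in $1; do
    check_tag_name "$t" || exit 1
  done
)
```

For example, check_tag_names "4-mySample;9-myOtherSample;6-alsoThisSample" succeeds, while "4-my_sample" is rejected because of the underscore.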

VDJ arguments

Name Description Attributes
--vdj_version If VDJ was done, specify the appropriate option: human, mouse, humanBCR, humanTCR, mouseBCR, or mouseTCR. string, example: "human"

ATAC options

Name Description Attributes
--predefined_atac_peaks An optional BED file containing pre-established chromatin accessibility peak regions for generating the ATAC cell-by-peak matrix. file, example: "predefined_peaks.bed"
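A predefined peak file that is malformed is easier to catch before the run than inside it. This is a minimal sketch of a hypothetical check_bed function that checks each interval line has at least three columns (chrom, start, end) with numeric coordinates and start < end; it skips blank, comment, track, and browser lines, and does not check chromosome names against the reference.

```shell
# check_bed FILE.bed: basic BED interval sanity check (hypothetical helper).
check_bed() {
  awk '
    NF == 0 || /^(#|track|browser)/ { next }   # blank, comment and header lines
    NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
      printf "line %d is not a valid BED interval\n", NR | "cat 1>&2"
      bad = 1
      exit
    }
    END { exit bad }
  ' "$1"
}
```

If check_bed predefined_peaks.bed succeeds, the file can be passed as predefined_atac_peaks: "predefined_peaks.bed".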

Additional options

Name Description Attributes
--run_name Specify a run name to use as the output file base name. Use only letters, numbers, or hyphens. Do not use special characters or spaces. string, default: "sample"
--generate_bam Specify whether to create the BAM file output. boolean, default: FALSE
--long_reads Use STARlong. Specify whether the STARlong aligner should be used instead of STAR. Set to true if the reads are longer than 650 bp. By default this is undefined, i.e. the aligner is autodetected from the read lengths. boolean
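Two of these options have simple rules that can be checked up front. The sketch below defines hypothetical helpers: check_run_name applies the letters/numbers/hyphens rule for --run_name, and suggest_long_reads scans the first N records of a gzipped FASTQ (10000 by default, an arbitrary choice) and prints true if any read exceeds 650 bp. Since --long_reads is autodetected when left unset, the second helper is only for inspecting your data.

```shell
# check_run_name NAME: only letters, numbers and hyphens (hypothetical).
check_run_name() {
  case $1 in
    ""|*[!A-Za-z0-9-]*) echo "invalid run name: '$1'" >&2; return 1 ;;
  esac
}

# suggest_long_reads FILE.fastq.gz [N]: print "true" if any of the first N
# reads is longer than 650 bp, else "false" (hypothetical helper).
suggest_long_reads() {
  n=${2:-10000}
  gzip -cd "$1" | head -n $((n * 4)) |
    awk 'NR % 4 == 2 && length($0) > max { max = length($0) }   # sequence lines
         END { print ((max + 0 > 650) ? "true" : "false") }'
}
```

For example, suggest_long_reads WTALibrary_S1_L001_R2_001.fastq.gz prints false for typical short-read data.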

Advanced options

NOTE: Only change these if you are sure about what you are doing.

Name Description Attributes
--custom_star_params Modify STAR alignment parameters. Set this parameter to fully override the default STAR mapping parameters used in the pipeline. For reference, these are the defaults. Short reads: --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMultimapScoreRange 0 --clip3pAdapterSeq AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA --seedSearchStartLmax 50 --outFilterMatchNmin 25 --limitOutSJcollapsed 2000000; long reads: the same as short reads, plus --seedPerReadNmax 10000. This applies to the FASTQs provided in the reads input. Do NOT set any non-mapping-related parameters such as --genomeDir, --outSAMtype, --outSAMunmapped, --readFilesIn, --runThreadN, etc. The pipeline uses STAR version 2.7.10b. string, example: "--alignIntronMax 6000 --outFilterScoreMinOverLread 0.1 --limitOutSJcollapsed 2000000"
--custom_bwa_mem2_params Modify bwa-mem2 alignment parameters. Set this parameter to fully override the bwa-mem2 mapping parameters used in the pipeline. The pipeline does not pass any custom mapping parameters to bwa-mem2, so the program defaults are used. This applies to the FASTQs provided in the reads_atac input. Do NOT set any non-mapping-related parameters such as -C, -t, etc. The pipeline uses bwa-mem2 version 2.2.1. string, example: "-k 16 -w 200 -r"
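Because these strings fully replace the pipeline's mapping settings, a quick guard against the forbidden options can save a failed run. This sketch defines hypothetical helpers that reject a custom parameter string containing any option named in the descriptions above. The deny-lists hold only those named examples (the docs say "etc."), so they are not exhaustive, and combined forms such as -t8 are not detected.

```shell
# check_custom_params "DENY LIST" "PARAMS": fail if any whitespace-separated
# word of PARAMS is in the deny-list. Runs in a subshell so set -f is local.
check_custom_params() (
  denied=$1
  set -f   # treat words literally, no globbing
  for word in $2; do
    for bad in $denied; do
      if [ "$word" = "$bad" ]; then
        echo "non-mapping option not allowed: $bad" >&2
        exit 1
      fi
    done
  done
)

check_custom_star_params() {
  check_custom_params "--genomeDir --outSAMtype --outSAMunmapped --readFilesIn --runThreadN" "$1"
}

check_custom_bwa_mem2_params() {
  check_custom_params "-C -t" "$1"
}
```

For example, check_custom_star_params "--runThreadN 8 --alignIntronMax 6000" fails, while the example value shown for custom_star_params passes.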

CWL-runner arguments

Name Description Attributes
--parallel Run jobs in parallel. boolean, default: TRUE
--timestamps Add timestamps to the errors, warnings, and notifications. boolean_true

Undocumented arguments

Name Description Attributes
--abseq_umi integer
--target_analysis boolean
--vdj_jgene_evalue E-value threshold for J gene calls by IgBlast/PyIR. Default: 0.001. double
--vdj_vgene_evalue E-value threshold for V gene calls by IgBlast/PyIR. Default: 0.001. double
--write_filtered_reads boolean

Authors

  • Robrecht Cannoodt (author, maintainer)

  • Weiwei Schultz (contributor)