BD Rhapsody

BD Rhapsody Sequence Analysis CWL pipeline v2.2.1. This pipeline performs analysis of single-cell multiomic sequence read (FASTQ) data.

Info

ID: bd_rhapsody
Namespace: mapping

Links

Source

The supported sequencing libraries are those generated by the BD Rhapsody assay kits, including: Whole Transcriptome mRNA, Targeted mRNA, AbSeq Antibody-Oligonucleotides, Single-Cell Multiplexing, TCR/BCR, and ATAC-Seq.

The CWL pipeline file is obtained by cloning ‘https://bitbucket.org/CRSwDev/cwl’ and removing all objects with class ‘DockerRequirement’ from the YAML.

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -main-script target/nextflow/mapping/bd_rhapsody/main.nf \
  --help

Run command

Example of params.yaml

# Inputs
# reads: ["WTALibrary_S1_L001_R1_001.fastq.gz", "WTALibrary_S1_L001_R2_001.fastq.gz"]
# reads_atac: ["ATACLibrary_S2_L001_R1_001.fastq.gz", "ATACLibrary_S2_L001_R2_001.fastq.gz", "ATACLibrary_S2_L001_I2_001.fastq.gz"]

# References
# reference_archive: "RhapRef_Human_WTA_2023-02.tar.gz"
# targeted_reference: ["BD_Rhapsody_Immune_Response_Panel_Hs.fasta"]
# abseq_reference: ["AbSeq_reference.fasta"]
# supplemental_reference: ["supplemental_reference.fasta"]

# Outputs
# output_dir: "$id.$key.output_dir"
# output_seurat: "$id.$key.output_seurat.rds"
# output_mudata: "$id.$key.output_mudata.h5mu"
# metrics_summary: "$id.$key.metrics_summary.csv"
# pipeline_report: "$id.$key.pipeline_report.html"
# rsec_mols_per_cell: "$id.$key.rsec_mols_per_cell.zip"
# dbec_mols_per_cell: "$id.$key.dbec_mols_per_cell.zip"
# rsec_mols_per_cell_unfiltered: "$id.$key.rsec_mols_per_cell_unfiltered.zip"
# bam: "$id.$key.bam.bam"
# bam_index: "$id.$key.bam_index.bai"
# bioproduct_stats: "$id.$key.bioproduct_stats.csv"
# dimred_tsne: "$id.$key.dimred_tsne.csv"
# dimred_umap: "$id.$key.dimred_umap.csv"
# immune_cell_classification: "$id.$key.immune_cell_classification.csv"

# Multiplex outputs
# sample_tag_metrics: "$id.$key.sample_tag_metrics.csv"
# sample_tag_calls: "$id.$key.sample_tag_calls.csv"
# sample_tag_counts: ["$id.$key.sample_tag_counts_*.zip"]
# sample_tag_counts_unassigned: "$id.$key.sample_tag_counts_unassigned.zip"

# VDJ Outputs
# vdj_metrics: "$id.$key.vdj_metrics.csv"
# vdj_per_cell: "$id.$key.vdj_per_cell.csv"
# vdj_per_cell_uncorrected: "$id.$key.vdj_per_cell_uncorrected.csv"
# vdj_dominant_contigs: "$id.$key.vdj_dominant_contigs.csv"
# vdj_unfiltered_contigs: "$id.$key.vdj_unfiltered_contigs.csv"

# ATAC-Seq outputs
# atac_metrics: "$id.$key.atac_metrics.csv"
# atac_metrics_json: "$id.$key.atac_metrics_json.json"
# atac_fragments: "$id.$key.atac_fragments.gz"
# atac_fragments_index: "$id.$key.atac_fragments_index.tbi"
# atac_transposase_sites: "$id.$key.atac_transposase_sites.gz"
# atac_transposase_sites_index: "$id.$key.atac_transposase_sites_index.tbi"
# atac_peaks: "$id.$key.atac_peaks.gz"
# atac_peaks_index: "$id.$key.atac_peaks_index.tbi"
# atac_peak_annotation: "$id.$key.atac_peak_annotation.gz"
# atac_cell_by_peak: "$id.$key.atac_cell_by_peak.zip"
# atac_cell_by_peak_unfiltered: "$id.$key.atac_cell_by_peak_unfiltered.zip"
# atac_bam: "$id.$key.atac_bam.bam"
# atac_bam_index: "$id.$key.atac_bam_index.bai"

# AbSeq Cell Calling outputs
# protein_aggregates_experimental: "$id.$key.protein_aggregates_experimental.csv"

# Putative Cell Calling Settings
# cell_calling_data: "mRNA"
# cell_calling_bioproduct_algorithm: "Basic"
# cell_calling_atac_algorithm: "Basic"
# exact_cell_count: 10000
# expected_cell_count: 20000

# Intronic Reads Settings
# exclude_intronic_reads: false

# Multiplex Settings
# sample_tags_version: "human"
# tag_names: ["4-mySample", "9-myOtherSample", "6-alsoThisSample"]

# VDJ arguments
# vdj_version: "human"

# ATAC options
# predefined_atac_peaks: "predefined_peaks.bed"

# Additional options
run_name: "sample"
generate_bam: false
# long_reads: true

# Advanced options
# custom_star_params: "--alignIntronMax 6000 --outFilterScoreMinOverLread 0.1 --limitOutSJcollapsed 2000000"
# custom_bwa_mem2_params: "-k 16 -w 200 -r"

# CWL-runner arguments
parallel: true
timestamps: false

# Undocumented arguments
# abseq_umi: 123
# target_analysis: true
# vdj_jgene_evalue: 123.0
# vdj_vgene_evalue: 123.0
# write_filtered_reads: true

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

# Arguments

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -profile docker \
  -main-script target/nextflow/mapping/bd_rhapsody/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
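The parameter file can also be produced by a script, which is convenient when the same workflow is launched for many samples. The following is a minimal sketch, not part of the pipeline: `build_params` is a hypothetical helper, the file names are the example values from this page, and it relies on the fact that Nextflow's `-params-file` option accepts JSON as well as YAML, so only the Python standard library is needed.

```python
import json
from pathlib import Path


def build_params(reads, reference_archive, publish_dir, run_name="sample"):
    """Assemble a minimal parameter set for a WTA-only run (hypothetical helper)."""
    return {
        "reads": list(reads),
        "reference_archive": reference_archive,
        "run_name": run_name,
        "generate_bam": False,
        "publish_dir": publish_dir,
    }


params = build_params(
    reads=[
        "WTALibrary_S1_L001_R1_001.fastq.gz",
        "WTALibrary_S1_L001_R2_001.fastq.gz",
    ],
    reference_archive="RhapRef_Human_WTA_2023-02.tar.gz",
    publish_dir="output/",
)

# Write the parameters as JSON; pass the file with `-params-file params.json`.
Path("params.json").write_text(json.dumps(params, indent=2))
```

The resulting `params.json` can be used in place of `params.yaml` in the run command shown earlier.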

Argument groups

Inputs

Name	Description	Attributes
`--reads`	Reads (optional). Path to your FASTQ.GZ formatted read files from libraries that may include WTA mRNA, Targeted mRNA, AbSeq, Sample Multiplexing, and VDJ. You may specify as many R1/R2 read pairs as you want.	List of `file`, example: `"WTALibrary_S1_L001_R1_001.fastq.gz", "WTALibrary_S1_L001_R2_001.fastq.gz"`, multiple_sep: `";"`
`--reads_atac`	Path to your FASTQ.GZ formatted read files from ATAC-Seq libraries. You may specify as many R1/R2/I2 files as you want.	List of `file`, example: `"ATACLibrary_S2_L001_R1_001.fastq.gz", "ATACLibrary_S2_L001_R2_001.fastq.gz", "ATACLibrary_S2_L001_I2_001.fastq.gz"`, multiple_sep: `";"`
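List-valued arguments such as `--reads` declare `multiple_sep: ";"`, meaning that several values given on the command line are joined with a semicolon. As a rough sketch of how a parameter dictionary might be turned into command-line arguments under that convention (the helper `to_cli_args` is hypothetical and not part of the pipeline):

```python
import shlex


def to_cli_args(params):
    """Convert a parameter dict into a flat list of `--name value` arguments."""
    args = []
    for name, value in params.items():
        if isinstance(value, bool):
            # Booleans are written as lowercase true/false.
            value = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            # Lists are joined with the documented multiple_sep, ";".
            value = ";".join(str(item) for item in value)
        args += [f"--{name}", str(value)]
    return args


command = [
    "nextflow", "run", "openpipelines-bio/openpipeline",
    "-r", "2.1.1", "-latest",
    "-profile", "docker",
    "-main-script", "target/nextflow/mapping/bd_rhapsody/main.nf",
    *to_cli_args({
        "reads": [
            "WTALibrary_S1_L001_R1_001.fastq.gz",
            "WTALibrary_S1_L001_R2_001.fastq.gz",
        ],
        "reference_archive": "RhapRef_Human_WTA_2023-02.tar.gz",
        "publish_dir": "output/",
    }),
]

# shlex.join quotes the semicolon-separated value so the shell keeps it intact.
print(shlex.join(command))
```

Quoting matters here: an unquoted `;` would be interpreted by the shell as a command separator.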

References

Assay type will be inferred from the provided reference(s). Do not provide both reference_archive and targeted_reference at the same time.

Valid reference input combinations:

- reference_archive: WTA only
- reference_archive & abseq_reference: WTA + AbSeq
- reference_archive & supplemental_reference: WTA + extra transgenes
- reference_archive & abseq_reference & supplemental_reference: WTA + AbSeq + extra transgenes
- reference_archive: WTA + ATAC or ATAC only
- reference_archive & supplemental_reference: WTA + ATAC + extra transgenes
- targeted_reference: Targeted only
- targeted_reference & abseq_reference: Targeted + AbSeq
- abseq_reference: AbSeq only
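The combinations listed above can be checked before launching a run. The sketch below is an illustration only: `check_references` is a hypothetical helper, and it accepts exactly the combinations listed on this page, which may be stricter or looser than what the pipeline itself enforces.

```python
# Combinations listed as valid on this page, as sets of provided arguments.
VALID_COMBINATIONS = {
    frozenset({"reference_archive"}),
    frozenset({"reference_archive", "abseq_reference"}),
    frozenset({"reference_archive", "supplemental_reference"}),
    frozenset({"reference_archive", "abseq_reference", "supplemental_reference"}),
    frozenset({"targeted_reference"}),
    frozenset({"targeted_reference", "abseq_reference"}),
    frozenset({"abseq_reference"}),
}

REFERENCE_KEYS = (
    "reference_archive",
    "targeted_reference",
    "abseq_reference",
    "supplemental_reference",
)


def check_references(params):
    """Return True if the provided reference arguments form a listed combination."""
    provided = frozenset(key for key in REFERENCE_KEYS if params.get(key))
    return provided in VALID_COMBINATIONS


print(check_references({"reference_archive": "RhapRef_Human_WTA_2023-02.tar.gz"}))
print(check_references({
    "reference_archive": "RhapRef_Human_WTA_2023-02.tar.gz",
    "targeted_reference": ["BD_Rhapsody_Immune_Response_Panel_Hs.fasta"],
}))
```

The first call reports a valid combination; the second reports an invalid one because reference_archive and targeted_reference are given together.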

The reference_archive can be generated with the bd_rhapsody_make_reference component. Alternatively, BD also provides standard references which can be downloaded from these locations:

Human: https://bd-rhapsody-public.s3.amazonaws.com/Rhapsody-WTA/Pipeline-version2.x_WTA_references/RhapRef_Human_WTA_2023-02.tar.gz
Mouse: https://bd-rhapsody-public.s3.amazonaws.com/Rhapsody-WTA/Pipeline-version2.x_WTA_references/RhapRef_Mouse_WTA_2023-02.tar.gz

Name	Description	Attributes
`--reference_archive`	Path to Rhapsody WTA Reference in the tar.gz format. Structure of the reference archive: `BD_Rhapsody_Reference_Files/` is the top-level folder. It contains `star_index/`, a sub-folder holding the STAR index (the files created with `STAR --runMode genomeGenerate`), and a GTF file for gene-transcript annotation, e.g. “gencode.v43.primary_assembly.annotation.gtf”.	`file`, example: `"RhapRef_Human_WTA_2023-02.tar.gz"`
`--targeted_reference`	Path to the targeted reference file in FASTA format.	List of `file`, example: `"BD_Rhapsody_Immune_Response_Panel_Hs.fasta"`, multiple_sep: `";"`
`--abseq_reference`	Path to the AbSeq reference file in FASTA format. Only needed if BD AbSeq Ab-Oligos are used.	List of `file`, example: `"AbSeq_reference.fasta"`, multiple_sep: `";"`
`--supplemental_reference`	Path to the supplemental reference file in FASTA format. Only needed if there are additional transgene sequences to be aligned against in a WTA assay experiment.	List of `file`, example: `"supplemental_reference.fasta"`, multiple_sep: `";"`

Outputs

Outputs for all pipeline runs

Name	Description	Attributes
`--output_dir`	The unprocessed output directory containing all the outputs from the pipeline.	`file`, required, example: `"output_dir"`
`--output_seurat`	Single-cell analysis tool inputs. Seurat (.rds) input file containing RSEC molecules data table and all cell annotation metadata.	`file`, example: `"output_seurat.rds"`
`--output_mudata`		`file`, example: `"output_mudata.h5mu"`
`--metrics_summary`	Metrics Summary. Report containing sequencing, molecules, and cell metrics.	`file`, example: `"metrics_summary.csv"`
`--pipeline_report`	Pipeline Report. Summary report containing the results from the sequencing analysis pipeline run.	`file`, example: `"pipeline_report.html"`
`--rsec_mols_per_cell`	Molecules per bioproduct per cell based on RSEC.	`file`, example: `"RSEC_MolsPerCell_MEX.zip"`
`--dbec_mols_per_cell`	Molecules per bioproduct per cell based on DBEC. DBEC data table is only output if the experiment includes targeted mRNA or AbSeq bioproducts.	`file`, example: `"DBEC_MolsPerCell_MEX.zip"`
`--rsec_mols_per_cell_unfiltered`	Unfiltered tables containing all cell labels with >=10 reads.	`file`, example: `"RSEC_MolsPerCell_Unfiltered_MEX.zip"`
`--bam`	Alignment file of R2 with associated R1 annotations for Bioproduct.	`file`, example: `"BioProduct.bam"`
`--bam_index`	Index file for the alignment file.	`file`, example: `"BioProduct.bam.bai"`
`--bioproduct_stats`	Bioproduct Stats. Metrics from RSEC and DBEC Unique Molecular Identifier adjustment algorithms on a per-bioproduct basis.	`file`, example: `"Bioproduct_Stats.csv"`
`--dimred_tsne`	t-SNE dimensionality reduction coordinates per cell index	`file`, example: `"tSNE_coordinates.csv"`
`--dimred_umap`	UMAP dimensionality reduction coordinates per cell index	`file`, example: `"UMAP_coordinates.csv"`
`--immune_cell_classification`	Immune Cell Classification. Cell type classification based on the expression of immune cell markers.	`file`, example: `"Immune_Cell_Classification.csv"`

Multiplex outputs

Outputs when the multiplex option is selected

Name	Description	Attributes
`--sample_tag_metrics`	Sample Tag Metrics. Metrics from the sample determination algorithm.	`file`, example: `"Sample_Tag_Metrics.csv"`
`--sample_tag_calls`	Sample Tag Calls. Assigned Sample Tag for each putative cell	`file`, example: `"Sample_Tag_Calls.csv"`
`--sample_tag_counts`	Sample Tag Counts. Separate data tables and metric summary for cells assigned to each sample tag. Note: For putative cells that could not be assigned a specific Sample Tag, a Multiplet_and_Undetermined.zip file is also output.	List of `file`, example: `"Sample_Tag1.zip"`, multiple_sep: `";"`
`--sample_tag_counts_unassigned`	Sample Tag Counts Unassigned. Data table and metric summary for cells that could not be assigned a specific Sample Tag.	`file`, example: `"Multiplet_and_Undetermined.zip"`

VDJ Outputs

Outputs when the VDJ option is selected

Name	Description	Attributes
`--vdj_metrics`	VDJ Metrics. Overall metrics from the VDJ analysis.	`file`, example: `"VDJ_Metrics.csv"`
`--vdj_per_cell`	VDJ Per Cell. Cell specific read and molecule counts, VDJ gene segments, CDR3 sequences, paired chains, and cell type.	`file`, example: `"VDJ_perCell.csv"`
`--vdj_per_cell_uncorrected`	VDJ Per Cell Uncorrected. Cell specific read and molecule counts, VDJ gene segments, CDR3 sequences, paired chains, and cell type.	`file`, example: `"VDJ_perCell_uncorrected.csv"`
`--vdj_dominant_contigs`	VDJ Dominant Contigs. Dominant contig for each cell label chain type combination (putative cells only).	`file`, example: `"VDJ_Dominant_Contigs_AIRR.csv"`
`--vdj_unfiltered_contigs`	VDJ Unfiltered Contigs. All contigs that were assembled and annotated successfully (all cells).	`file`, example: `"VDJ_Unfiltered_Contigs_AIRR.csv"`

ATAC-Seq outputs

Outputs when the ATAC-Seq option is selected

Name	Description	Attributes
`--atac_metrics`	ATAC Metrics. Overall metrics from the ATAC-Seq analysis.	`file`, example: `"ATAC_Metrics.csv"`
`--atac_metrics_json`	ATAC Metrics JSON. Overall metrics from the ATAC-Seq analysis in JSON format.	`file`, example: `"ATAC_Metrics.json"`
`--atac_fragments`	ATAC Fragments. Chromosomal location, cell index, and read support for each fragment detected	`file`, example: `"ATAC_Fragments.bed.gz"`
`--atac_fragments_index`	Index of ATAC Fragments.	`file`, example: `"ATAC_Fragments.bed.gz.tbi"`
`--atac_transposase_sites`	ATAC Transposase Sites. Chromosomal location, cell index, and read support for each transposase site detected	`file`, example: `"ATAC_Transposase_Sites.bed.gz"`
`--atac_transposase_sites_index`	Index of ATAC Transposase Sites.	`file`, example: `"ATAC_Transposase_Sites.bed.gz.tbi"`
`--atac_peaks`	ATAC Peaks. Peak regions of transposase activity	`file`, example: `"ATAC_Peaks.bed.gz"`
`--atac_peaks_index`	Index of ATAC Peaks.	`file`, example: `"ATAC_Peaks.bed.gz.tbi"`
`--atac_peak_annotation`	ATAC Peak Annotation. Estimated annotation of peak-to-gene connections	`file`, example: `"peak_annotation.tsv.gz"`
`--atac_cell_by_peak`	ATAC Cell by Peak. Peak regions of transposase activity per cell	`file`, example: `"ATAC_Cell_by_Peak_MEX.zip"`
`--atac_cell_by_peak_unfiltered`	ATAC Cell by Peak Unfiltered. Unfiltered file containing all cell labels with >=1 transposase sites in peaks.	`file`, example: `"ATAC_Cell_by_Peak_Unfiltered_MEX.zip"`
`--atac_bam`	ATAC BAM. Alignment file for R1 and R2 with associated I2 annotations for ATAC-Seq. Only output if the BAM generation flag is set to true.	`file`, example: `"ATAC.bam"`
`--atac_bam_index`	Index of ATAC BAM.	`file`, example: `"ATAC.bam.bai"`

AbSeq Cell Calling outputs

Outputs when AbSeq cell calling is selected

Name	Description	Attributes
`--protein_aggregates_experimental`	Protein Aggregates Experimental	`file`, example: `"Protein_Aggregates_Experimental.csv"`

Putative Cell Calling Settings

Name	Description	Attributes
`--cell_calling_data`	Specify the dataset to be used for putative cell calling: mRNA, AbSeq, ATAC, or mRNA_and_ATAC. For putative cell calling using an AbSeq dataset, please provide an AbSeq_Reference fasta file above. For putative cell calling using an ATAC dataset, please provide a WTA+ATAC-Seq Reference_Archive file above. The default dataset for putative cell calling is determined as follows: if mRNA Reads and ATAC Reads exist, mRNA_and_ATAC; if only ATAC Reads exist, ATAC; otherwise, mRNA.	`string`, example: `"mRNA"`
`--cell_calling_bioproduct_algorithm`	Specify the bioproduct algorithm to be used for putative cell calling: Basic or Refined. By default, the Basic algorithm will be used for putative cell calling.	`string`, example: `"Basic"`
`--cell_calling_atac_algorithm`	Specify the ATAC-seq algorithm to be used for putative cell calling: Basic or Refined. By default, the Basic algorithm will be used for putative cell calling.	`string`, example: `"Basic"`
`--exact_cell_count`	Set a specific number of cells as putative, based on those with the highest error-corrected read count	`integer`, example: `10000`
`--expected_cell_count`	Guide the basic putative cell calling algorithm by providing an estimate of the number of cells expected. Usually this can be the number of cells loaded into the Rhapsody cartridge. If there are multiple inflection points on the second derivative cumulative curve, this will ensure the one selected is near the expected.	`integer`, example: `20000`
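The default rule described for `--cell_calling_data` can be written out as a small function. This is only a sketch of the documented rule, not the pipeline's own code: `default_cell_calling_data` is a hypothetical name, and treating any entry in `reads` as mRNA reads is a simplifying assumption, since `reads` may also hold AbSeq, Sample Multiplexing, or VDJ libraries.

```python
def default_cell_calling_data(reads, reads_atac):
    """Mirror the documented default for cell_calling_data (hypothetical helper)."""
    has_mrna = bool(reads)       # assumption: any entry in `reads` counts as mRNA reads
    has_atac = bool(reads_atac)
    if has_mrna and has_atac:
        return "mRNA_and_ATAC"
    if has_atac:
        return "ATAC"
    return "mRNA"


print(default_cell_calling_data(
    reads=["WTALibrary_S1_L001_R1_001.fastq.gz", "WTALibrary_S1_L001_R2_001.fastq.gz"],
    reads_atac=[],
))
```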

Intronic Reads Settings

Name	Description	Attributes
`--exclude_intronic_reads`	By default, the flag is false, and reads aligned to exons and introns are considered and represented in molecule counts. When the flag is set to true, intronic reads will be excluded. The value can be true or false.	`boolean`, example: `FALSE`

Multiplex Settings

Name	Description	Attributes
`--sample_tags_version`	Specify the version of the Sample Tags used in the run. If Sample Tag Multiplexing was done, specify the appropriate version: human, mouse, flex, nuclei_includes_mrna, or nuclei_atac_only. If this is an SMK + Nuclei mRNA run or an SMK + Multiomic ATAC-Seq (WTA+ATAC-Seq) run (and not an SMK + ATAC-Seq only run), choose the “nuclei_includes_mrna” option. If this is an SMK + ATAC-Seq only run (and not SMK + Multiomic ATAC-Seq (WTA+ATAC-Seq)), choose the “nuclei_atac_only” option.	`string`, example: `"human"`
`--tag_names`	Specify the tag number followed by ‘-’ and the desired sample name to appear in Sample_Tag_Metrics.csv. Do not use special characters.	List of `string`, example: `"4-mySample", "9-myOtherSample", "6-alsoThisSample"`, multiple_sep: `";"`
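Entries for `--tag_names` follow the pattern of a tag number, a hyphen, and a sample name. A pre-flight check along these lines might catch malformed entries early. It is a sketch under assumptions: `parse_tag_names` is a hypothetical helper, and since this page does not define which characters count as special, the pattern conservatively allows only letters, digits, and underscores in the sample name.

```python
import re

# Assumption: sample names are limited to letters, digits and underscores;
# the page does not spell out which characters are considered "special".
TAG_PATTERN = re.compile(r"(\d+)-([A-Za-z0-9_]+)")


def parse_tag_names(tag_names):
    """Map each tag number to its sample name, rejecting malformed entries."""
    mapping = {}
    for entry in tag_names:
        match = TAG_PATTERN.fullmatch(entry)
        if match is None:
            raise ValueError(f"Expected '<tag number>-<sample name>', got {entry!r}")
        number = int(match.group(1))
        if number in mapping:
            raise ValueError(f"Sample tag {number} is listed more than once")
        mapping[number] = match.group(2)
    return mapping


tags = parse_tag_names(["4-mySample", "9-myOtherSample", "6-alsoThisSample"])
print(tags)
```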

VDJ arguments

Name	Description	Attributes
`--vdj_version`	If VDJ was done, specify the appropriate option: human, mouse, humanBCR, humanTCR, mouseBCR, mouseTCR	`string`, example: `"human"`

ATAC options

Name	Description	Attributes
`--predefined_atac_peaks`	An optional BED file containing pre-established chromatin accessibility peak regions for generating the ATAC cell-by-peak matrix.	`file`, example: `"predefined_peaks.bed"`

Additional options

Name	Description	Attributes
`--run_name`	Specify a run name to use as the output file base name. Use only letters, numbers, or hyphens. Do not use special characters or spaces.	`string`, default: `"sample"`
`--generate_bam`	Specify whether to create the BAM file output	`boolean`, default: `FALSE`
`--long_reads`	Use STARlong. Specify whether the STARlong aligner should be used instead of STAR. Set to true if the reads are longer than 650bp. When left undefined (the default), the aligner is autodetected based on read lengths.	`boolean`
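To decide whether `--long_reads` should be set explicitly, the read lengths in a FASTQ file can be inspected beforehand. The following is a rough sketch rather than the pipeline's own autodetection: the helper names are hypothetical, only the first records of the file are sampled, and the file is assumed to use the common four-line FASTQ layout without wrapped sequences.

```python
import gzip
from itertools import islice


def max_read_length(fastq_gz, n_reads=10000):
    """Return the longest sequence among the first n_reads records."""
    longest = 0
    with gzip.open(fastq_gz, "rt") as handle:
        # The sequence is the second line of every four-line FASTQ record.
        for sequence in islice(handle, 1, 4 * n_reads, 4):
            longest = max(longest, len(sequence.strip()))
    return longest


def suggest_long_reads(fastq_gz, threshold=650):
    """Suggest long_reads=true when sampled reads exceed the 650bp guideline."""
    return max_read_length(fastq_gz) > threshold
```

For example, `suggest_long_reads("WTALibrary_S1_L001_R2_001.fastq.gz")` returns a boolean that could be copied into the params file as `long_reads`.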

Advanced options

NOTE: Only change these if you are really sure about what you are doing.

Name	Description	Attributes
`--custom_star_params`	Modify STAR alignment parameters - Set this parameter to fully override default STAR mapping parameters used in the pipeline. For reference this is the default that is used: Short Reads: `--outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMultimapScoreRange 0 --clip3pAdapterSeq AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA --seedSearchStartLmax 50 --outFilterMatchNmin 25 --limitOutSJcollapsed 2000000` Long Reads: Same as Short Reads + `--seedPerReadNmax 10000` This applies to fastqs provided in the Reads user input Do NOT set any non-mapping related params like `--genomeDir`, `--outSAMtype`, `--outSAMunmapped`, `--readFilesIn`, `--runThreadN`, etc. We use STAR version 2.7.10b	`string`, example: `"--alignIntronMax 6000 --outFilterScoreMinOverLread 0.1 --limitOutSJcollapsed 2000000"`
`--custom_bwa_mem2_params`	Modify bwa-mem2 alignment parameters. Set this parameter to fully override the bwa-mem2 mapping parameters used in the pipeline. The pipeline does not pass any custom mapping parameters to bwa-mem2, so program default values are used. This applies to FASTQs provided in the Reads_ATAC user input. Do NOT set any non-mapping related params like `-C`, `-t`, etc. The pipeline uses bwa-mem2 version 2.2.1.	`string`, example: `"-k 16 -w 200 -r"`

CWL-runner arguments

Name	Description	Attributes
`--parallel`	Run jobs in parallel.	`boolean`, default: `TRUE`
`--timestamps`	Add timestamps to the errors, warnings, and notifications.	`boolean_true`

Undocumented arguments

Name	Description	Attributes
`--abseq_umi`		`integer`
`--target_analysis`		`boolean`
`--vdj_jgene_evalue`	e-value threshold for J gene. The e-value threshold for J gene call by IgBlast/PyIR, default is set as 0.001	`double`
`--vdj_vgene_evalue`	e-value threshold for V gene. The e-value threshold for V gene call by IgBlast/PyIR, default is set as 0.001	`double`
`--write_filtered_reads`		`boolean`

Authors

Robrecht Cannoodt (author, maintainer)
Weiwei Schultz (contributor)