BD Rhapsody

BD Rhapsody Sequence Analysis CWL pipeline v2.2.1

Info

ID: bd_rhapsody
Namespace: workflows/ingestion

This pipeline performs analysis of single-cell multiomic sequence read (FASTQ) data. The supported sequencing libraries are those generated by the BD Rhapsody assay kits, including Whole Transcriptome mRNA, Targeted mRNA, AbSeq Antibody-Oligonucleotides, Single-Cell Multiplexing, TCR/BCR, and ATAC-Seq.

The CWL pipeline file is obtained by cloning ‘https://bitbucket.org/CRSwDev/cwl’ and removing all objects with class ‘DockerRequirement’ from the YAML.
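The Docker-stripping step can be sketched in Python. This is a minimal illustration under assumptions, not the script used to build this component: the CWL YAML is assumed to be already parsed into plain dicts and lists (for example with PyYAML's `yaml.safe_load`) and written back out afterwards, and `strip_docker_requirements` is a hypothetical helper name.

```python
from typing import Any


def _is_docker_requirement(node: Any) -> bool:
    # List form: {"class": "DockerRequirement", "dockerPull": ...}
    return isinstance(node, dict) and node.get("class") == "DockerRequirement"


def strip_docker_requirements(node: Any) -> Any:
    """Return a copy of a parsed CWL document with all DockerRequirement objects removed."""
    if isinstance(node, dict):
        return {
            key: strip_docker_requirements(value)
            for key, value in node.items()
            # Map form: hints: {DockerRequirement: {dockerPull: ...}}, where the key names the class.
            if key != "DockerRequirement" and not _is_docker_requirement(value)
        }
    if isinstance(node, list):
        return [strip_docker_requirements(item) for item in node if not _is_docker_requirement(item)]
    return node  # scalars are returned unchanged


if __name__ == "__main__":
    # Hypothetical tool description; the image name is a placeholder.
    tool = {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        "hints": [
            {"class": "DockerRequirement", "dockerPull": "example/image:tag"},
            {"class": "ResourceRequirement", "coresMin": 4},
        ],
    }
    print(strip_docker_requirements(tool)["hints"])
    # prints [{'class': 'ResourceRequirement', 'coresMin': 4}]
```

The function builds new containers instead of mutating the input, so the parsed original stays available for comparison.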

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -main-script target/nextflow/workflows/ingestion/bd_rhapsody/main.nf \
  --help

Run command

Example of params.yaml

# Inputs
# reads: ["WTALibrary_S1_L001_R1_001.fastq.gz", "WTALibrary_S1_L001_R2_001.fastq.gz"]
# reads_atac: ["ATACLibrary_S2_L001_R1_001.fastq.gz", "ATACLibrary_S2_L001_R2_001.fastq.gz", "ATACLibrary_S2_L001_I2_001.fastq.gz"]

# References
# reference_archive: "RhapRef_Human_WTA_2023-02.tar.gz"
# targeted_reference: ["BD_Rhapsody_Immune_Response_Panel_Hs.fasta"]
# abseq_reference: ["AbSeq_reference.fasta"]
# supplemental_reference: ["supplemental_reference.fasta"]

# Outputs
# output: "$id.$key.output.h5mu"
# output_raw: "$id.$key.output_raw"

# Putative Cell Calling Settings
# cell_calling_data: "mRNA"
# cell_calling_bioproduct_algorithm: "Basic"
# cell_calling_atac_algorithm: "Basic"
# exact_cell_count: 10000
# expected_cell_count: 20000

# Intronic Reads Settings
# exclude_intronic_reads: false

# Multiplex Settings
# sample_tags_version: "human"
# tag_names: ["4-mySample", "9-myOtherSample", "6-alsoThisSample"]

# VDJ arguments
# vdj_version: "human"

# ATAC options
# predefined_atac_peaks: "predefined_peaks.bed"

# Additional options
run_name: "sample"
generate_bam: false
# long_reads: true

# Advanced options
# custom_star_params: "--alignIntronMax 6000 --outFilterScoreMinOverLread 0.1 --limitOutSJcollapsed 2000000"
# custom_bwa_mem2_params: "-k 16 -w 200 -r"

# CWL-runner arguments
parallel: true
timestamps: false

# Undocumented arguments
# abseq_umi: 123
# target_analysis: true
# vdj_jgene_evalue: 123.0
# vdj_vgene_evalue: 123.0
# write_filtered_reads: true

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

Then launch the pipeline with the parameter file:

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -profile docker \
  -main-script target/nextflow/workflows/ingestion/bd_rhapsody/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Inputs

Name	Description	Attributes
`--reads`	Optional. Path to your FASTQ.GZ formatted read files from libraries that may include WTA mRNA, Targeted mRNA, AbSeq, Sample Multiplexing, and VDJ. You may specify as many R1/R2 read pairs as you want.	List of `file`, example: `"WTALibrary_S1_L001_R1_001.fastq.gz", "WTALibrary_S1_L001_R2_001.fastq.gz"`, multiple_sep: `";"`
`--reads_atac`	Path to your FASTQ.GZ formatted read files from ATAC-Seq libraries. You may specify as many R1/R2/I2 files as you want.	List of `file`, example: `"ATACLibrary_S2_L001_R1_001.fastq.gz", "ATACLibrary_S2_L001_R2_001.fastq.gz", "ATACLibrary_S2_L001_I2_001.fastq.gz"`, multiple_sep: `";"`
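Because both inputs accept multiple files, it can help to check that the read files are complete before passing them on the command line. A minimal Python sketch, assuming Illumina-style file names such as `WTALibrary_S1_L001_R1_001.fastq.gz`; this naming pattern and the helper `join_reads` are assumptions for illustration, not requirements documented for the pipeline.

```python
import re
from collections import defaultdict

# Assumption: Illumina-style names such as "WTALibrary_S1_L001_R1_001.fastq.gz".
_FASTQ_NAME = re.compile(r"^(?P<prefix>.+)_(?P<read>R1|R2|I2)_(?P<chunk>\d{3})\.fastq\.gz$")


def join_reads(files: list[str], required: tuple[str, ...] = ("R1", "R2")) -> str:
    """Check that each library/lane has all required read files, then join the
    paths with ';' (the multiple_sep of --reads and --reads_atac)."""
    found: dict[tuple[str, str], set[str]] = defaultdict(set)
    for path in files:
        match = _FASTQ_NAME.match(path.rsplit("/", 1)[-1])
        if match is None:
            raise ValueError(f"unrecognised FASTQ file name: {path}")
        found[(match["prefix"], match["chunk"])].add(match["read"])
    for (prefix, chunk), reads in sorted(found.items()):
        missing = sorted(set(required) - reads)
        if missing:
            raise ValueError(f"{prefix}_*_{chunk}: missing {', '.join(missing)}")
    return ";".join(files)


wta = join_reads(["WTALibrary_S1_L001_R1_001.fastq.gz", "WTALibrary_S1_L001_R2_001.fastq.gz"])
atac = join_reads(
    ["ATACLibrary_S2_L001_R1_001.fastq.gz", "ATACLibrary_S2_L001_R2_001.fastq.gz",
     "ATACLibrary_S2_L001_I2_001.fastq.gz"],
    required=("R1", "R2", "I2"),  # ATAC-Seq libraries also need the I2 index read
)
print(wta)
```

The joined string is the form used when passing several files to `--reads` or `--reads_atac` on the command line; in `params.yaml`, a YAML list works as well.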

References

Assay type will be inferred from the provided reference(s). Do not provide both reference_archive and targeted_reference at the same time.

Valid reference input combinations:
- reference_archive: WTA only
- reference_archive & abseq_reference: WTA + AbSeq
- reference_archive & supplemental_reference: WTA + extra transgenes
- reference_archive & abseq_reference & supplemental_reference: WTA + AbSeq + extra transgenes
- reference_archive: WTA + ATAC or ATAC only
- reference_archive & supplemental_reference: WTA + ATAC + extra transgenes
- targeted_reference: Targeted only
- targeted_reference & abseq_reference: Targeted + AbSeq
- abseq_reference: AbSeq only
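The combination rules can be expressed as a lookup table. The sketch below is a hypothetical pre-flight check (`infer_assay` and `VALID_COMBINATIONS` are illustrative names, not part of the pipeline) that treats a reference parameter as provided when it is set to a non-empty value; the pipeline performs its own inference and may handle edge cases differently.

```python
REFERENCE_KEYS = ("reference_archive", "targeted_reference", "abseq_reference", "supplemental_reference")

# One entry per valid combination; a reference_archive (with or without a
# supplemental_reference) may also cover ATAC, depending on the archive provided.
VALID_COMBINATIONS = {
    frozenset({"reference_archive"}): "WTA only, WTA + ATAC, or ATAC only",
    frozenset({"reference_archive", "abseq_reference"}): "WTA + AbSeq",
    frozenset({"reference_archive", "supplemental_reference"}): "WTA (+ ATAC) + extra transgenes",
    frozenset({"reference_archive", "abseq_reference", "supplemental_reference"}): "WTA + AbSeq + extra transgenes",
    frozenset({"targeted_reference"}): "Targeted only",
    frozenset({"targeted_reference", "abseq_reference"}): "Targeted + AbSeq",
    frozenset({"abseq_reference"}): "AbSeq only",
}


def infer_assay(params: dict) -> str:
    """Map the reference parameters that are set to the assay type they imply."""
    provided = frozenset(key for key in REFERENCE_KEYS if params.get(key))
    if {"reference_archive", "targeted_reference"} <= provided:
        raise ValueError("provide reference_archive or targeted_reference, not both")
    if provided not in VALID_COMBINATIONS:
        raise ValueError(f"unsupported reference combination: {sorted(provided)}")
    return VALID_COMBINATIONS[provided]


print(infer_assay({"reference_archive": "RhapRef_Human_WTA_2023-02.tar.gz",
                   "abseq_reference": ["AbSeq_reference.fasta"]}))
# prints WTA + AbSeq
```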

The reference_archive can be generated with the reference/build_bdrhap_reference component. Alternatively, BD provides standard references, which can be downloaded from these locations:

Human: https://bd-rhapsody-public.s3.amazonaws.com/Rhapsody-WTA/Pipeline-version2.x_WTA_references/RhapRef_Human_WTA_2023-02.tar.gz
Mouse: https://bd-rhapsody-public.s3.amazonaws.com/Rhapsody-WTA/Pipeline-version2.x_WTA_references/RhapRef_Mouse_WTA_2023-02.tar.gz

Name	Description	Attributes
`--reference_archive`	Path to the Rhapsody WTA reference in tar.gz format. Structure of the reference archive: `BD_Rhapsody_Reference_Files/` (top-level folder), containing `star_index/` (sub-folder with the STAR index, i.e. the files created with `STAR --runMode genomeGenerate`) and a GTF file for gene-transcript annotation, e.g. “gencode.v43.primary_assembly.annotation.gtf”.	`file`, example: `"RhapRef_Human_WTA_2023-02.tar.gz"`
`--targeted_reference`	Path to the targeted reference file in FASTA format.	List of `file`, example: `"BD_Rhapsody_Immune_Response_Panel_Hs.fasta"`, multiple_sep: `";"`
`--abseq_reference`	Path to the AbSeq reference file in FASTA format. Only needed if BD AbSeq Ab-Oligos are used.	List of `file`, example: `"AbSeq_reference.fasta"`, multiple_sep: `";"`
`--supplemental_reference`	Path to the supplemental reference file in FASTA format. Only needed if there are additional transgene sequences to be aligned against in a WTA assay experiment.	List of `file`, example: `"supplemental_reference.fasta"`, multiple_sep: `";"`
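Before launching a long run, the archive layout described for `--reference_archive` can be checked locally. A minimal sketch using Python's standard `tarfile` module; `check_reference_archive` is a hypothetical helper, and it assumes that both `star_index/` and the GTF file sit inside `BD_Rhapsody_Reference_Files/`, accepting either `.gtf` or `.gtf.gz` for the annotation.

```python
import sys
import tarfile

TOP_LEVEL = "BD_Rhapsody_Reference_Files"


def check_reference_archive(path: str) -> list[str]:
    """Return a list of layout problems; an empty list means the expected layout was found."""
    with tarfile.open(path, "r:gz") as archive:
        # Normalise member names such as "./BD_Rhapsody_Reference_Files/..." to a relative form.
        names = [name[2:] if name.startswith("./") else name for name in archive.getnames()]
    problems = []
    if not any(name == TOP_LEVEL or name.startswith(TOP_LEVEL + "/") for name in names):
        problems.append(f"missing top-level folder {TOP_LEVEL}/")
    if not any(name.startswith(f"{TOP_LEVEL}/star_index/") for name in names):
        problems.append(f"no STAR index files in {TOP_LEVEL}/star_index/")
    if not any(name.startswith(TOP_LEVEL + "/") and name.endswith((".gtf", ".gtf.gz")) for name in names):
        problems.append(f"no GTF annotation file in {TOP_LEVEL}/")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    for problem in check_reference_archive(sys.argv[1]) or ["layout looks fine"]:
        print(problem)
```

Saved under any file name, it can be run with the archive path as its only argument. Listing the members of a gzip-compressed tar file requires decompressing the whole stream, so expect this to take a while for a genome-sized reference.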

Outputs

Name	Description	Attributes
`--output`	The processed output file in h5mu format.	`file`, required, example: `"output.h5mu"`
`--output_raw`	The unprocessed output directory containing all the outputs from the pipeline.	`file`, required, example: `"output_dir"`

Putative Cell Calling Settings

Name	Description	Attributes
`--cell_calling_data`	Dataset to use for putative cell calling: mRNA, AbSeq, ATAC, or mRNA_and_ATAC. For putative cell calling using an AbSeq dataset, provide an AbSeq reference FASTA file (`--abseq_reference`). For putative cell calling using an ATAC dataset, provide a WTA+ATAC-Seq reference archive (`--reference_archive`). By default, the dataset is determined as follows: if both mRNA and ATAC reads exist, mRNA_and_ATAC; if only ATAC reads exist, ATAC; otherwise, mRNA.	`string`, example: `"mRNA"`
`--cell_calling_bioproduct_algorithm`	Bioproduct algorithm to use for putative cell calling: Basic or Refined. By default, the Basic algorithm is used.	`string`, example: `"Basic"`
`--cell_calling_atac_algorithm`	ATAC-Seq algorithm to use for putative cell calling: Basic or Refined. By default, the Basic algorithm is used.	`string`, example: `"Basic"`
`--exact_cell_count`	Set a specific number of cells as putative, based on those with the highest error-corrected read count.	`integer`, example: `10000`
`--expected_cell_count`	Guide the Basic putative cell calling algorithm by providing an estimate of the number of cells expected; usually this is the number of cells loaded into the Rhapsody cartridge. If there are multiple inflection points on the second derivative cumulative curve, this ensures that the one selected is near the expected cell count.	`integer`, example: `20000`
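The default choice of `--cell_calling_data` and the effect of `--exact_cell_count` can be summarised in a few lines of Python. This is an illustrative sketch with hypothetical function names: the first function restates the default rule from the table, and the second assumes that `--exact_cell_count` simply keeps the N cell barcodes with the highest error-corrected read counts; details such as tie handling in the actual pipeline may differ.

```python
def default_cell_calling_data(has_mrna_reads: bool, has_atac_reads: bool) -> str:
    """Default dataset for putative cell calling when --cell_calling_data is not set."""
    if has_mrna_reads and has_atac_reads:
        return "mRNA_and_ATAC"
    if has_atac_reads:
        return "ATAC"
    return "mRNA"


def exact_cells(read_counts: dict[str, int], exact_cell_count: int) -> list[str]:
    """Keep the exact_cell_count barcodes with the most error-corrected reads.

    sorted() is stable (also with reverse=True), so tied barcodes keep their input order.
    """
    ranked = sorted(read_counts, key=read_counts.__getitem__, reverse=True)
    return ranked[:exact_cell_count]


print(default_cell_calling_data(has_mrna_reads=True, has_atac_reads=False))  # prints mRNA
print(exact_cells({"cell_A": 120, "cell_B": 4500, "cell_C": 3100}, 2))  # prints ['cell_B', 'cell_C']
```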

Intronic Reads Settings

Name	Description	Attributes
`--exclude_intronic_reads`	By default (false), reads aligned to both exons and introns are considered and represented in molecule counts. Set to true to exclude intronic reads.	`boolean`, example: `FALSE`

Multiplex Settings

Name	Description	Attributes
`--sample_tags_version`	Version of the Sample Tags used in the run, if Sample Tag Multiplexing was done: human, mouse, flex, nuclei_includes_mrna, or nuclei_atac_only. For an SMK + Nuclei mRNA run or an SMK + Multiomic ATAC-Seq (WTA+ATAC-Seq) run (but not an SMK + ATAC-Seq only run), choose “nuclei_includes_mrna”. For an SMK + ATAC-Seq only run (but not an SMK + Multiomic ATAC-Seq (WTA+ATAC-Seq) run), choose “nuclei_atac_only”.	`string`, example: `"human"`
`--tag_names`	Tag number followed by ‘-’ and the desired sample name, as it should appear in Sample_Tag_Metrics.csv. Do not use special characters.	List of `string`, example: `"4-mySample", "9-myOtherSample", "6-alsoThisSample"`, multiple_sep: `";"`
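As an illustration of the expected `--tag_names` format, here is a hypothetical parser in Python (`parse_tag_names` is not part of the pipeline). It assumes that "no special characters" means sample names made of letters, digits and hyphens (the rule stated for `--run_name`) and that each tag number should appear only once; neither assumption is spelled out for this argument.

```python
import re

# Assumption: sample names use only letters, digits and hyphens.
_TAG_ENTRY = re.compile(r"^(?P<tag>\d+)-(?P<name>[A-Za-z0-9-]+)$")


def parse_tag_names(value: str) -> dict[int, str]:
    """Parse a ';'-separated --tag_names value into {tag number: sample name}."""
    result: dict[int, str] = {}
    for entry in value.split(";"):
        match = _TAG_ENTRY.match(entry.strip())
        if match is None:
            raise ValueError(f"expected '<tag number>-<sample name>', got {entry!r}")
        tag = int(match["tag"])
        if tag in result:
            raise ValueError(f"tag {tag} is listed more than once")
        result[tag] = match["name"]
    return result


print(parse_tag_names("4-mySample;9-myOtherSample;6-alsoThisSample"))
# prints {4: 'mySample', 9: 'myOtherSample', 6: 'alsoThisSample'}
```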

VDJ arguments

Name	Description	Attributes
`--vdj_version`	If VDJ was done, specify the appropriate option: human, mouse, humanBCR, humanTCR, mouseBCR, or mouseTCR.	`string`, example: `"human"`

ATAC options

Name	Description	Attributes
`--predefined_atac_peaks`	An optional BED file containing pre-established chromatin accessibility peak regions for generating the ATAC cell-by-peak matrix.	`file`, example: `"predefined_peaks.bed"`

Additional options

Name	Description	Attributes
`--run_name`	Specify a run name to use as the output file base name. Use only letters, numbers, or hyphens. Do not use special characters or spaces.	`string`, default: `"sample"`
`--generate_bam`	Specify whether to create the BAM file output.	`boolean`, default: `FALSE`
`--long_reads`	Specify whether the STARlong aligner should be used instead of STAR. Set to true if the reads are longer than 650 bp. If left undefined (the default), this is autodetected based on read length.	`boolean`
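If run names are derived from sample identifiers, a small helper can enforce the `--run_name` rule. A minimal Python sketch with hypothetical names (`is_valid_run_name`, `to_run_name`) that replaces every run of disallowed characters with a single hyphen; "letters" is taken to mean ASCII letters.

```python
import re

_DISALLOWED = re.compile(r"[^A-Za-z0-9-]+")


def is_valid_run_name(name: str) -> bool:
    """True if name is non-empty and contains only letters, numbers or hyphens."""
    return re.fullmatch(r"[A-Za-z0-9-]+", name) is not None


def to_run_name(sample_id: str) -> str:
    """Turn an arbitrary sample identifier into a valid --run_name."""
    candidate = _DISALLOWED.sub("-", sample_id).strip("-")
    if not candidate:
        raise ValueError(f"cannot derive a run name from {sample_id!r}")
    return candidate


print(to_run_name("PBMC donor 1_day 3"))  # prints PBMC-donor-1-day-3
```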

Advanced options

NOTE: Only change these if you are sure about what you are doing.

Name	Description	Attributes
`--custom_star_params`	Modify STAR alignment parameters. Set this parameter to fully override the default STAR mapping parameters used in the pipeline. For reference, these are the defaults: short reads: `--outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMultimapScoreRange 0 --clip3pAdapterSeq AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA --seedSearchStartLmax 50 --outFilterMatchNmin 25 --limitOutSJcollapsed 2000000`; long reads: the same as short reads plus `--seedPerReadNmax 10000`. This applies to the FASTQ files provided via `--reads`. Do NOT set any non-mapping-related parameters such as `--genomeDir`, `--outSAMtype`, `--outSAMunmapped`, `--readFilesIn`, `--runThreadN`, etc. The pipeline uses STAR version 2.7.10b.	`string`, example: `"--alignIntronMax 6000 --outFilterScoreMinOverLread 0.1 --limitOutSJcollapsed 2000000"`
`--custom_bwa_mem2_params`	Modify bwa-mem2 alignment parameters. Set this parameter to fully override the bwa-mem2 mapping parameters used in the pipeline. The pipeline does not pass any custom mapping parameters to bwa-mem2, so the program's default values are used. This applies to the FASTQ files provided via `--reads_atac`. Do NOT set any non-mapping-related parameters such as `-C`, `-t`, etc. The pipeline uses bwa-mem2 version 2.2.1.	`string`, example: `"-k 16 -w 200 -r"`
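Since both parameters are passed through to the aligners, a quick pre-flight check for the non-mapping options named in these descriptions can catch mistakes early. The Python sketch below is hypothetical (`non_mapping_flags` is an illustrative name): the flag sets contain only the examples given above, whose lists end in "etc.", and only whole tokens are matched, so an attached form such as `-t8` is not detected.

```python
import shlex

# Only the examples named in the option descriptions; the real lists are longer ("etc.").
STAR_NON_MAPPING = {"--genomeDir", "--outSAMtype", "--outSAMunmapped", "--readFilesIn", "--runThreadN"}
BWA_MEM2_NON_MAPPING = {"-C", "-t"}


def non_mapping_flags(params: str, forbidden: set[str]) -> list[str]:
    """Return the tokens of a custom parameter string that are forbidden flags."""
    return [token for token in shlex.split(params) if token in forbidden]


print(non_mapping_flags("--runThreadN 8 --alignIntronMax 6000", STAR_NON_MAPPING))  # prints ['--runThreadN']
print(non_mapping_flags("-k 16 -w 200 -r", BWA_MEM2_NON_MAPPING))  # prints []
```

Because `--custom_star_params` replaces the defaults instead of extending them, any default setting you still want (for example `--limitOutSJcollapsed 2000000`) has to be repeated in the string, as the example value does.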

CWL-runner arguments

Name	Description	Attributes
`--parallel`	Run jobs in parallel.	`boolean`, default: `TRUE`
`--timestamps`	Add timestamps to the errors, warnings, and notifications.	`boolean_true`

Undocumented arguments

Name	Description	Attributes
`--abseq_umi`		`integer`
`--target_analysis`		`boolean`
`--vdj_jgene_evalue`	E-value threshold for J gene calls by IgBlast/PyIR. Default: 0.001.	`double`
`--vdj_vgene_evalue`	E-value threshold for V gene calls by IgBlast/PyIR. Default: 0.001.	`double`
`--write_filtered_reads`		`boolean`

Authors

Robrecht Cannoodt (maintainer)
Dorien Roosen (author)

Visualisation

flowchart TB
    v0(Channel.fromList)
    v2(filter)
    v9(filter)
    v17(bd_rhapsody_component)
    v24(cross)
    v34(cross)
    v40(filter)
    v70(concat)
    v48(from_bdrhap_to_h5mu)
    v55(cross)
    v65(cross)
    v77(cross)
    v84(cross)
    v96(cross)
    v103(cross)
    v107(Output)
    v0-->v2
    v2-->v9
    v9-->v17
    v17-->v24
    v9-->v24
    v9-->v34
    v40-->v48
    v48-->v55
    v40-->v55
    v40-->v65
    v65-->v70
    v70-->v77
    v2-->v77
    v77-->v84
    v2-->v84
    v2-->v96
    v96-->v103
    v2-->v103
    v103-->v107
    v34-->v40
    v17-->v34
    v48-->v65
    v70-->v96
    style v0 fill:#e3dcea,stroke:#7a4baa;
    style v2 fill:#e3dcea,stroke:#7a4baa;
    style v9 fill:#e3dcea,stroke:#7a4baa;
    style v17 fill:#e3dcea,stroke:#7a4baa;
    style v24 fill:#e3dcea,stroke:#7a4baa;
    style v34 fill:#e3dcea,stroke:#7a4baa;
    style v40 fill:#e3dcea,stroke:#7a4baa;
    style v70 fill:#e3dcea,stroke:#7a4baa;
    style v48 fill:#e3dcea,stroke:#7a4baa;
    style v55 fill:#e3dcea,stroke:#7a4baa;
    style v65 fill:#e3dcea,stroke:#7a4baa;
    style v77 fill:#e3dcea,stroke:#7a4baa;
    style v84 fill:#e3dcea,stroke:#7a4baa;
    style v96 fill:#e3dcea,stroke:#7a4baa;
    style v103 fill:#e3dcea,stroke:#7a4baa;
    style v107 fill:#e3dcea,stroke:#7a4baa;