BD Rhapsody

BD Rhapsody Sequence Analysis CWL pipeline v2.2.1

Info

ID: bd_rhapsody
Namespace: workflows/ingestion

This pipeline performs analysis of single-cell multiomic sequence read (FASTQ) data. The supported sequencing libraries are those generated by the BD Rhapsody assay kits, including: Whole Transcriptome mRNA, Targeted mRNA, AbSeq Antibody-Oligonucleotides, Single-Cell Multiplexing, TCR/BCR, and ATAC-Seq.

The CWL pipeline file is obtained by cloning ‘https://bitbucket.org/CRSwDev/cwl’ and removing all objects with class ‘DockerRequirement’ from the YAML.

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.0 -latest \
  -main-script target/nextflow/workflows/ingestion/bd_rhapsody/main.nf \
  --help

Run command

Example of params.yaml
# Inputs
# reads: ["WTALibrary_S1_L001_R1_001.fastq.gz", "WTALibrary_S1_L001_R2_001.fastq.gz"]
# reads_atac: ["ATACLibrary_S2_L001_R1_001.fastq.gz", "ATACLibrary_S2_L001_R2_001.fastq.gz", "ATACLibrary_S2_L001_I2_001.fastq.gz"]

# References
# reference_archive: "RhapRef_Human_WTA_2023-02.tar.gz"
# targeted_reference: ["BD_Rhapsody_Immune_Response_Panel_Hs.fasta"]
# abseq_reference: ["AbSeq_reference.fasta"]
# supplemental_reference: ["supplemental_reference.fasta"]

# Outputs
# output: "$id.$key.output.h5mu"
# output_raw: "$id.$key.output_raw"

# Putative Cell Calling Settings
# cell_calling_data: "mRNA"
# cell_calling_bioproduct_algorithm: "Basic"
# cell_calling_atac_algorithm: "Basic"
# exact_cell_count: 10000
# expected_cell_count: 20000

# Intronic Reads Settings
# exclude_intronic_reads: false

# Multiplex Settings
# sample_tags_version: "human"
# tag_names: ["4-mySample", "9-myOtherSample", "6-alsoThisSample"]

# VDJ arguments
# vdj_version: "human"

# ATAC options
# predefined_atac_peaks: "predefined_peaks.bed"

# Additional options
run_name: "sample"
generate_bam: false
# long_reads: true

# Advanced options
# custom_star_params: "--alignIntronMax 6000 --outFilterScoreMinOverLread 0.1 --limitOutSJcollapsed 2000000"
# custom_bwa_mem2_params: "-k 16 -w 200 -r"

# CWL-runner arguments
parallel: true
timestamps: false

# Undocumented arguments
# abseq_umi: 123
# target_analysis: true
# vdj_jgene_evalue: 123.0
# vdj_vgene_evalue: 123.0
# write_filtered_reads: true

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

# Arguments
nextflow run openpipelines-bio/openpipeline \
  -r 2.1.0 -latest \
  -profile docker \
  -main-script target/nextflow/workflows/ingestion/bd_rhapsody/main.nf \
  -params-file params.yaml
Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Inputs

Name Description Attributes
--reads Reads (optional). Path to your FASTQ.GZ formatted read files from libraries that may include WTA mRNA, Targeted mRNA, AbSeq, Sample Multiplexing, and VDJ. You may specify as many R1/R2 read pairs as you want. List of file, example: "WTALibrary_S1_L001_R1_001.fastq.gz", "WTALibrary_S1_L001_R2_001.fastq.gz", multiple_sep: ";"
--reads_atac Path to your FASTQ.GZ formatted read files from ATAC-Seq libraries. You may specify as many R1/R2/I2 files as you want. List of file, example: "ATACLibrary_S2_L001_R1_001.fastq.gz", "ATACLibrary_S2_L001_R2_001.fastq.gz", "ATACLibrary_S2_L001_I2_001.fastq.gz", multiple_sep: ";"
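Because --reads takes a list joined with the ";" separator and expects R1/R2 pairs, a small pre-flight check can catch a missing mate before launching the pipeline. The following is a minimal POSIX shell sketch; the function name join_read_pairs is hypothetical (not part of the pipeline), and it assumes Illumina-style file names containing "_R1_" / "_R2_" as in the examples above.

```shell
# Hypothetical helper (not part of the pipeline): join FASTQ paths into one
# --reads value using the ";" separator, after checking that every R1 file
# has a matching R2 file in the argument list.
# Assumes Illumina-style names containing "_R1_" / "_R2_".
join_read_pairs() {
  out=""
  for f in "$@"; do
    case "$f" in
      *_R1_*)
        mate=$(printf '%s\n' "$f" | sed 's/_R1_/_R2_/')
        found=0
        for g in "$@"; do
          if [ "$g" = "$mate" ]; then found=1; fi
        done
        if [ "$found" -ne 1 ]; then
          echo "missing mate for $f" >&2
          return 1
        fi
        ;;
    esac
    # Append the file, separating entries with ";".
    if [ -z "$out" ]; then out="$f"; else out="$out;$f"; fi
  done
  printf '%s\n' "$out"
}
```

For example, join_read_pairs WTALibrary_S1_L001_R1_001.fastq.gz WTALibrary_S1_L001_R2_001.fastq.gz prints both names joined by ";". The same joining applies to --reads_atac, although this sketch does not check for the I2 file.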

References

Assay type will be inferred from the provided reference(s). Do not provide both reference_archive and targeted_reference at the same time.

Valid reference input combinations:

  • reference_archive: WTA only
  • reference_archive & abseq_reference: WTA + AbSeq
  • reference_archive & supplemental_reference: WTA + extra transgenes
  • reference_archive & abseq_reference & supplemental_reference: WTA + AbSeq + extra transgenes
  • reference_archive: WTA + ATAC or ATAC only
  • reference_archive & supplemental_reference: WTA + ATAC + extra transgenes
  • targeted_reference: Targeted only
  • targeted_reference & abseq_reference: Targeted + AbSeq
  • abseq_reference: AbSeq only
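The reference combinations listed above can be encoded as a simple lookup to check a planned run before submitting it. This is a hedged POSIX shell sketch: the function name infer_assay is hypothetical, each argument is a path or an empty string, and combinations not listed above are rejected. It does not distinguish the ATAC cases, since whether a reference_archive also covers ATAC-Seq depends on the archive contents, which are not inspected here.

```shell
# Hypothetical helper: infer the assay from which reference arguments are set.
# Arguments (empty string = not provided):
#   $1 reference_archive  $2 targeted_reference  $3 abseq_reference  $4 supplemental_reference
infer_assay() {
  archive=$1 targeted=$2 abseq=$3 supplemental=$4
  if [ -n "$archive" ] && [ -n "$targeted" ]; then
    echo "error: do not provide both reference_archive and targeted_reference" >&2
    return 1
  fi
  if [ -n "$supplemental" ] && [ -z "$archive" ]; then
    echo "error: supplemental_reference is only listed together with reference_archive" >&2
    return 1
  fi
  if [ -n "$archive" ]; then
    assay="WTA"
  elif [ -n "$targeted" ]; then
    assay="Targeted"
  elif [ -n "$abseq" ]; then
    echo "AbSeq"
    return 0
  else
    echo "error: no reference provided" >&2
    return 1
  fi
  # Optional add-ons on top of a WTA or Targeted base.
  if [ -n "$abseq" ]; then assay="$assay + AbSeq"; fi
  if [ -n "$supplemental" ]; then assay="$assay + extra transgenes"; fi
  echo "$assay"
}
```

For example, infer_assay RhapRef_Human_WTA_2023-02.tar.gz "" AbSeq_reference.fasta "" prints "WTA + AbSeq".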

The reference_archive can be generated with the reference/build_bdrhap_reference component. Alternatively, BD also provides standard references which can be downloaded from these locations:

  • Human: https://bd-rhapsody-public.s3.amazonaws.com/Rhapsody-WTA/Pipeline-version2.x_WTA_references/RhapRef_Human_WTA_2023-02.tar.gz
  • Mouse: https://bd-rhapsody-public.s3.amazonaws.com/Rhapsody-WTA/Pipeline-version2.x_WTA_references/RhapRef_Mouse_WTA_2023-02.tar.gz
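To fetch one of the standard references listed above from a script, the two URLs can be wrapped in a small lookup. This is a sketch; the function name bd_reference_url is hypothetical, and only the human and mouse archives listed above are covered.

```shell
# Hypothetical helper: print the BD-provided WTA reference URL for a species.
bd_reference_url() {
  base="https://bd-rhapsody-public.s3.amazonaws.com/Rhapsody-WTA/Pipeline-version2.x_WTA_references"
  case "$1" in
    human) echo "$base/RhapRef_Human_WTA_2023-02.tar.gz" ;;
    mouse) echo "$base/RhapRef_Mouse_WTA_2023-02.tar.gz" ;;
    *) echo "unknown species: $1" >&2; return 1 ;;
  esac
}

# Usage (requires network access):
# curl -fLO "$(bd_reference_url human)"
```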
Name Description Attributes
--reference_archive Path to the Rhapsody WTA reference in tar.gz format. The archive is structured as follows: a top-level folder BD_Rhapsody_Reference_Files/ containing a star_index/ sub-folder with the STAR index (the files created with STAR --runMode genomeGenerate) and a GTF file for gene-transcript annotation, e.g. “gencode.v43.primary_assembly.annotation.gtf”. file, example: "RhapRef_Human_WTA_2023-02.tar.gz"
--targeted_reference Path to the targeted reference file in FASTA format. List of file, example: "BD_Rhapsody_Immune_Response_Panel_Hs.fasta", multiple_sep: ";"
--abseq_reference Path to the AbSeq reference file in FASTA format. Only needed if BD AbSeq Ab-Oligos are used. List of file, example: "AbSeq_reference.fasta", multiple_sep: ";"
--supplemental_reference Path to the supplemental reference file in FASTA format. Only needed if there are additional transgene sequences to be aligned against in a WTA assay experiment. List of file, example: "supplemental_reference.fasta", multiple_sep: ";"
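Before starting a long run with a custom reference_archive (for example one built with reference/build_bdrhap_reference), its layout can be checked against the structure described for --reference_archive. This is a hedged POSIX shell sketch: the function name check_reference_archive is hypothetical, it only lists the archive with tar -tzf, and it assumes the GTF may be plain or gzipped (.gtf or .gtf.gz).

```shell
# Hypothetical pre-flight check: confirm the archive contains
# BD_Rhapsody_Reference_Files/star_index/ and at least one GTF file
# somewhere under BD_Rhapsody_Reference_Files/.
check_reference_archive() {
  listing=$(tar -tzf "$1") || return 1
  # An optional leading "./" is accepted in member names.
  printf '%s\n' "$listing" | grep -E '^(\./)?BD_Rhapsody_Reference_Files/star_index/' >/dev/null || {
    echo "missing BD_Rhapsody_Reference_Files/star_index/" >&2
    return 1
  }
  printf '%s\n' "$listing" | grep -E '^(\./)?BD_Rhapsody_Reference_Files/.*\.gtf(\.gz)?$' >/dev/null || {
    echo "no GTF file under BD_Rhapsody_Reference_Files/" >&2
    return 1
  }
  echo "ok"
}
```

For example, check_reference_archive RhapRef_Human_WTA_2023-02.tar.gz prints "ok" if both parts are present.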

Outputs

Name Description Attributes
--output The processed output file in h5mu format. file, required, example: "output.h5mu"
--output_raw The unprocessed output directory containing all the outputs from the pipeline. file, required, example: "output_dir"

Putative Cell Calling Settings

Name Description Attributes
--cell_calling_data Specify the dataset to be used for putative cell calling: mRNA, AbSeq, ATAC, or mRNA_and_ATAC. For putative cell calling using an AbSeq dataset, provide an AbSeq reference FASTA file (--abseq_reference). For putative cell calling using an ATAC dataset, provide a WTA+ATAC-Seq reference archive (--reference_archive). If not specified, the data used for putative cell calling is determined as follows: if both mRNA reads and ATAC reads exist, mRNA_and_ATAC; if only ATAC reads exist, ATAC; otherwise, mRNA. string, example: "mRNA"
--cell_calling_bioproduct_algorithm Specify the bioproduct algorithm to be used for putative cell calling: Basic or Refined. By default, the Basic algorithm is used. string, example: "Basic"
--cell_calling_atac_algorithm Specify the ATAC-seq algorithm to be used for putative cell calling: Basic or Refined. By default, the Basic algorithm is used. string, example: "Basic"
--exact_cell_count Set a specific number of cells as putative, based on those with the highest error-corrected read count. integer, example: 10000
--expected_cell_count Guide the Basic putative cell calling algorithm by providing an estimate of the number of cells expected, usually the number of cells loaded into the Rhapsody cartridge. If there are multiple inflection points on the second-derivative cumulative curve, this ensures that the one selected is near the expected cell count. integer, example: 20000
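The default-selection rule for --cell_calling_data can be written out as a three-way branch. This is a minimal sketch of the documented rule only, not the pipeline's code; the function name default_cell_calling_data is hypothetical, and treating "mRNA reads" as "--reads was provided" and "ATAC reads" as "--reads_atac was provided" is an assumption.

```shell
# Sketch of the documented default for --cell_calling_data.
# $1: "true" if mRNA reads were given (assumed: --reads)
# $2: "true" if ATAC reads were given (assumed: --reads_atac)
default_cell_calling_data() {
  has_mrna=$1 has_atac=$2
  if [ "$has_mrna" = true ] && [ "$has_atac" = true ]; then
    echo "mRNA_and_ATAC"
  elif [ "$has_atac" = true ]; then
    # Only reached when mRNA reads are absent.
    echo "ATAC"
  else
    echo "mRNA"
  fi
}
```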

Intronic Reads Settings

Name Description Attributes
--exclude_intronic_reads By default, the flag is false, and reads aligned to exons and introns are considered and represented in molecule counts. When the flag is set to true, intronic reads will be excluded. The value can be true or false. boolean, example: FALSE

Multiplex Settings

Name Description Attributes
--sample_tags_version Specify the version of the Sample Tags used in the run. If Sample Tag Multiplexing was done, specify the appropriate version: human, mouse, flex, nuclei_includes_mrna, or nuclei_atac_only. For an SMK + Nuclei mRNA run or an SMK + Multiomic ATAC-Seq (WTA+ATAC-Seq) run (and not an SMK + ATAC-Seq only run), choose the “nuclei_includes_mrna” option. For an SMK + ATAC-Seq only run (and not SMK + Multiomic ATAC-Seq (WTA+ATAC-Seq)), choose the “nuclei_atac_only” option. string, example: "human"
--tag_names Specify the tag number followed by ‘-’ and the desired sample name to appear in Sample_Tag_Metrics.csv. Do not use special characters. List of string, example: "4-mySample", "9-myOtherSample", "6-alsoThisSample", multiple_sep: ";"
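The --tag_names entries follow a "<tag number>-<sample name>" pattern and are passed as one ";"-separated value. The sketch below checks each entry and joins them. Both function names are hypothetical, and reading "do not use special characters" as "letters and digits only in the sample name" is an assumption; the pipeline's own validation may be more or less strict.

```shell
# Hypothetical check for one --tag_names entry: digits, "-", then a name.
# Assumption: the name may contain only letters and digits.
check_tag_name() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+-[A-Za-z0-9]+$'
}

# Validate every entry and join them with the ";" separator.
join_tag_names() {
  out=""
  for t in "$@"; do
    check_tag_name "$t" || { echo "invalid tag name: $t" >&2; return 1; }
    if [ -z "$out" ]; then out="$t"; else out="$out;$t"; fi
  done
  printf '%s\n' "$out"
}
```

For example, join_tag_names 4-mySample 9-myOtherSample 6-alsoThisSample prints "4-mySample;9-myOtherSample;6-alsoThisSample".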

VDJ arguments

Name Description Attributes
--vdj_version If VDJ was done, specify the appropriate option: human, mouse, humanBCR, humanTCR, mouseBCR, or mouseTCR. string, example: "human"

ATAC options

Name Description Attributes
--predefined_atac_peaks An optional BED file containing pre-established chromatin accessibility peak regions for generating the ATAC cell-by-peak matrix. file, example: "predefined_peaks.bed"

Additional options

Name Description Attributes
--run_name Specify a run name to use as the output file base name. Use only letters, numbers, or hyphens. Do not use special characters or spaces. string, default: "sample"
--generate_bam Specify whether to create the BAM file output. boolean, default: FALSE
--long_reads Use STARlong (default: undefined, i.e. autodetected based on read lengths). Specify whether the STARlong aligner should be used instead of STAR. Set to true if the reads are longer than 650 bp. boolean
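Two of these options lend themselves to a quick local check: run_name must consist of letters, numbers, or hyphens only, and --long_reads should be true for reads longer than 650 bp. The sketch below is a hedged approximation. All function names are hypothetical, it looks only at the longest read among the first 100,000 reads of one gzipped FASTQ, and the pipeline's own autodetection may use a different statistic.

```shell
# run_name may contain only letters, numbers, or hyphens.
check_run_name() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9-]+$'
}

# Longest read among the first 100,000 reads (400,000 lines) of a FASTQ.GZ.
# In FASTQ, every 4th line starting at line 2 is the sequence line.
max_read_length() {
  gzip -cd "$1" | head -n 400000 |
    awk 'NR % 4 == 2 { if (length($0) > m) m = length($0) } END { print m + 0 }'
}

# Suggest a value for --long_reads using the 650 bp threshold above.
suggest_long_reads() {
  if [ "$(max_read_length "$1")" -gt 650 ]; then echo true; else echo false; fi
}
```

For example, suggest_long_reads WTALibrary_S1_L001_R2_001.fastq.gz prints true or false. Leaving --long_reads unset keeps the pipeline's autodetection, which is usually the simpler choice.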

Advanced options

NOTE: Only change these if you are really sure about what you are doing.

Name Description Attributes
--custom_star_params Modify STAR alignment parameters. Set this parameter to fully override the default STAR mapping parameters used in the pipeline. For reference, these are the defaults: for short reads, --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMultimapScoreRange 0 --clip3pAdapterSeq AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA --seedSearchStartLmax 50 --outFilterMatchNmin 25 --limitOutSJcollapsed 2000000; for long reads, the same as for short reads plus --seedPerReadNmax 10000. This applies to the FASTQ files provided in the --reads input. Do NOT set any non-mapping-related parameters such as --genomeDir, --outSAMtype, --outSAMunmapped, --readFilesIn, --runThreadN, etc. The pipeline uses STAR version 2.7.10b. string, example: "--alignIntronMax 6000 --outFilterScoreMinOverLread 0.1 --limitOutSJcollapsed 2000000"
--custom_bwa_mem2_params Modify bwa-mem2 alignment parameters. Set this parameter to fully override the bwa-mem2 mapping parameters used in the pipeline. The pipeline does not pass any custom mapping parameters to bwa-mem2, so the program's default values are used. This applies to the FASTQ files provided in the --reads_atac input. Do NOT set any non-mapping-related parameters such as -C, -t, etc. The pipeline uses bwa-mem2 version 2.2.1. string, example: "-k 16 -w 200 -r"
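Since both advanced options forbid non-mapping parameters, a simple token check can catch the listed ones before a run. This is a hedged sketch: check_custom_params, STAR_FORBIDDEN, and BWA_FORBIDDEN are hypothetical names, the forbidden lists contain only the examples named above (the descriptions say "etc.", so they are not exhaustive), and only options given as separate tokens are detected (e.g. -t8 would slip through).

```shell
# Options the descriptions above say must not be set (examples only).
STAR_FORBIDDEN="--genomeDir --outSAMtype --outSAMunmapped --readFilesIn --runThreadN"
BWA_FORBIDDEN="-C -t"

# $1: the custom parameter string; $2: space-separated forbidden options.
# Returns 1 if any whitespace-separated token exactly matches a forbidden option.
check_custom_params() {
  params=$1 forbidden=$2
  for opt in $params; do
    for bad in $forbidden; do
      if [ "$opt" = "$bad" ]; then
        echo "option not allowed: $bad" >&2
        return 1
      fi
    done
  done
  return 0
}
```

For example, check_custom_params "--runThreadN 8 --alignIntronMax 6000" "$STAR_FORBIDDEN" fails because --runThreadN is managed by the pipeline.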

CWL-runner arguments

Name Description Attributes
--parallel Run jobs in parallel. boolean, default: TRUE
--timestamps Add timestamps to the errors, warnings, and notifications. boolean_true

Undocumented arguments

Name Description Attributes
--abseq_umi integer
--target_analysis boolean
--vdj_jgene_evalue e-value threshold for J gene. The e-value threshold for J gene call by IgBlast/PyIR, default is set as 0.001 double
--vdj_vgene_evalue e-value threshold for V gene. The e-value threshold for V gene call by IgBlast/PyIR, default is set as 0.001 double
--write_filtered_reads boolean

Authors

  • Robrecht Cannoodt (maintainer)

  • Dorien Roosen (author)

Visualisation

flowchart TB
    v0(Channel.fromList)
    v2(filter)
    v9(filter)
    v17(bd_rhapsody_component)
    v24(cross)
    v34(cross)
    v40(filter)
    v70(concat)
    v48(from_bdrhap_to_h5mu)
    v55(cross)
    v65(cross)
    v77(cross)
    v84(cross)
    v96(cross)
    v103(cross)
    v107(Output)
    v0-->v2
    v2-->v9
    v9-->v17
    v17-->v24
    v9-->v24
    v9-->v34
    v40-->v48
    v48-->v55
    v40-->v55
    v40-->v65
    v65-->v70
    v70-->v77
    v2-->v77
    v77-->v84
    v2-->v84
    v2-->v96
    v96-->v103
    v2-->v103
    v103-->v107
    v34-->v40
    v17-->v34
    v48-->v65
    v70-->v96
    style v0 fill:#e3dcea,stroke:#7a4baa;
    style v2 fill:#e3dcea,stroke:#7a4baa;
    style v9 fill:#e3dcea,stroke:#7a4baa;
    style v17 fill:#e3dcea,stroke:#7a4baa;
    style v24 fill:#e3dcea,stroke:#7a4baa;
    style v34 fill:#e3dcea,stroke:#7a4baa;
    style v40 fill:#e3dcea,stroke:#7a4baa;
    style v70 fill:#e3dcea,stroke:#7a4baa;
    style v48 fill:#e3dcea,stroke:#7a4baa;
    style v55 fill:#e3dcea,stroke:#7a4baa;
    style v65 fill:#e3dcea,stroke:#7a4baa;
    style v77 fill:#e3dcea,stroke:#7a4baa;
    style v84 fill:#e3dcea,stroke:#7a4baa;
    style v96 fill:#e3dcea,stroke:#7a4baa;
    style v103 fill:#e3dcea,stroke:#7a4baa;
    style v107 fill:#e3dcea,stroke:#7a4baa;