Full pipeline

A pipeline to analyse multiple multiomics samples.

Info

ID: full_pipeline
Namespace: multiomics

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 0.12.6 -latest \
  -main-script ./workflows/multiomics/full_pipeline/main.nf \
  --help

Run command

Example of params.yaml
# Inputs
id: # please fill in - example: "foo"
input: # please fill in - example: "input.h5mu"

# Outputs
# output: "$id.$key.output.h5mu"

# Sample ID options
add_id_to_obs: true
add_id_obs_output: "sample_id"
add_id_make_observation_keys_unique: true

# RNA filtering options
# rna_min_counts: 200
# rna_max_counts: 5000000
# rna_min_genes_per_cell: 200
# rna_max_genes_per_cell: 1500000
# rna_min_cells_per_gene: 3
# rna_min_fraction_mito: 0
# rna_max_fraction_mito: 0.2

# CITE-seq filtering options
# prot_min_counts: 3
# prot_max_counts: 5000000
# prot_min_proteins_per_cell: 200
# prot_max_proteins_per_cell: 100000000
# prot_min_cells_per_protein: 3

# Highly variable gene detection
filter_with_hvg_var_output: "filter_with_hvg"
filter_with_hvg_obs_batch_key: "sample_id"

# Mitochondrial Gene Detection
# var_name_mitochondrial_genes: "foo"
# obs_name_mitochondrial_fraction: "foo"
# var_gene_names: "gene_symbol"
mitochondrial_gene_regex: "^[mM][tT]-"

# QC metrics calculation options
# var_qc_metrics: ["ercc", "highly_variable"]
top_n_vars: [50, 100, 200, 500]

# PCA options
pca_overwrite: false

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
nextflow run openpipelines-bio/openpipeline \
  -r 0.12.6 -latest \
  -profile docker \
  -main-script ./workflows/multiomics/full_pipeline/main.nf \
  -params-file params.yaml
Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Inputs

Name Description Attributes
--id ID of the sample. string, required, example: "foo"
--input Path to the sample. file, required, example: "input.h5mu"

Outputs

Name Description Attributes
--output Destination path to the output. file, required, example: "output.h5mu"

Sample ID options

Options for adding the id to .obs on the MuData object. Having a sample id present in a requirement of several components for this pipeline.

Name Description Attributes
--add_id_to_obs Add the value passed with –id to .obs. boolean, default: TRUE
--add_id_obs_output .Obs column to add the sample IDs to. Required and only used when –add_id_to_obs is set to ‘true’ string, default: "sample_id"
--add_id_make_observation_keys_unique Join the id to the .obs index (.obs_names). Only used when –add_id_to_obs is set to ‘true’. boolean, default: TRUE

RNA filtering options

Name Description Attributes
--rna_min_counts Minimum number of counts captured per cell. integer, example: 200
--rna_max_counts Maximum number of counts captured per cell. integer, example: 5000000
--rna_min_genes_per_cell Minimum of non-zero values per cell. integer, example: 200
--rna_max_genes_per_cell Maximum of non-zero values per cell. integer, example: 1500000
--rna_min_cells_per_gene Minimum of non-zero values per gene. integer, example: 3
--rna_min_fraction_mito Minimum fraction of UMIs that are mitochondrial. double, example: 0
--rna_max_fraction_mito Maximum fraction of UMIs that are mitochondrial. double, example: 0.2

CITE-seq filtering options

Name Description Attributes
--prot_min_counts Minimum number of counts per cell. integer, example: 3
--prot_max_counts Minimum number of counts per cell. integer, example: 5000000
--prot_min_proteins_per_cell Minimum of non-zero values per cell. integer, example: 200
--prot_max_proteins_per_cell Maximum of non-zero values per cell. integer, example: 100000000
--prot_min_cells_per_protein Minimum of non-zero values per protein. integer, example: 3

Highly variable gene detection

Name Description Attributes
--filter_with_hvg_var_output In which .var slot to store a boolean array corresponding to the highly variable genes. string, default: "filter_with_hvg"
--filter_with_hvg_obs_batch_key If specified, highly-variable genes are selected within each batch separately and merged. This simple process avoids the selection of batch-specific genes and acts as a lightweight batch correction method. string, default: "sample_id"

Mitochondrial Gene Detection

Name Description Attributes
--var_name_mitochondrial_genes In which .var slot to store a boolean array corresponding the mitochondrial genes. string
--obs_name_mitochondrial_fraction When specified, write the fraction of counts originating from mitochondrial genes (based on –mitochondrial_gene_regex) to an .obs column with the specified name. Requires –var_name_mitochondrial_genes. string
--var_gene_names .var column name to be used to detect mitochondrial genes instead of .var_names (default if not set). Gene names matching with the regex value from –mitochondrial_gene_regex will be identified as a mitochondrial gene. string, example: "gene_symbol"
--mitochondrial_gene_regex Regex string that identifies mitochondrial genes from –var_gene_names. By default will detect human and mouse mitochondrial genes from a gene symbol. string, default: "^[mM][tT]-"

QC metrics calculation options

Name Description Attributes
--var_qc_metrics Keys to select a boolean (containing only True or False) column from .var. For each cell, calculate the proportion of total values for genes which are labeled ‘True’, compared to the total sum of the values for all genes. Defaults to the combined values specified for –var_name_mitochondrial_genes and –filter_with_hvg_var_output. List of string, example: "ercc,highly_variable", multiple_sep: ","
--top_n_vars Number of top vars to be used to calculate cumulative proportions. If not specified, proportions are not calculated. --top_n_vars 20,50 finds cumulative proportion to the 20th and 50th most expressed vars. List of integer, default: 50, 100, 200, 500, multiple_sep: ","

PCA options

Name Description Attributes
--pca_overwrite Allow overwriting slots for PCA output. boolean_true

Authors

  • Dries Schaumont (author, maintainer)

Visualisation

flowchart LR
    v0(Input)
    v2(toSortedList)
    v4(flatMap)
    v7(toSortedList)
    v9(Output)
    v11(filter)
    v17(add_id)
    v19(join)
    v24(mix)
    v22(filter)
    v27(filter)
    v32(split_modalities)
    v34(join)
    v41(concat)
    v37(filter)
    v39(test_wf:run_wf:split_modalities_workflow:splitStub)
    v42(flatMap)
    v44(filter)
    v47(toSortedList)
    v49(flatMap)
    v55(filter)
    v61(grep_annotation_column)
    v63(join)
    v67(mix)
    v66(filter)
    v73(calculate_qc_metrics)
    v75(join)
    v83(publish)
    v85(join)
    v89(filter)
    v96(delimit_fraction)
    v98(join)
    v102(mix)
    v101(filter)
    v108(filter_with_counts)
    v110(join)
    v118(do_filter)
    v120(join)
    v128(filter_with_scrublet)
    v130(join)
    v199(concat)
    v133(filter)
    v136(toSortedList)
    v138(flatMap)
    v143(filter)
    v149(test_wf:run_wf:singlesample_processing_workflow:prot_singlesample:unfiltered_counts_qc_metrics_prot:grep_annotation_column:grep_annotation_column_process1)
    v151(join)
    v155(mix)
    v154(filter)
    v161(test_wf:run_wf:singlesample_processing_workflow:prot_singlesample:unfiltered_counts_qc_metrics_prot:calculate_qc_metrics:calculate_qc_metrics_process1)
    v163(join)
    v171(test_wf:run_wf:singlesample_processing_workflow:prot_singlesample:unfiltered_counts_qc_metrics_prot:publish:publish_process1)
    v173(join)
    v182(test_wf:run_wf:singlesample_processing_workflow:prot_singlesample:filter_with_counts:filter_with_counts_process1)
    v184(join)
    v192(test_wf:run_wf:singlesample_processing_workflow:prot_singlesample:do_filter:do_filter_process1)
    v194(join)
    v197(filter)
    v203(groupTuple)
    v209(concat)
    v211(join)
    v215(filter)
    v218(toSortedList)
    v220(flatMap)
    v222(toSortedList)
    v224(Output)
    v230(normalize_total)
    v232(join)
    v241(log1p)
    v243(join)
    v252(delete_layer)
    v254(join)
    v263(filter_with_hvg)
    v265(join)
    v274(rna_calculate_qc_metrics)
    v276(join)
    v340(concat)
    v281(filter)
    v284(toSortedList)
    v286(flatMap)
    v287(toSortedList)
    v289(Output)
    v295(clr)
    v297(join)
    v303(filter)
    v309(test_wf:run_wf:multisample_processing_workflow:prot_multisample:prot_qc:grep_annotation_column:grep_annotation_column_process2)
    v311(join)
    v315(mix)
    v314(filter)
    v321(test_wf:run_wf:multisample_processing_workflow:prot_multisample:prot_qc:calculate_qc_metrics:calculate_qc_metrics_process2)
    v323(join)
    v331(test_wf:run_wf:multisample_processing_workflow:prot_multisample:prot_qc:publish:publish_process2)
    v333(join)
    v337(filter)
    v342(toSortedList)
    v348(merge)
    v350(join)
    v354(filter)
    v358(toSortedList)
    v360(flatMap)
    v367(pca)
    v369(join)
    v378(find_neighbors)
    v380(join)
    v389(umap)
    v391(join)
    v397(concat)
    v396(filter)
    v398(filter)
    v402(toSortedList)
    v404(flatMap)
    v411(pca)
    v413(join)
    v422(find_neighbors)
    v424(join)
    v433(test_wf:run_wf:integration_setup_workflow:initialize_integration_prot:umap:umap_process1)
    v435(join)
    v441(concat)
    v440(filter)
    v449(test_wf:run_wf:publish:publish_process3)
    v451(join)
    v456(toSortedList)
    v458(Output)
    v41-->v42
    v66-->v67
    v101-->v102
    v154-->v155
    v286-->v287
    v314-->v315
    v396-->v397
    v397-->v398
    v397-->v440
    v440-->v441
    v0-->v2
    v2-->v4
    v4-->v7
    v7-->v9
    v4-->v11
    v4-->v22
    v11-->v19
    v11-->v17
    v17-->v19
    v19-->v24
    v22-->v24
    v24-->v27
    v24-->v37
    v27-->v34
    v27-->v32
    v32-->v34
    v34-->v41
    v37-->v39
    v39-->v41
    v42-->v44
    v42-->v133
    v42-->v197
    v44-->v47
    v47-->v49
    v49-->v55
    v49-->v66
    v55-->v63
    v55-->v61
    v61-->v63
    v63-->v67
    v67-->v75
    v67-->v73
    v73-->v75
    v75-->v85
    v75-->v83
    v83-->v85
    v85-->v89
    v85-->v101
    v89-->v98
    v89-->v96
    v96-->v98
    v98-->v102
    v102-->v110
    v102-->v108
    v108-->v110
    v110-->v120
    v110-->v118
    v118-->v120
    v120-->v130
    v120-->v128
    v128-->v130
    v130-->v199
    v133-->v136
    v136-->v138
    v138-->v143
    v138-->v154
    v143-->v151
    v143-->v149
    v149-->v151
    v151-->v155
    v155-->v163
    v155-->v161
    v161-->v163
    v163-->v173
    v163-->v171
    v171-->v173
    v173-->v184
    v173-->v182
    v182-->v184
    v184-->v194
    v184-->v192
    v192-->v194
    v194-->v199
    v197-->v199
    v199-->v203
    v203-->v211
    v203-->v209
    v209-->v211
    v211-->v215
    v211-->v281
    v211-->v337
    v215-->v218
    v218-->v220
    v220-->v222
    v222-->v224
    v220-->v232
    v220-->v230
    v230-->v232
    v232-->v243
    v232-->v241
    v241-->v243
    v243-->v254
    v243-->v252
    v252-->v254
    v254-->v265
    v254-->v263
    v263-->v265
    v265-->v276
    v265-->v274
    v274-->v276
    v276-->v340
    v281-->v284
    v284-->v286
    v287-->v289
    v286-->v297
    v286-->v295
    v295-->v297
    v297-->v303
    v297-->v314
    v303-->v311
    v303-->v309
    v309-->v311
    v311-->v315
    v315-->v323
    v315-->v321
    v321-->v323
    v323-->v333
    v323-->v331
    v331-->v333
    v333-->v340
    v337-->v340
    v340-->v342
    v342-->v350
    v342-->v348
    v348-->v350
    v350-->v354
    v350-->v396
    v354-->v358
    v358-->v360
    v360-->v369
    v360-->v367
    v367-->v369
    v369-->v380
    v369-->v378
    v378-->v380
    v380-->v391
    v380-->v389
    v389-->v391
    v391-->v397
    v398-->v402
    v402-->v404
    v404-->v413
    v404-->v411
    v411-->v413
    v413-->v424
    v413-->v422
    v422-->v424
    v424-->v435
    v424-->v433
    v433-->v435
    v435-->v441
    v441-->v451
    v441-->v449
    v449-->v451
    v451-->v456
    v456-->v458