Full pipeline

A pipeline to analyse multiple multiomics samples.

Info

ID: full_pipeline
Namespace: multiomics

Example commands

You can run the pipeline using nextflow run.

View help

Add --help to the command to get an overview of the available parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 0.12.0 -latest \
  -main-script ./workflows/multiomics/full_pipeline/main.nf \
  --help

Run command

Example of params.yaml
# Inputs
id: # please fill in - example: "foo"
input: # please fill in - example: "input.h5mu"

# Outputs
# output: "$id.$key.output.h5mu"

# Sample ID options
add_id_to_obs: true
add_id_obs_output: "sample_id"
add_id_make_observation_keys_unique: true

# RNA filtering options
# rna_min_counts: 200
# rna_max_counts: 5000000
# rna_min_genes_per_cell: 200
# rna_max_genes_per_cell: 1500000
# rna_min_cells_per_gene: 3
# rna_min_fraction_mito: 0
# rna_max_fraction_mito: 0.2

# CITE-seq filtering options
# prot_min_counts: 3
# prot_max_counts: 5000000
# prot_min_proteins_per_cell: 200
# prot_max_proteins_per_cell: 100000000
# prot_min_cells_per_protein: 3

# Highly variable gene detection
filter_with_hvg_var_output: "filter_with_hvg"
filter_with_hvg_obs_batch_key: "sample_id"

# Mitochondrial Gene Detection
# var_name_mitochondrial_genes: "foo"
# obs_name_mitochondrial_fraction: "foo"
# var_gene_names: "gene_symbol"
mitochondrial_gene_regex: "^[mM][tT]-"

# QC metrics calculation options
# var_qc_metrics: ["ercc", "highly_variable"]
top_n_vars: [50, 100, 200, 500]

# PCA options
pca_overwrite: false

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

Then launch the pipeline with this parameter file:

nextflow run openpipelines-bio/openpipeline \
  -r 0.12.0 -latest \
  -profile docker \
  -main-script ./workflows/multiomics/full_pipeline/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
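Choosing the profile can also be scripted. The helper below is a hypothetical convenience function, not part of OpenPipelines: it prints the first of docker, podman or singularity found on PATH (that preference order is an assumption), and the commented lines show how its output could be passed to -profile.

```shell
# Hypothetical helper (not part of OpenPipelines): print the first container
# engine found on PATH, in an assumed order of preference.
pick_profile() {
  local engine
  for engine in docker podman singularity; do
    if command -v "$engine" >/dev/null 2>&1; then
      printf '%s\n' "$engine"
      return 0
    fi
  done
  echo "no supported container engine found on PATH" >&2
  return 1
}

# Example usage (commented out so that sourcing this file does not start a run):
# profile="$(pick_profile)" || exit 1
# nextflow run openpipelines-bio/openpipeline \
#   -r 0.12.0 -latest \
#   -profile "$profile" \
#   -main-script ./workflows/multiomics/full_pipeline/main.nf \
#   -params-file params.yaml
```

The profile names are assumed to match the engine names, as in the note above.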

Argument groups

Inputs

| Name | Description | Attributes |
|------|-------------|------------|
| --id | ID of the sample. | string, required, example: "foo" |
| --input | Path to the sample. | file, required, example: "input.h5mu" |

Outputs

| Name | Description | Attributes |
|------|-------------|------------|
| --output | Destination path to the output. | file, required, example: "output.h5mu" |

Sample ID options

Options for adding the sample ID to .obs of the MuData object. Several components of this pipeline require a sample ID to be present.

| Name | Description | Attributes |
|------|-------------|------------|
| --add_id_to_obs | Add the value passed with --id to .obs. | boolean, default: TRUE |
| --add_id_obs_output | .obs column to add the sample IDs to. Required and only used when --add_id_to_obs is set to true. | string, default: "sample_id" |
| --add_id_make_observation_keys_unique | Join the ID to the .obs index (.obs_names). Only used when --add_id_to_obs is set to true. | boolean, default: TRUE |
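The point of --add_id_make_observation_keys_unique is that barcodes are only unique within a sample, so two samples can contain the same barcode. A minimal sketch of the idea, assuming the ID is joined as <id>_<barcode> (the exact separator used by the add_id component is an assumption here); prefix_obs_names is a hypothetical helper that reads one observation name per line:

```shell
# Hypothetical helper: prepend "<sample_id>_" to every observation name on stdin.
# The "_" separator is an assumption about the add_id component's output format.
prefix_obs_names() {
  local sample_id="$1"
  awk -v id="$sample_id" '{ print id "_" $0 }'
}

# Example: two samples sharing the barcode AAACCTG-1.
# { printf 'AAACCTG-1\nAAACGGG-1\n' | prefix_obs_names foo
#   printf 'AAACCTG-1\nTTTGCAT-1\n' | prefix_obs_names bar; } | sort | uniq -d
# (prints nothing: no duplicate keys remain)
```

Without the prefix, AAACCTG-1 would occur twice in the combined index; with it, the two cells become foo_AAACCTG-1 and bar_AAACCTG-1.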

RNA filtering options

| Name | Description | Attributes |
|------|-------------|------------|
| --rna_min_counts | Minimum number of counts captured per cell. | integer, example: 200 |
| --rna_max_counts | Maximum number of counts captured per cell. | integer, example: 5000000 |
| --rna_min_genes_per_cell | Minimum number of non-zero values per cell. | integer, example: 200 |
| --rna_max_genes_per_cell | Maximum number of non-zero values per cell. | integer, example: 1500000 |
| --rna_min_cells_per_gene | Minimum number of non-zero values per gene. | integer, example: 3 |
| --rna_min_fraction_mito | Minimum fraction of UMIs that are mitochondrial. | double, example: 0 |
| --rna_max_fraction_mito | Maximum fraction of UMIs that are mitochondrial. | double, example: 0.2 |
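The per-cell thresholds act as lower and upper bounds on each cell's metrics. The sketch below applies them to a small table of precomputed metrics; the tab-separated layout, the inclusive comparisons and the helper name filter_rna_cells are assumptions for illustration, since the pipeline itself works on .h5mu files. The per-gene threshold --rna_min_cells_per_gene is not covered here.

```shell
# Minimal sketch (hypothetical helper): print the IDs of cells whose metrics lie
# within the given bounds. Bounds are treated as inclusive, which is an assumption.
# stdin: tab-separated with a header line; columns (assumed layout):
#   cell, total_counts, n_genes, fraction_mito
# Arguments: min_counts max_counts min_genes max_genes min_mito max_mito
filter_rna_cells() {
  LC_ALL=C awk -F'\t' \
    -v minc="$1" -v maxc="$2" \
    -v ming="$3" -v maxg="$4" \
    -v minm="$5" -v maxm="$6" '
    NR == 1 { next }
    $2 >= minc && $2 <= maxc && $3 >= ming && $3 <= maxg && $4 >= minm && $4 <= maxm { print $1 }
  '
}
```

With the example values from params.yaml, a call would look like filter_rna_cells 200 5000000 200 1500000 0 0.2 < metrics.tsv, where metrics.tsv is a hypothetical file in the layout above.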

CITE-seq filtering options

| Name | Description | Attributes |
|------|-------------|------------|
| --prot_min_counts | Minimum number of counts per cell. | integer, example: 3 |
| --prot_max_counts | Maximum number of counts per cell. | integer, example: 5000000 |
| --prot_min_proteins_per_cell | Minimum number of non-zero values per cell. | integer, example: 200 |
| --prot_max_proteins_per_cell | Maximum number of non-zero values per cell. | integer, example: 100000000 |
| --prot_min_cells_per_protein | Minimum number of non-zero values per protein. | integer, example: 3 |
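--prot_min_cells_per_protein works along the other axis: it removes features rather than cells. A minimal sketch, assuming a small tab-separated count matrix with one row per cell and one column per protein (an illustrative layout, not the pipeline's input format); proteins_passing_min_cells is a hypothetical helper that prints the proteins detected in at least the given number of cells:

```shell
# Minimal sketch (hypothetical helper): list proteins with a non-zero count in
# at least <min_cells> cells.
# stdin (assumed layout): header "cell<TAB>prot1<TAB>prot2...", then one row per cell.
proteins_passing_min_cells() {
  LC_ALL=C awk -F'\t' -v min="$1" '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; n = NF; next }
    { for (i = 2; i <= NF; i++) if ($i != 0) detected[i]++ }
    END { for (i = 2; i <= n; i++) if ((detected[i] + 0) >= min) print name[i] }
  '
}
```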

Highly variable gene detection

| Name | Description | Attributes |
|------|-------------|------------|
| --filter_with_hvg_var_output | .var column in which to store a boolean array marking the highly variable genes. | string, default: "filter_with_hvg" |
| --filter_with_hvg_obs_batch_key | If specified, highly variable genes are selected within each batch separately and then merged. This simple process avoids selecting batch-specific genes and acts as a lightweight batch correction method. | string, default: "sample_id" |
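The sketch below only illustrates the "select per batch, then merge" idea behind --filter_with_hvg_obs_batch_key. It takes one HVG list per batch and ranks genes by the number of batches in which they were flagged; this is a simplification, not the exact procedure used by the filter_with_hvg component, and rank_hvgs_across_batches is a hypothetical name.

```shell
# Simplified illustration, not the component's exact algorithm: count how many
# batches flag each gene as highly variable and rank genes by that count.
# Arguments: one file per batch, one gene name per line, each gene listed once.
# Output: gene<TAB>n_batches, most widely shared genes first.
rank_hvgs_across_batches() {
  local tab
  tab="$(printf '\t')"
  cat "$@" |
    LC_ALL=C sort |
    uniq -c |
    awk -v OFS='\t' '{ print $2, $1 }' |
    LC_ALL=C sort -t "$tab" -k2,2nr -k1,1
}
```

Genes flagged in many batches rise to the top, while genes that are variable in only one batch, which are often batch-specific, rank lower.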

Mitochondrial Gene Detection

| Name | Description | Attributes |
|------|-------------|------------|
| --var_name_mitochondrial_genes | .var column in which to store a boolean array marking the mitochondrial genes. | string |
| --obs_name_mitochondrial_fraction | When specified, write the fraction of counts originating from mitochondrial genes (based on --mitochondrial_gene_regex) to an .obs column with the specified name. Requires --var_name_mitochondrial_genes. | string |
| --var_gene_names | .var column to use for detecting mitochondrial genes instead of .var_names (the default if not set). Gene names matching the regex from --mitochondrial_gene_regex are identified as mitochondrial genes. | string, example: "gene_symbol" |
| --mitochondrial_gene_regex | Regex that identifies mitochondrial genes from --var_gene_names. By default, detects human and mouse mitochondrial genes from a gene symbol. | string, default: "^[mM][tT]-" |
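To make the default regex concrete: ^[mM][tT]- matches gene symbols such as MT-CO1 (human) and mt-Nd1 (mouse) but not MTOR, which lacks the hyphen. The sketch below computes the fraction of a single cell's counts coming from matching genes, the quantity described for --obs_name_mitochondrial_fraction. The gene-name/count input layout and the helper name mito_fraction are assumptions; the first column plays the role of --var_gene_names.

```shell
# Minimal sketch (hypothetical helper): fraction of one cell's counts from genes
# whose name matches the mitochondrial regex.
# stdin (assumed layout, no header): gene_name<TAB>count
# Argument: optional regex; defaults to the pipeline's default pattern.
mito_fraction() {
  local regex="${1:-^[mM][tT]-}"
  LC_ALL=C awk -F'\t' -v re="$regex" '
    { total += $2; if ($1 ~ re) mito += $2 }
    END { if (total > 0) printf "%.4f\n", mito / total; else print "NA" }
  '
}
```

For the counts MT-CO1 30, mt-Nd1 10, MTOR 20 and ACTB 40, only the first two match, giving 40 / 100 = 0.4.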

QC metrics calculation options

| Name | Description | Attributes |
|------|-------------|------------|
| --var_qc_metrics | Keys selecting boolean (only True or False) columns from .var. For each cell, calculate the proportion of total values for genes labeled True, compared to the total sum of the values for all genes. Defaults to the combined values specified for --var_name_mitochondrial_genes and --filter_with_hvg_var_output. | List of string, example: "ercc,highly_variable", multiple_sep: "," |
| --top_n_vars | Number of top vars used to calculate cumulative proportions. If not specified, proportions are not calculated. For example, --top_n_vars 20,50 computes the cumulative proportions up to the 20th and 50th most expressed vars. | List of integer, default: 50, 100, 200, 500, multiple_sep: "," |
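Both QC metrics reduce to simple per-cell ratios. The sketches below use hypothetical helpers and assumed input layouts: top_n_proportions reads one count per line and prints, for each N, the share of the total held by the N largest counts (as --top_n_vars describes), and flagged_proportion reads var name, count and a True/False flag and prints the share of counts from flagged vars (as --var_qc_metrics describes). Whether the pipeline stores these as fractions or percentages is not stated here; the sketches print fractions.

```shell
# Hypothetical helper: share of one cell's total counts held by its N most
# highly expressed vars. stdin: one count per line; arguments: the N values.
top_n_proportions() {
  sort -n -r | LC_ALL=C awk -v ns="$*" '
    { c[NR] = $1; total += $1 }
    END {
      k = split(ns, n, " ")
      for (j = 1; j <= k; j++) {
        cum = 0
        # Sum the n[j] largest counts (or all counts if there are fewer).
        for (i = 1; i <= n[j] && i <= NR; i++) cum += c[i]
        printf "top_%d\t%.4f\n", n[j], (total > 0 ? cum / total : 0)
      }
    }
  '
}

# Hypothetical helper: share of one cell's total counts from vars flagged True.
# stdin (assumed layout): var_name<TAB>count<TAB>True|False
flagged_proportion() {
  LC_ALL=C awk -F'\t' '
    { total += $2; if ($3 == "True") flagged += $2 }
    END { printf "%.4f\n", (total > 0 ? flagged / total : 0) }
  '
}
```

For counts 2, 5 and 3, the sorted counts are 5, 3, 2, so the top 1 and top 2 proportions are 5/10 = 0.5 and 8/10 = 0.8.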

PCA options

| Name | Description | Attributes |
|------|-------------|------------|
| --pca_overwrite | Allow overwriting slots for PCA output. | boolean_true |

Authors

  • Dries Schaumont (author, maintainer)

Visualisation

flowchart LR
    v0(Input)
    v2(toSortedList)
    v4(flatMap)
    v7(toSortedList)
    v9(Output)
    v11(filter)
    v17(add_id)
    v19(join)
    v24(mix)
    v22(filter)
    v27(filter)
    v32(split_modalities)
    v34(join)
    v41(concat)
    v37(filter)
    v39(test_wf:run_wf:split_modalities_workflow:splitStub)
    v42(flatMap)
    v44(filter)
    v47(toSortedList)
    v49(flatMap)
    v55(filter)
    v61(grep_annotation_column)
    v63(join)
    v67(mix)
    v66(filter)
    v73(calculate_qc_metrics)
    v75(join)
    v83(publish)
    v85(join)
    v89(filter)
    v96(delimit_fraction)
    v98(join)
    v102(mix)
    v101(filter)
    v108(filter_with_counts)
    v110(join)
    v119(do_filter)
    v121(join)
    v129(filter_with_scrublet)
    v131(join)
    v200(concat)
    v134(filter)
    v137(toSortedList)
    v139(flatMap)
    v144(filter)
    v150(test_wf:run_wf:singlesample_processing_workflow:prot_singlesample:unfiltered_counts_qc_metrics_prot:grep_annotation_column:grep_annotation_column_process1)
    v152(join)
    v156(mix)
    v155(filter)
    v162(test_wf:run_wf:singlesample_processing_workflow:prot_singlesample:unfiltered_counts_qc_metrics_prot:calculate_qc_metrics:calculate_qc_metrics_process1)
    v164(join)
    v172(test_wf:run_wf:singlesample_processing_workflow:prot_singlesample:unfiltered_counts_qc_metrics_prot:publish:publish_process1)
    v174(join)
    v183(test_wf:run_wf:singlesample_processing_workflow:prot_singlesample:filter_with_counts:filter_with_counts_process1)
    v185(join)
    v193(test_wf:run_wf:singlesample_processing_workflow:prot_singlesample:do_filter:do_filter_process1)
    v195(join)
    v198(filter)
    v204(groupTuple)
    v210(concat)
    v212(join)
    v216(filter)
    v219(toSortedList)
    v221(flatMap)
    v223(toSortedList)
    v225(Output)
    v231(normalize_total)
    v233(join)
    v242(log1p)
    v244(join)
    v253(delete_layer)
    v255(join)
    v264(filter_with_hvg)
    v266(join)
    v275(rna_calculate_qc_metrics)
    v277(join)
    v341(concat)
    v282(filter)
    v285(toSortedList)
    v287(flatMap)
    v288(toSortedList)
    v290(Output)
    v296(clr)
    v298(join)
    v304(filter)
    v310(test_wf:run_wf:multisample_processing_workflow:prot_multisample:prot_qc:grep_annotation_column:grep_annotation_column_process2)
    v312(join)
    v316(mix)
    v315(filter)
    v322(test_wf:run_wf:multisample_processing_workflow:prot_multisample:prot_qc:calculate_qc_metrics:calculate_qc_metrics_process2)
    v324(join)
    v332(test_wf:run_wf:multisample_processing_workflow:prot_multisample:prot_qc:publish:publish_process2)
    v334(join)
    v338(filter)
    v343(toSortedList)
    v349(merge)
    v351(join)
    v355(filter)
    v359(toSortedList)
    v361(flatMap)
    v368(pca)
    v370(join)
    v379(find_neighbors)
    v381(join)
    v390(umap)
    v392(join)
    v398(concat)
    v397(filter)
    v399(filter)
    v403(toSortedList)
    v405(flatMap)
    v412(pca)
    v414(join)
    v423(find_neighbors)
    v425(join)
    v434(test_wf:run_wf:integration_setup_workflow:initialize_integration_prot:umap:umap_process1)
    v436(join)
    v442(concat)
    v441(filter)
    v450(test_wf:run_wf:publish:publish_process3)
    v452(join)
    v457(toSortedList)
    v459(Output)
    v41-->v42
    v66-->v67
    v101-->v102
    v155-->v156
    v287-->v288
    v315-->v316
    v397-->v398
    v398-->v399
    v398-->v441
    v441-->v442
    v0-->v2
    v2-->v4
    v4-->v7
    v7-->v9
    v4-->v11
    v4-->v22
    v11-->v19
    v11-->v17
    v17-->v19
    v19-->v24
    v22-->v24
    v24-->v27
    v24-->v37
    v27-->v34
    v27-->v32
    v32-->v34
    v34-->v41
    v37-->v39
    v39-->v41
    v42-->v44
    v42-->v134
    v42-->v198
    v44-->v47
    v47-->v49
    v49-->v55
    v49-->v66
    v55-->v63
    v55-->v61
    v61-->v63
    v63-->v67
    v67-->v75
    v67-->v73
    v73-->v75
    v75-->v85
    v75-->v83
    v83-->v85
    v85-->v89
    v85-->v101
    v89-->v98
    v89-->v96
    v96-->v98
    v98-->v102
    v102-->v110
    v102-->v108
    v108-->v110
    v110-->v121
    v110-->v119
    v119-->v121
    v121-->v131
    v121-->v129
    v129-->v131
    v131-->v200
    v134-->v137
    v137-->v139
    v139-->v144
    v139-->v155
    v144-->v152
    v144-->v150
    v150-->v152
    v152-->v156
    v156-->v164
    v156-->v162
    v162-->v164
    v164-->v174
    v164-->v172
    v172-->v174
    v174-->v185
    v174-->v183
    v183-->v185
    v185-->v195
    v185-->v193
    v193-->v195
    v195-->v200
    v198-->v200
    v200-->v204
    v204-->v212
    v204-->v210
    v210-->v212
    v212-->v216
    v212-->v282
    v212-->v338
    v216-->v219
    v219-->v221
    v221-->v223
    v223-->v225
    v221-->v233
    v221-->v231
    v231-->v233
    v233-->v244
    v233-->v242
    v242-->v244
    v244-->v255
    v244-->v253
    v253-->v255
    v255-->v266
    v255-->v264
    v264-->v266
    v266-->v277
    v266-->v275
    v275-->v277
    v277-->v341
    v282-->v285
    v285-->v287
    v288-->v290
    v287-->v298
    v287-->v296
    v296-->v298
    v298-->v304
    v298-->v315
    v304-->v312
    v304-->v310
    v310-->v312
    v312-->v316
    v316-->v324
    v316-->v322
    v322-->v324
    v324-->v334
    v324-->v332
    v332-->v334
    v334-->v341
    v338-->v341
    v341-->v343
    v343-->v351
    v343-->v349
    v349-->v351
    v351-->v355
    v351-->v397
    v355-->v359
    v359-->v361
    v361-->v370
    v361-->v368
    v368-->v370
    v370-->v381
    v370-->v379
    v379-->v381
    v381-->v392
    v381-->v390
    v390-->v392
    v392-->v398
    v399-->v403
    v403-->v405
    v405-->v414
    v405-->v412
    v412-->v414
    v414-->v425
    v414-->v423
    v423-->v425
    v425-->v436
    v425-->v434
    v434-->v436
    v436-->v442
    v442-->v452
    v442-->v450
    v450-->v452
    v452-->v457
    v457-->v459