Full pipeline

A pipeline to analyse multiple multiomics samples.

Info

ID: full_pipeline
Namespace: multiomics

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 0.10.0 -latest \
  -main-script ./workflows/multiomics/full_pipeline/main.nf \
  --help

Run command

Example of params.yaml
# Inputs
id: # please fill in - example: "foo"
input: # please fill in - example: "input.h5mu"

# Outputs
# output: "$id.$key.output.h5mu"

# Sample ID options
add_id_to_obs: true
add_id_obs_output: "sample_id"
add_id_make_observation_keys_unique: true

# RNA filtering options
# rna_min_counts: 200
# rna_max_counts: 5000000
# rna_min_genes_per_cell: 200
# rna_max_genes_per_cell: 1500000
# rna_min_cells_per_gene: 3
# rna_min_fraction_mito: 0
# rna_max_fraction_mito: 0.2

# CITE-seq filtering options
# prot_min_counts: 3
# prot_max_counts: 5000000
# prot_min_proteins_per_cell: 200
# prot_max_proteins_per_cell: 100000000
# prot_min_cells_per_protein: 3

# Highly variable gene detection
filter_with_hvg_var_output: "filter_with_hvg"
filter_with_hvg_obs_batch_key: "sample_id"

# Mitochondrial Gene Detection
# var_name_mitochondrial_genes: "foo"
# var_gene_names: "gene_symbol"
mitochondrial_gene_regex: "^[mM][tT]-"

# QC metrics calculation options
# var_qc_metrics: ["ercc", "highly_variable"]
top_n_vars: [50, 100, 200, 500]

# PCA options
pca_overwrite: false

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

nextflow run openpipelines-bio/openpipeline \
  -r 0.10.0 -latest \
  -profile docker \
  -main-script ./workflows/multiomics/full_pipeline/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
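
When the backend has to be switched between machines, it can help to assemble the run command programmatically. The following is a minimal sketch that reuses only the flags shown in the command above; the helper name build_run_command and the SUPPORTED_PROFILES tuple are hypothetical and not part of the pipeline.

```python
import shlex

# Container backends named in the note above.
SUPPORTED_PROFILES = ("docker", "podman", "singularity")


def build_run_command(profile, params_file="params.yaml"):
    """Return the `nextflow run` argument list for the given container profile."""
    if profile not in SUPPORTED_PROFILES:
        raise ValueError(f"unsupported profile: {profile!r}")
    return [
        "nextflow", "run", "openpipelines-bio/openpipeline",
        "-r", "0.10.0", "-latest",
        "-profile", profile,
        "-main-script", "./workflows/multiomics/full_pipeline/main.nf",
        "-params-file", params_file,
    ]


command = build_run_command("singularity")
# Print a copy-pasteable shell command; nothing is executed here.
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run as-is, which avoids shell quoting issues in file paths.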

Argument groups

Inputs

| Name | Description | Attributes |
|---|---|---|
| --id | ID of the sample. | string, required, example: "foo" |
| --input | Path to the sample. | file, required, example: "input.h5mu" |

Outputs

| Name | Description | Attributes |
|---|---|---|
| --output | Destination path to the output. | file, required, example: "output.h5mu" |

Sample ID options

Options for adding the id to .obs on the MuData object. Having a sample id present is a requirement of several components in this pipeline.

| Name | Description | Attributes |
|---|---|---|
| --add_id_to_obs | Add the value passed with --id to .obs. | boolean, default: TRUE |
| --add_id_obs_output | .obs column to add the sample IDs to. Required and only used when --add_id_to_obs is set to "true". | string, default: "sample_id" |
| --add_id_make_observation_keys_unique | Join the id to the .obs index (.obs_names). Only used when --add_id_to_obs is set to "true". | boolean, default: TRUE |
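
The effect of these three options can be sketched in plain Python. This is an illustration, not the component's code: the function name add_sample_id is hypothetical, and joining the id as a prefix with an underscore separator is an assumption about how the id is joined to the index.

```python
def add_sample_id(obs_names, sample_id, make_unique=True, separator="_"):
    """Return (new_obs_names, sample_id_column) for one sample.

    The prefix order and the separator are assumptions made for illustration.
    """
    # One entry per observation, as it would appear in the "sample_id" column.
    column = [sample_id] * len(obs_names)
    if make_unique:
        # Joining the id to each barcode keeps keys unique across samples.
        obs_names = [f"{sample_id}{separator}{name}" for name in obs_names]
    return list(obs_names), column


names, column = add_sample_id(["AAAC-1", "AAAG-1"], "foo")
print(names)
print(column)
```

Without unique observation keys, two samples that share a barcode would collide once they are concatenated.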

RNA filtering options

| Name | Description | Attributes |
|---|---|---|
| --rna_min_counts | Minimum number of counts captured per cell. | integer, example: 200 |
| --rna_max_counts | Maximum number of counts captured per cell. | integer, example: 5000000 |
| --rna_min_genes_per_cell | Minimum of non-zero values per cell. | integer, example: 200 |
| --rna_max_genes_per_cell | Maximum of non-zero values per cell. | integer, example: 1500000 |
| --rna_min_cells_per_gene | Minimum of non-zero values per gene. | integer, example: 3 |
| --rna_min_fraction_mito | Minimum fraction of UMIs that are mitochondrial. | double, example: 0 |
| --rna_max_fraction_mito | Maximum fraction of UMIs that are mitochondrial. | double, example: 0.2 |
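
To make the per-cell thresholds concrete, here is a minimal sketch of how such bounds could be applied. It assumes inclusive bounds and that an unset threshold is skipped; the helper names and the example cells are hypothetical, and the per-gene option --rna_min_cells_per_gene is not covered.

```python
def within(value, lower=None, upper=None):
    """True when value lies inside the optional inclusive bounds."""
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


def keep_cell(cell, thresholds):
    """Apply the per-cell RNA thresholds; thresholds that are not set are skipped."""
    get = thresholds.get
    return (
        within(cell["total_counts"], get("rna_min_counts"), get("rna_max_counts"))
        and within(cell["n_genes"], get("rna_min_genes_per_cell"), get("rna_max_genes_per_cell"))
        and within(cell["fraction_mito"], get("rna_min_fraction_mito"), get("rna_max_fraction_mito"))
    )


thresholds = {"rna_min_counts": 200, "rna_min_genes_per_cell": 200, "rna_max_fraction_mito": 0.2}
cells = [
    {"id": "cell_1", "total_counts": 1500, "n_genes": 800, "fraction_mito": 0.05},
    {"id": "cell_2", "total_counts": 120, "n_genes": 90, "fraction_mito": 0.02},     # too few counts
    {"id": "cell_3", "total_counts": 3000, "n_genes": 1200, "fraction_mito": 0.35},  # too much mito
]
kept = [cell["id"] for cell in cells if keep_cell(cell, thresholds)]
print(kept)
```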

CITE-seq filtering options

| Name | Description | Attributes |
|---|---|---|
| --prot_min_counts | Minimum number of counts per cell. | integer, example: 3 |
| --prot_max_counts | Maximum number of counts per cell. | integer, example: 5000000 |
| --prot_min_proteins_per_cell | Minimum of non-zero values per cell. | integer, example: 200 |
| --prot_max_proteins_per_cell | Maximum of non-zero values per cell. | integer, example: 100000000 |
| --prot_min_cells_per_protein | Minimum of non-zero values per protein. | integer, example: 3 |

Highly variable gene detection

| Name | Description | Attributes |
|---|---|---|
| --filter_with_hvg_var_output | In which .var slot to store a boolean array corresponding to the highly variable genes. | string, default: "filter_with_hvg" |
| --filter_with_hvg_obs_batch_key | If specified, highly-variable genes are selected within each batch separately and merged. This simple process avoids the selection of batch-specific genes and acts as a lightweight batch correction method. | string, default: "sample_id" |
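
The idea of selecting within each batch and then merging can be sketched as follows. This is a simplified stand-in for illustration, not the pipeline's actual method: it ranks genes by plain variance per batch and merges by counting in how many batches a gene was selected. All names and the toy data are hypothetical.

```python
from statistics import pvariance


def top_variable(expr_by_gene, n_top):
    """Names of the n_top genes with the largest variance within one batch."""
    ranked = sorted(expr_by_gene, key=lambda g: pvariance(expr_by_gene[g]), reverse=True)
    return ranked[:n_top]


def hvg_across_batches(batches, n_top):
    """Select genes per batch, then prefer genes selected in the most batches."""
    votes = {}
    for expr_by_gene in batches.values():
        for gene in top_variable(expr_by_gene, n_top):
            votes[gene] = votes.get(gene, 0) + 1
    # Sort by number of batches (descending), then by name for a stable result.
    ranked = sorted(votes, key=lambda g: (-votes[g], g))
    return ranked[:n_top]


# Toy expression values per gene, grouped by the batch key (here: sample id).
batches = {
    "sample_a": {"g1": [0, 10, 0, 10], "g2": [5, 5, 5, 5], "g3": [1, 9, 1, 9], "g4": [0, 0, 0, 0]},
    "sample_b": {"g1": [0, 8, 0, 8], "g2": [0, 20, 0, 20], "g3": [2, 6, 2, 6], "g4": [3, 3, 3, 3]},
}
selected = hvg_across_batches(batches, n_top=2)
print(selected)
```

In this toy data g1 is variable in both samples and is ranked first, while genes that are variable in only one sample compete for the remaining slot.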

Mitochondrial Gene Detection

| Name | Description | Attributes |
|---|---|---|
| --var_name_mitochondrial_genes | In which .var slot to store a boolean array corresponding to the mitochondrial genes. | string |
| --var_gene_names | .var column name to be used to detect mitochondrial genes instead of .var_names (default if not set). Gene names matching the regex value from --mitochondrial_gene_regex will be identified as mitochondrial genes. | string, example: "gene_symbol" |
| --mitochondrial_gene_regex | Regex string that identifies mitochondrial genes from --var_gene_names. By default, it detects human and mouse mitochondrial genes from a gene symbol. | string, default: "^[mM][tT]-" |
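
To see which gene symbols the default regex picks up, the pattern can be tried with Python's re module. The gene list below is made up for illustration, and whether the component matches exactly this way is an assumption; the pattern itself is the documented default.

```python
import re

# Default value of --mitochondrial_gene_regex.
MITO_PATTERN = re.compile(r"^[mM][tT]-")

# Hypothetical gene symbols: human style (MT-), mouse style (mt-), and non-matches.
gene_names = ["MT-CO1", "mt-Nd1", "Mt-Cyb", "MTOR", "ACTB", "XMT-1"]

# The pattern is anchored at the start, so only symbols beginning with "mt-"
# in any letter case are flagged.
is_mito = [bool(MITO_PATTERN.search(name)) for name in gene_names]

for name, flag in zip(gene_names, is_mito):
    print(f"{name}\t{flag}")
```

Note that MTOR is not flagged because the hyphen is part of the pattern.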

QC metrics calculation options

| Name | Description | Attributes |
|---|---|---|
| --var_qc_metrics | Keys to select a boolean (containing only True or False) column from .var. For each cell, calculate the proportion of total values for genes which are labeled "True", compared to the total sum of the values for all genes. Defaults to the combined values specified for --var_name_mitochondrial_genes and --filter_with_hvg_var_output. | List of string, example: "ercc,highly_variable", multiple_sep: "," |
| --top_n_vars | Number of top vars to be used to calculate cumulative proportions. If not specified, proportions are not calculated. --top_n_vars 20,50 finds cumulative proportion to the 20th and 50th most expressed vars. | List of integer, default: 50, 100, 200, 500, multiple_sep: "," |
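
Both metrics described above reduce to simple ratios per cell. The sketch below computes them for a single cell using plain lists; the function names and counts are hypothetical, and reporting the values as fractions between 0 and 1 (rather than percentages) is an assumption.

```python
def proportion_flagged(counts, flags):
    """Share of a cell's total counts coming from vars whose flag is True."""
    total = sum(counts)
    flagged = sum(count for count, flag in zip(counts, flags) if flag)
    return flagged / total if total else 0.0


def top_n_proportion(counts, n):
    """Cumulative share of a cell's total counts held by its n most expressed vars."""
    total = sum(counts)
    top = sorted(counts, reverse=True)[:n]
    return sum(top) / total if total else 0.0


# Counts for one cell across five genes (total 100), plus a boolean .var column.
cell_counts = [50, 30, 10, 5, 5]
is_mito = [False, True, False, False, True]

mito_fraction = proportion_flagged(cell_counts, is_mito)
top2_fraction = top_n_proportion(cell_counts, 2)
print(mito_fraction, top2_fraction)
```

A high top-n proportion indicates that a few vars dominate the cell's counts, which is one common reason to inspect it during QC.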

PCA options

| Name | Description | Attributes |
|---|---|---|
| --pca_overwrite | Allow overwriting slots for PCA output. | boolean_true |

Authors

  • Dries Schaumont (author, maintainer)

Visualisation

flowchart LR
    p0(Input)
    p2(toSortedList)
    p4(flatMap)
    p7(toSortedList)
    p9(Output)
    p11(filter)
    p17(add_id)
    p19(join)
    p23(mix)
    p22(filter)
    p25(filter)
    p30(split_modalities)
    p32(join)
    p39(concat)
    p35(filter)
    p37(test_wf:run_wf:split_modalities_workflow:splitStub)
    p40(flatMap)
    p41(filter)
    p44(toSortedList)
    p46(flatMap)
    p53(filter_with_counts)
    p55(join)
    p63(do_filter)
    p65(join)
    p73(filter_with_scrublet)
    p75(join)
    p110(concat)
    p79(filter)
    p82(toSortedList)
    p84(flatMap)
    p91(test_wf:run_wf:singlesample_processing_workflow:prot_singlesample:filter_with_counts:filter_with_counts_process1)
    p93(join)
    p101(test_wf:run_wf:singlesample_processing_workflow:prot_singlesample:do_filter:do_filter_process1)
    p103(join)
    p108(filter)
    p112(groupTuple)
    p118(concat)
    p120(join)
    p125(filter)
    p128(toSortedList)
    p130(flatMap)
    p132(toSortedList)
    p134(Output)
    p140(normalize_total)
    p142(join)
    p150(log1p)
    p152(join)
    p160(delete_layer)
    p162(join)
    p170(filter_with_hvg)
    p172(join)
    p180(rna_calculate_qc_metrics)
    p182(join)
    p223(concat)
    p188(filter)
    p191(toSortedList)
    p193(flatMap)
    p195(toSortedList)
    p197(Output)
    p203(clr)
    p205(join)
    p213(prot_calculate_qc_metrics)
    p215(join)
    p221(filter)
    p224(toSortedList)
    p230(merge)
    p232(join)
    p235(filter)
    p239(toSortedList)
    p241(flatMap)
    p248(pca)
    p250(join)
    p258(find_neighbors)
    p260(join)
    p268(umap)
    p270(join)
    p275(concat)
    p274(filter)
    p276(filter)
    p280(toSortedList)
    p282(flatMap)
    p289(pca)
    p291(join)
    p299(find_neighbors)
    p301(join)
    p309(test_wf:run_wf:integration_setup_workflow:initialize_integration_prot:umap:umap_process1)
    p311(join)
    p316(concat)
    p315(filter)
    p322(publish)
    p324(join)
    p329(toSortedList)
    p331(Output)
    p22-->p23
    p39-->p40
    p40-->p41
    p40-->p79
    p40-->p108
    p223-->p224
    p274-->p275
    p275-->p276
    p275-->p315
    p315-->p316
    p0-->p2
    p2-->p4
    p4-->p7
    p7-->p9
    p4-->p11
    p4-->p22
    p11-->p19
    p11-->p17
    p17-->p19
    p19-->p23
    p23-->p25
    p23-->p35
    p25-->p32
    p25-->p30
    p30-->p32
    p32-->p39
    p35-->p37
    p37-->p39
    p41-->p44
    p44-->p46
    p46-->p55
    p46-->p53
    p53-->p55
    p55-->p65
    p55-->p63
    p63-->p65
    p65-->p75
    p65-->p73
    p73-->p75
    p75-->p110
    p79-->p82
    p82-->p84
    p84-->p93
    p84-->p91
    p91-->p93
    p93-->p103
    p93-->p101
    p101-->p103
    p103-->p110
    p108-->p110
    p110-->p112
    p112-->p120
    p112-->p118
    p118-->p120
    p120-->p125
    p120-->p188
    p120-->p221
    p125-->p128
    p128-->p130
    p130-->p132
    p132-->p134
    p130-->p142
    p130-->p140
    p140-->p142
    p142-->p152
    p142-->p150
    p150-->p152
    p152-->p162
    p152-->p160
    p160-->p162
    p162-->p172
    p162-->p170
    p170-->p172
    p172-->p182
    p172-->p180
    p180-->p182
    p182-->p223
    p188-->p191
    p191-->p193
    p193-->p195
    p195-->p197
    p193-->p205
    p193-->p203
    p203-->p205
    p205-->p215
    p205-->p213
    p213-->p215
    p215-->p223
    p221-->p223
    p224-->p232
    p224-->p230
    p230-->p232
    p232-->p235
    p232-->p274
    p235-->p239
    p239-->p241
    p241-->p250
    p241-->p248
    p248-->p250
    p250-->p260
    p250-->p258
    p258-->p260
    p260-->p270
    p260-->p268
    p268-->p270
    p270-->p275
    p276-->p280
    p280-->p282
    p282-->p291
    p282-->p289
    p289-->p291
    p291-->p301
    p291-->p299
    p299-->p301
    p301-->p311
    p301-->p309
    p309-->p311
    p311-->p316
    p316-->p324
    p316-->p322
    p322-->p324
    p324-->p329
    p329-->p331