Process samples

A pipeline to analyse multiple multiomics samples.

Info

ID: process_samples
Namespace: workflows/multiomics

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 1.0.1 -latest \
  -main-script target/nextflow/workflows/multiomics/process_samples/main.nf \
  --help

Run command

Example of params.yaml
# Inputs
id: # please fill in - example: "foo"
input: # please fill in - example: "input.h5mu"
# rna_layer: "foo"
# prot_layer: "foo"
# gdo_layer: "foo"

# Outputs
# output: "$id.$key.output.h5mu"

# Sample ID options
add_id_to_obs: true
add_id_obs_output: "sample_id"
add_id_make_observation_keys_unique: true

# RNA filtering options
# rna_min_counts: 200
# rna_max_counts: 5000000
# rna_min_genes_per_cell: 200
# rna_max_genes_per_cell: 1500000
# rna_min_cells_per_gene: 3
# rna_min_fraction_mito: 0
# rna_max_fraction_mito: 0.2

# CITE-seq filtering options
# prot_min_counts: 3
# prot_max_counts: 5000000
# prot_min_proteins_per_cell: 200
# prot_max_proteins_per_cell: 100000000
# prot_min_cells_per_protein: 3

# GDO filtering options
# gdo_min_counts: 3
# gdo_max_counts: 5000000
# gdo_min_guides_per_cell: 200
# gdo_max_guides_per_cell: 100000000
# gdo_min_cells_per_guide: 3

# Highly variable features detection
highly_variable_features_var_output: "filter_with_hvg"
highly_variable_features_obs_batch_key: "sample_id"

# Mitochondrial Gene Detection
# var_name_mitochondrial_genes: "foo"
# obs_name_mitochondrial_fraction: "foo"
# var_gene_names: "gene_symbol"
mitochondrial_gene_regex: "^[mM][tT]-"

# QC metrics calculation options
# var_qc_metrics: ["ercc", "highly_variable"]
top_n_vars: [50, 100, 200, 500]

# PCA options
pca_overwrite: false

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
nextflow run openpipelines-bio/openpipeline \
  -r 1.0.1 -latest \
  -profile docker \
  -main-script target/nextflow/workflows/multiomics/process_samples/main.nf \
  -params-file params.yaml
Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Inputs

Name Description Attributes
--id ID of the sample. string, required, example: "foo"
--input Path to the sample. file, required, example: "input.h5mu"
--rna_layer Input layer for the gene expression modality. If not specified, .X is used. string
--prot_layer Input layer for the antibody capture modality. If not specified, .X is used. string
--gdo_layer Input layer for the guide-derived oligonucleotide (GDO) data. If not specified, .X is used. string

Outputs

Name Description Attributes
--output Destination path to the output. file, required, example: "output.h5mu"

Sample ID options

Options for adding the id to .obs on the MuData object. Having a sample id present in a requirement of several components for this pipeline.

Name Description Attributes
--add_id_to_obs Add the value passed with –id to .obs. boolean, default: TRUE
--add_id_obs_output .Obs column to add the sample IDs to. Required and only used when –add_id_to_obs is set to ‘true’ string, default: "sample_id"
--add_id_make_observation_keys_unique Join the id to the .obs index (.obs_names). Only used when –add_id_to_obs is set to ‘true’. boolean, default: TRUE

RNA filtering options

Name Description Attributes
--rna_min_counts Minimum number of counts captured per cell. integer, example: 200
--rna_max_counts Maximum number of counts captured per cell. integer, example: 5000000
--rna_min_genes_per_cell Minimum of non-zero values per cell. integer, example: 200
--rna_max_genes_per_cell Maximum of non-zero values per cell. integer, example: 1500000
--rna_min_cells_per_gene Minimum of non-zero values per gene. integer, example: 3
--rna_min_fraction_mito Minimum fraction of UMIs that are mitochondrial. double, example: 0
--rna_max_fraction_mito Maximum fraction of UMIs that are mitochondrial. double, example: 0.2

CITE-seq filtering options

Name Description Attributes
--prot_min_counts Minimum number of counts per cell. integer, example: 3
--prot_max_counts Minimum number of counts per cell. integer, example: 5000000
--prot_min_proteins_per_cell Minimum of non-zero values per cell. integer, example: 200
--prot_max_proteins_per_cell Maximum of non-zero values per cell. integer, example: 100000000
--prot_min_cells_per_protein Minimum of non-zero values per protein. integer, example: 3

GDO filtering options

Name Description Attributes
--gdo_min_counts Minimum number of counts per cell. integer, example: 3
--gdo_max_counts Minimum number of counts per cell. integer, example: 5000000
--gdo_min_guides_per_cell Minimum of non-zero values per cell. integer, example: 200
--gdo_max_guides_per_cell Maximum of non-zero values per cell. integer, example: 100000000
--gdo_min_cells_per_guide Minimum of non-zero values per guide. integer, example: 3

Highly variable features detection

Name Description Attributes
--highly_variable_features_var_output In which .var slot to store a boolean array corresponding to the highly variable genes. string, default: "filter_with_hvg"
--highly_variable_features_obs_batch_key If specified, highly-variable genes are selected within each batch separately and merged. This simple process avoids the selection of batch-specific genes and acts as a lightweight batch correction method. string, default: "sample_id"

Mitochondrial Gene Detection

Name Description Attributes
--var_name_mitochondrial_genes In which .var slot to store a boolean array corresponding the mitochondrial genes. string
--obs_name_mitochondrial_fraction When specified, write the fraction of counts originating from mitochondrial genes (based on –mitochondrial_gene_regex) to an .obs column with the specified name. Requires –var_name_mitochondrial_genes. string
--var_gene_names .var column name to be used to detect mitochondrial genes instead of .var_names (default if not set). Gene names matching with the regex value from –mitochondrial_gene_regex will be identified as a mitochondrial gene. string, example: "gene_symbol"
--mitochondrial_gene_regex Regex string that identifies mitochondrial genes from –var_gene_names. By default will detect human and mouse mitochondrial genes from a gene symbol. string, default: "^[mM][tT]-"

QC metrics calculation options

Name Description Attributes
--var_qc_metrics Keys to select a boolean (containing only True or False) column from .var. For each cell, calculate the proportion of total values for genes which are labeled ‘True’, compared to the total sum of the values for all genes. Defaults to the combined values specified for –var_name_mitochondrial_genes and –highly_variable_features_var_output. List of string, example: "ercc,highly_variable", multiple_sep: ";"
--top_n_vars Number of top vars to be used to calculate cumulative proportions. If not specified, proportions are not calculated. --top_n_vars 20,50 finds cumulative proportion to the 20th and 50th most expressed vars. List of integer, default: 50, 100, 200, 500, multiple_sep: ";"

PCA options

Name Description Attributes
--pca_overwrite Allow overwriting slots for PCA output. boolean_true

Authors

  • Dries Schaumont (author, maintainer)

Visualisation

flowchart TB
    v0(Channel.fromList)
    v20(add_id)
    v32(filter)
    v76(flatMap)
    v388(mix)
    v390(mix)
    v405(concatenate_h5mu)
    v429(filter)
    v1022(Output)
    subgraph group_split_modalities_workflow [split_modalities_workflow]
        v48(split_modalities_component)
    end
    subgraph group_rna_singlesample [rna_singlesample]
        v90(filter)
        v108(grep_annotation_column)
        v121(mix)
        v130(calculate_qc_metrics)
        v150(publish)
        v176(branch)
        v192(concat)
        v181(delimit_fraction)
        v201(rna_filter_with_counts)
        v221(rna_do_filter)
        v241(filter_with_scrublet)
    end
    subgraph group_prot_singlesample [prot_singlesample]
        v282(prot_filter_with_counts)
        v302(prot_do_filter)
    end
    subgraph group_gdo_singlesample [gdo_singlesample]
        v344(gdo_filter_with_counts)
        v364(gdo_do_filter)
    end
    subgraph group_process_batches [process_batches]
        v445(split_modalities_component)
        v473(flatMap)
        v788(mix)
        v790(mix)
        v806(merge)
        v822(branch)
        v896(concat)
        v835(pca)
        v855(find_neighbors)
        v875(umap)
        v900(branch)
        v974(concat)
        v913(pca)
        v933(find_neighbors)
        v953(umap)
        v983(publish)
        subgraph group_rna_multisample [rna_multisample]
            v496(normalize_total)
            v516(log1p)
            v536(delete_layer)
            v556(highly_variable_features_scanpy)
            v568(filter)
            v586(grep_annotation_column)
            v599(mix)
            v608(calculate_qc_metrics)
            v628(publish)
        end
        subgraph group_prot_multisample [prot_multisample]
            v681(clr)
            v693(filter)
            v711(grep_annotation_column)
            v724(mix)
            v733(calculate_qc_metrics)
            v753(publish)
        end
    end
    v176-->v192
    v388-->v390
    v788-->v790
    v822-->v896
    v900-->v974
    v176-->v181
    v0-->v20
    v20-->v32
    v32-->v48
    v48-->v76
    v76-->v90
    v90-->v108
    v108-->v121
    v90-->v121
    v121-->v130
    v130-->v150
    v150-->v176
    v181-->v192
    v192-->v201
    v201-->v221
    v221-->v241
    v241-->v388
    v76-->v282
    v282-->v302
    v302-->v388
    v76-->v344
    v344-->v364
    v364-->v388
    v76-->v390
    v390-->v405
    v405-->v429
    v429-->v445
    v445-->v473
    v473-->v496
    v496-->v516
    v516-->v536
    v536-->v556
    v556-->v568
    v568-->v586
    v586-->v599
    v568-->v599
    v599-->v608
    v608-->v628
    v628-->v788
    v473-->v681
    v681-->v693
    v693-->v711
    v711-->v724
    v693-->v724
    v724-->v733
    v733-->v753
    v753-->v788
    v473-->v790
    v790-->v806
    v806-->v822
    v822-->v835
    v835-->v855
    v855-->v875
    v875-->v896
    v896-->v900
    v900-->v913
    v913-->v933
    v933-->v953
    v953-->v974
    v974-->v983
    v983-->v1022
    style group_split_modalities_workflow fill:#F0F0F0,stroke:#969696;
    style group_rna_singlesample fill:#F0F0F0,stroke:#969696;
    style group_prot_singlesample fill:#F0F0F0,stroke:#969696;
    style group_gdo_singlesample fill:#F0F0F0,stroke:#969696;
    style group_process_batches fill:#F0F0F0,stroke:#969696;
    style group_rna_multisample fill:#D9D9D9,stroke:#737373;
    style group_prot_multisample fill:#D9D9D9,stroke:#737373;
    style v0 fill:#e3dcea,stroke:#7a4baa;
    style v20 fill:#e3dcea,stroke:#7a4baa;
    style v32 fill:#e3dcea,stroke:#7a4baa;
    style v48 fill:#e3dcea,stroke:#7a4baa;
    style v76 fill:#e3dcea,stroke:#7a4baa;
    style v90 fill:#e3dcea,stroke:#7a4baa;
    style v108 fill:#e3dcea,stroke:#7a4baa;
    style v121 fill:#e3dcea,stroke:#7a4baa;
    style v130 fill:#e3dcea,stroke:#7a4baa;
    style v150 fill:#e3dcea,stroke:#7a4baa;
    style v176 fill:#e3dcea,stroke:#7a4baa;
    style v192 fill:#e3dcea,stroke:#7a4baa;
    style v181 fill:#e3dcea,stroke:#7a4baa;
    style v201 fill:#e3dcea,stroke:#7a4baa;
    style v221 fill:#e3dcea,stroke:#7a4baa;
    style v241 fill:#e3dcea,stroke:#7a4baa;
    style v388 fill:#e3dcea,stroke:#7a4baa;
    style v282 fill:#e3dcea,stroke:#7a4baa;
    style v302 fill:#e3dcea,stroke:#7a4baa;
    style v344 fill:#e3dcea,stroke:#7a4baa;
    style v364 fill:#e3dcea,stroke:#7a4baa;
    style v390 fill:#e3dcea,stroke:#7a4baa;
    style v405 fill:#e3dcea,stroke:#7a4baa;
    style v429 fill:#e3dcea,stroke:#7a4baa;
    style v445 fill:#e3dcea,stroke:#7a4baa;
    style v473 fill:#e3dcea,stroke:#7a4baa;
    style v496 fill:#e3dcea,stroke:#7a4baa;
    style v516 fill:#e3dcea,stroke:#7a4baa;
    style v536 fill:#e3dcea,stroke:#7a4baa;
    style v556 fill:#e3dcea,stroke:#7a4baa;
    style v568 fill:#e3dcea,stroke:#7a4baa;
    style v586 fill:#e3dcea,stroke:#7a4baa;
    style v599 fill:#e3dcea,stroke:#7a4baa;
    style v608 fill:#e3dcea,stroke:#7a4baa;
    style v628 fill:#e3dcea,stroke:#7a4baa;
    style v788 fill:#e3dcea,stroke:#7a4baa;
    style v681 fill:#e3dcea,stroke:#7a4baa;
    style v693 fill:#e3dcea,stroke:#7a4baa;
    style v711 fill:#e3dcea,stroke:#7a4baa;
    style v724 fill:#e3dcea,stroke:#7a4baa;
    style v733 fill:#e3dcea,stroke:#7a4baa;
    style v753 fill:#e3dcea,stroke:#7a4baa;
    style v790 fill:#e3dcea,stroke:#7a4baa;
    style v806 fill:#e3dcea,stroke:#7a4baa;
    style v822 fill:#e3dcea,stroke:#7a4baa;
    style v896 fill:#e3dcea,stroke:#7a4baa;
    style v835 fill:#e3dcea,stroke:#7a4baa;
    style v855 fill:#e3dcea,stroke:#7a4baa;
    style v875 fill:#e3dcea,stroke:#7a4baa;
    style v900 fill:#e3dcea,stroke:#7a4baa;
    style v974 fill:#e3dcea,stroke:#7a4baa;
    style v913 fill:#e3dcea,stroke:#7a4baa;
    style v933 fill:#e3dcea,stroke:#7a4baa;
    style v953 fill:#e3dcea,stroke:#7a4baa;
    style v983 fill:#e3dcea,stroke:#7a4baa;
    style v1022 fill:#e3dcea,stroke:#7a4baa;