Calculate qc metrics

Add basic quality control metrics to an .h5mu file.

Info

ID: calculate_qc_metrics
Namespace: qc

The metrics are comparable to those produced by scanpy.pp.calculate_qc_metrics, although they have slightly different names:

Var metrics (name in this component -> name in scanpy):
- pct_dropout -> pct_dropout_by_{expr_type}
- num_nonzero_obs -> n_cells_by_{expr_type}
- obs_mean -> mean_{expr_type}
- total_counts -> total_{expr_type}

Obs metrics (name in this component -> name in scanpy):
- num_nonzero_vars -> n_genes_by_{expr_type}
- pct_{var_qc_metrics} -> pct_{expr_type}_{qc_var}
- total_counts_{var_qc_metrics} -> total_{expr_type}_{qc_var}
- pct_of_counts_in_top_{top_n_vars}_vars -> pct_{expr_type}_in_top_{n}_{var_type}
- total_counts -> total_{expr_type}
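The name mapping above can be sketched as a small Python helper. This is only an illustration: the function `scanpy_names` and its parameters are hypothetical and not part of the component, and the underscore placement in the templated names is assumed to follow scanpy's documented column names.

```python
def scanpy_names(expr_type="counts", var_type="genes", qc_vars=(), top_n_vars=()):
    """Map this component's default column names to their scanpy equivalents.

    Hypothetical helper for illustration; returns (obs_map, var_map).
    """
    var_map = {
        "pct_dropout": f"pct_dropout_by_{expr_type}",
        "num_nonzero_obs": f"n_cells_by_{expr_type}",
        "obs_mean": f"mean_{expr_type}",
        "total_counts": f"total_{expr_type}",
    }
    obs_map = {
        "num_nonzero_vars": f"n_{var_type}_by_{expr_type}",
        "total_counts": f"total_{expr_type}",
    }
    # One pair of columns per boolean .var column passed via --var_qc_metrics.
    for qc in qc_vars:
        obs_map[f"pct_{qc}"] = f"pct_{expr_type}_{qc}"
        obs_map[f"total_counts_{qc}"] = f"total_{expr_type}_{qc}"
    # One column per value passed via --top_n_vars.
    for n in top_n_vars:
        obs_map[f"pct_of_counts_in_top_{n}_vars"] = (
            f"pct_{expr_type}_in_top_{n}_{var_type}"
        )
    return obs_map, var_map


obs_map, var_map = scanpy_names(qc_vars=["mitochondrial"], top_n_vars=[50])
```

Such a mapping could be used to rename columns when comparing the output of this component against a scanpy-based workflow.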

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 1.0.1 -latest \
  -main-script target/nextflow/qc/calculate_qc_metrics/main.nf \
  --help

Run command

Example of params.yaml

# Inputs
input: # please fill in - example: "input.h5mu"
modality: "rna"
# layer: "raw_counts"

# Outputs
# output: "$id.$key.output.h5mu"
# output_compression: "gzip"

# Metrics added to .obs
# var_qc_metrics: ["ercc,highly_variable,mitochondrial"]
# var_qc_metrics_fill_na_value: true
# top_n_vars: [123]
output_obs_num_nonzero_vars: "num_nonzero_vars"
output_obs_total_counts_vars: "total_counts"

# Metrics added to .var
output_var_num_nonzero_obs: "num_nonzero_obs"
output_var_total_counts_obs: "total_counts"
output_var_obs_mean: "obs_mean"
output_var_pct_dropout: "pct_dropout"

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

nextflow run openpipelines-bio/openpipeline \
  -r 1.0.1 -latest \
  -profile docker \
  -main-script target/nextflow/qc/calculate_qc_metrics/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Inputs

Name | Description | Attributes
--input | Input h5mu file. | file, required, example: "input.h5mu"
--modality | | string, default: "rna"
--layer | | string, example: "raw_counts"

Metrics added to .obs

Name | Description | Attributes
--var_qc_metrics | Keys to select a boolean (containing only True or False) column from .var. For each cell, calculate the proportion of total values for genes which are labeled ‘True’, compared to the total sum of the values for all genes. | List of string, example: "ercc,highly_variable,mitochondrial", multiple_sep: ";"
--var_qc_metrics_fill_na_value | Fill any ‘NA’ values found in the columns specified with --var_qc_metrics with ‘True’ or ‘False’. | boolean
--top_n_vars | Number of top vars to be used to calculate cumulative proportions. If not specified, proportions are not calculated. For example, --top_n_vars 20;50 calculates the cumulative proportion of counts up to the 20th and the 50th most expressed vars. | List of integer, multiple_sep: ";"
--output_obs_num_nonzero_vars | Name of the column in .obs describing, for each observation, the number of stored values (including explicit zeroes). In other words, the name of the column that counts, for each row, the number of columns that contain data. | string, default: "num_nonzero_vars"
--output_obs_total_counts_vars | Name of the column in .obs describing, for each observation (row), the sum of the stored values in the columns. | string, default: "total_counts"
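To make the .obs metrics concrete, here is a minimal pure-Python sketch on a small dense matrix. It is not the component's implementation: the function `obs_qc_metrics` is hypothetical, it counts non-zero entries of a dense matrix rather than stored values of a sparse one, and it returns 0.0 for percentages when a row sums to zero, which is a choice made for this sketch only. The column names use the component's defaults.

```python
def obs_qc_metrics(matrix, var_flags=None, top_n_vars=()):
    """Per-observation QC metrics for a dense obs x vars matrix (list of rows).

    var_flags maps a metric name to one boolean per var (like a .var column).
    """
    var_flags = var_flags or {}
    results = []
    for row in matrix:
        total = sum(row)
        metrics = {
            "num_nonzero_vars": sum(1 for value in row if value != 0),
            "total_counts": total,
        }
        # Share of the row total that falls on vars flagged True.
        for name, flags in var_flags.items():
            subset_total = sum(v for v, flag in zip(row, flags) if flag)
            metrics[f"total_counts_{name}"] = subset_total
            metrics[f"pct_{name}"] = 100 * subset_total / total if total else 0.0
        # Cumulative share of the row total held by the n largest values.
        ranked = sorted(row, reverse=True)
        for n in top_n_vars:
            top_total = sum(ranked[:n])
            metrics[f"pct_of_counts_in_top_{n}_vars"] = (
                100 * top_total / total if total else 0.0
            )
        results.append(metrics)
    return results


counts = [
    [4, 0, 1, 5],  # observation 1
    [0, 0, 2, 2],  # observation 2
]
flags = {"mitochondrial": [True, False, False, False]}
# A ";"-separated value such as "1;2" stands in for --top_n_vars 1;2.
top_n = [int(n) for n in "1;2".split(";")]
metrics = obs_qc_metrics(counts, flags, top_n)
```

For the first observation, the total is 10, of which 4 fall on the flagged var (40%), and the two largest values account for 9 of the 10 counts (90%).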

Metrics added to .var

Name | Description | Attributes
--output_var_num_nonzero_obs | Name of the column in .var describing, for each feature, the number of stored values (including explicit zeroes). In other words, the name of the column that counts, for each column, the number of rows that contain data. | string, default: "num_nonzero_obs"
--output_var_total_counts_obs | Name of the column in .var describing, for each feature (column), the sum of the stored values in the rows. | string, default: "total_counts"
--output_var_obs_mean | Name of the column in .var providing, for each feature (column), the mean of the values across the observations. | string, default: "obs_mean"
--output_var_pct_dropout | Name of the column in .var providing, for each feature, the percentage of observations in which the feature does not appear (i.e. is missing). Based on the same counts as --output_var_num_nonzero_obs, but expressed as a percentage. | string, default: "pct_dropout"
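The .var metrics can be sketched the same way. Again, this is an illustration under assumptions rather than the component's code: `var_qc_metrics` is a hypothetical function operating on a dense list-of-rows matrix, it counts non-zero entries, and the dropout formula follows the one scanpy uses for pct_dropout_by_{expr_type}, namely 100 * (1 - nonzero / n_obs).

```python
def var_qc_metrics(matrix):
    """Per-feature QC metrics for a dense obs x vars matrix (list of rows)."""
    n_obs = len(matrix)
    results = []
    # zip(*matrix) iterates over the columns, i.e. one tuple per feature.
    for column in zip(*matrix):
        nonzero = sum(1 for value in column if value != 0)
        total = sum(column)
        results.append({
            "num_nonzero_obs": nonzero,
            "total_counts": total,
            "obs_mean": total / n_obs,
            # Percentage of observations in which the feature is absent.
            "pct_dropout": 100 * (1 - nonzero / n_obs),
        })
    return results


counts = [
    [4, 0, 1, 5],  # observation 1
    [0, 0, 2, 2],  # observation 2
]
var_metrics = var_qc_metrics(counts)
```

In this example the first feature is detected in one of two observations, giving a dropout of 50%, while the second feature is never detected and has a dropout of 100%.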

Outputs

Name | Description | Attributes
--output | Output h5mu file. | file, example: "output.h5mu"
--output_compression | The compression format to be used on the output h5mu object. | string, example: "gzip"

Authors

  • Dries Schaumont (author)