Highly variable features scanpy

Annotate highly variable features [Satija15] [Zheng17] [Stuart19].

Info

ID: highly_variable_features_scanpy
Namespace: feature_annotation

Expects logarithmized data, except when flavor=‘seurat_v3’ in which count data is expected.

Depending on flavor, this reproduces the R-implementations of Seurat [Satija15], Cell Ranger [Zheng17], and Seurat v3 [Stuart19].

For the dispersion-based methods ([Satija15] and [Zheng17]), the normalized dispersion is obtained by scaling with the mean and standard deviation of the dispersions for features falling into a given bin for mean expression of features. This means that for each bin of mean expression, highly variable features are selected.

For [Stuart19], a normalized variance for each feature is computed. First, the data are standardized (i.e., z-score normalization per feature) with a regularized standard deviation. Next, the normalized variance is computed as the variance of each feature after the transformation. Features are ranked by the normalized variance.

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 1.0.2 -latest \
  -main-script target/nextflow/feature_annotation/highly_variable_features_scanpy/main.nf \
  --help

Run command

Example of params.yaml
# Arguments
input: # please fill in - example: "input.h5mu"
modality: "rna"
# layer: "foo"
# output: "$id.$key.output.h5mu"
# output_compression: "gzip"
var_name_filter: "filter_with_hvg"
varm_name: "hvg"
flavor: "seurat"
# n_top_features: 123
min_mean: 0.0125
max_mean: 3
min_disp: 0.5
# max_disp: 123.0
span: 0.3
n_bins: 20
# obs_batch_key: "foo"

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
nextflow run openpipelines-bio/openpipeline \
  -r 1.0.2 -latest \
  -profile docker \
  -main-script target/nextflow/feature_annotation/highly_variable_features_scanpy/main.nf \
  -params-file params.yaml
Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument group

Arguments

Name Description Attributes
--input Input h5mu file file, required, example: "input.h5mu"
--modality string, default: "rna"
--layer use adata.layers[layer] for expression values instead of adata.X. string
--output Output h5mu file. file, example: "output.h5mu"
--output_compression The compression format to be used on the output h5mu object. string, example: "gzip"
--var_name_filter In which .var slot to store a boolean array corresponding to which observations should be filtered out. string, default: "filter_with_hvg"
--varm_name In which .varm slot to store additional metadata. string, default: "hvg"
--flavor Choose the flavor for identifying highly variable features. For the dispersion based methods in their default workflows, Seurat passes the cutoffs whereas Cell Ranger passes n_top_features. string, default: "seurat"
--n_top_features Number of highly-variable features to keep. Mandatory if flavor=‘seurat_v3’. integer
--min_mean If n_top_features is defined, this and all other cutoffs for the means and the normalized dispersions are ignored. Ignored if flavor=‘seurat_v3’. double, default: 0.0125
--max_mean If n_top_features is defined, this and all other cutoffs for the means and the normalized dispersions are ignored. Ignored if flavor=‘seurat_v3’. double, default: 3
--min_disp If n_top_features is defined, this and all other cutoffs for the means and the normalized dispersions are ignored. Ignored if flavor=‘seurat_v3’. double, default: 0.5
--max_disp If n_top_features is defined, this and all other cutoffs for the means and the normalized dispersions are ignored. Ignored if flavor=‘seurat_v3’. Default is +inf. double
--span The fraction of the data (cells) used when estimating the variance in the loess model fit if flavor=‘seurat_v3’. double, default: 0.3
--n_bins Number of bins for binning the mean feature expression. Normalization is done with respect to each bin. If just a single feature falls into a bin, the normalized dispersion is artificially set to 1. integer, default: 20
--obs_batch_key If specified, highly-variable features are selected within each batch separately and merged. This simple process avoids the selection of batch-specific features and acts as a lightweight batch correction method. For all flavors, features are first sorted by how many batches they are a HVG. For dispersion-based flavors ties are broken by normalized dispersion. If flavor = ‘seurat_v3’, ties are broken by the median (across batches) rank based on within-batch normalized variance. string

Authors

  • Dries De Maeyer (contributor)

  • Robrecht Cannoodt (maintainer, contributor)