Normalize total

Normalize counts per cell.

Info

ID: normalize_total
Namespace: transform

Normalize each cell by its total counts over all genes, so that every cell has the same total count after normalization. With target_sum=1e6, this is counts-per-million (CPM) normalization.
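As a rough illustration of the arithmetic, the following NumPy sketch scales each row (cell) of a small dense count matrix to a common total. The function name and the toy data are made up for this example, and it is not the component's actual implementation, which operates on h5mu files.

```python
import numpy as np

def normalize_total_sketch(counts, target_sum=10_000):
    """Scale each row (cell) so that its counts sum to target_sum."""
    counts = np.asarray(counts, dtype=float)
    # Total counts per cell, kept as a column so it broadcasts over genes.
    totals = counts.sum(axis=1, keepdims=True)
    # Cells with zero total counts get a scale of 1 to avoid division by zero.
    scale = np.divide(target_sum, totals, out=np.ones_like(totals), where=totals > 0)
    return counts * scale

# Toy matrix: 2 cells (rows) x 3 genes (columns).
counts = np.array([[10, 30, 60],
                   [1, 1, 2]])

# target_sum=1e6 corresponds to CPM normalization.
cpm = normalize_total_sketch(counts, target_sum=1e6)
print(cpm.sum(axis=1))
```

After scaling, both cells have the same total even though their raw totals differ by a factor of 25.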

If exclude_highly_expressed=True, very highly expressed genes are excluded from the computation of the normalization factor (size factor) for each cell. This is useful because such genes can strongly influence the normalized values of all other genes [Weinreb17].
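The effect of excluding highly expressed genes can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions: the function name is hypothetical, the max_fraction threshold of 0.5 is chosen only to keep the toy example readable, and the component's real implementation may differ in detail.

```python
import numpy as np

def normalize_excluding_sketch(counts, target_sum=10_000, max_fraction=0.5):
    """Normalize cells using a size factor computed without highly expressed genes."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Fraction of each cell's total counts contributed by each gene.
    fractions = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    # A gene is flagged if it exceeds max_fraction in at least one cell.
    highly_expressed = (fractions > max_fraction).any(axis=0)
    # The size factor is based only on the genes that were not flagged.
    kept_totals = counts[:, ~highly_expressed].sum(axis=1, keepdims=True)
    scale = np.divide(target_sum, kept_totals, out=np.ones_like(kept_totals), where=kept_totals > 0)
    # All genes are scaled, including the flagged ones.
    return counts * scale, highly_expressed

# Gene 0 dominates both cells and is therefore excluded from the size factor.
counts = np.array([[90, 5, 5],
                   [80, 10, 10]])
normalized, highly_expressed = normalize_excluding_sketch(counts, target_sum=100)
print(highly_expressed)
print(normalized[:, ~highly_expressed].sum(axis=1))
```

Note that in this sketch only the non-excluded genes sum to target_sum; the excluded gene is still rescaled, so the full row total is larger.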

Example commands

You can run the pipeline using nextflow run.

View help

Pass --help as a parameter to get an overview of the available parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 1.0.1 -latest \
  -main-script target/nextflow/transform/normalize_total/main.nf \
  --help

Run command

Example of params.yaml

# Arguments
input: # please fill in - example: "input.h5mu"
modality: "rna"
# input_layer: "foo"
# output: "$id.$key.output.h5mu"
# output_compression: "gzip"
# output_layer: "foo"
target_sum: 10000
exclude_highly_expressed: false

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

nextflow run openpipelines-bio/openpipeline \
  -r 1.0.1 -latest \
  -profile docker \
  -main-script target/nextflow/transform/normalize_total/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument group

Arguments

| Name | Description | Attributes |
| --- | --- | --- |
| --input | Input h5mu file. | file, required, example: "input.h5mu" |
| --modality | | string, default: "rna" |
| --input_layer | Input layer to use. By default, X is normalized. | string |
| --output | Output h5mu file. | file, required, default: "output.h5mu" |
| --output_compression | The compression format to be used on the output h5mu object. | string, example: "gzip" |
| --output_layer | Output layer to use. By default, use X. | string |
| --target_sum | If None, after normalization, each observation (cell) has a total count equal to the median of total counts for observations (cells) before normalization. | integer, default: 10000 |
| --exclude_highly_expressed | Exclude (very) highly expressed genes from the computation of the normalization factor (size factor) for each cell. A gene is considered highly expressed if it accounts for more than max_fraction of the total counts in at least one cell. The non-excluded genes will sum to target_sum. | boolean_true |
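The median-based behavior described for --target_sum can be illustrated with a short NumPy sketch. The data are invented for the example, and since the argument is listed with an integer default of 10000, whether this component can actually be run without a target_sum is not stated here; treat the snippet only as an illustration of the described arithmetic.

```python
import numpy as np

# Toy matrix: 3 cells (rows) x 3 genes (columns).
counts = np.array([[1, 1, 2],
                   [10, 20, 10],
                   [100, 50, 50]], dtype=float)

# Total counts per cell before normalization: 4, 40 and 200.
totals = counts.sum(axis=1)

# With no explicit target, the median of the per-cell totals is used.
target = np.median(totals)

# Scale every cell so that its total equals the median total.
normalized = counts * (target / totals)[:, None]
print(target, normalized.sum(axis=1))
```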

Authors

  • Dries De Maeyer (maintainer)

  • Robrecht Cannoodt (contributor)