Normalize total

Normalize counts per cell.

Info

ID: normalize_total
Namespace: transform

Links

Normalize each cell by total counts over all genes, so that every cell has the same total count after normalization. If choosing target_sum=1e6, this is CPM normalization.

If exclude_highly_expressed=True, very highly expressed genes are excluded from the computation of the normalization factor (size factor) for each cell. This is meaningful as these can strongly influence the resulting normalized values for all other genes [Weinreb17].

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -main-script target/nextflow/transform/normalize_total/main.nf \
  --help

Run command

Example of params.yaml

# Arguments
input: # please fill in - example: "input.h5mu"
modality: "rna"
# input_layer: "foo"
# output: "output.h5mu"
# output_compression: "gzip"
# output_layer: "foo"
# target_sum: 123
exclude_highly_expressed: false

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -profile docker \
  -main-script target/nextflow/transform/normalize_total/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument group

Arguments

Name	Description	Attributes
`--input`	Input h5mu file	`file`, required, example: `"input.h5mu"`
`--modality`		`string`, default: `"rna"`
`--input_layer`	Input layer to use. By default, X is normalized	`string`
`--output`	Output h5mu file.	`file`, required, default: `"output.h5mu"`
`--output_compression`	The compression format to be used on the output h5mu object.	`string`, example: `"gzip"`
`--output_layer`	Output layer to use. By default, use X.	`string`
`--target_sum`	If None, after normalization, each observation (cell) has a total count equal to the median of total counts for observations (cells) before normalization.	`integer`
`--exclude_highly_expressed`	Exclude (very) highly expressed genes for the computation of the normalization factor (size factor) for each cell. A gene is considered highly expressed, if it has more than max_fraction of the total counts in at least one cell. The not-excluded genes will sum up to target_sum.	`boolean_true`

Authors

Dries De Maeyer (maintainer)
Robrecht Cannoodt (contributor)