Normalize total
Normalize counts per cell.
Info
ID: normalize_total
Namespace: transform
Normalize each cell by its total counts over all genes, so that every cell has the same total count after normalization. With target_sum=1e6, this is counts per million (CPM) normalization.
If exclude_highly_expressed=True, very highly expressed genes are excluded from the computation of the normalization factor (size factor) for each cell. This is useful because such genes can strongly influence the resulting normalized values for all other genes [Weinreb17].
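The scaling described above can be sketched in plain Python. This is an illustration only, not the component's actual implementation; the function name `normalize_total_sketch` and the toy count matrix are hypothetical.

```python
def normalize_total_sketch(counts, target_sum=10_000):
    """Scale each cell (row) so that its counts sum to target_sum."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        # Cells with no counts stay all-zero; this also avoids dividing by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([value * scale for value in cell])
    return normalized


# Toy matrix (hypothetical data): 2 cells (rows) x 3 genes (columns).
counts = [
    [1, 3, 6],
    [20, 30, 50],
]

# target_sum=1e6 corresponds to counts per million (CPM).
cpm = normalize_total_sketch(counts, target_sum=1e6)
row_sums = [sum(cell) for cell in cpm]
```

After the call, both cells sum to one million even though their raw totals (10 and 100) differ by a factor of ten.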
Example commands
You can run the pipeline using nextflow run.
View help
You can use --help as a parameter to get an overview of the possible parameters.
nextflow run openpipelines-bio/openpipeline \
-r 1.0.2 -latest \
-main-script target/nextflow/transform/normalize_total/main.nf \
--help
Run command
Example of params.yaml
# Arguments
input: # please fill in - example: "input.h5mu"
modality: "rna"
# input_layer: "foo"
# output: "$id.$key.output.h5mu"
# output_compression: "gzip"
# output_layer: "foo"
target_sum: 10000
exclude_highly_expressed: false
# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
nextflow run openpipelines-bio/openpipeline \
-r 1.0.2 -latest \
-profile docker \
-main-script target/nextflow/transform/normalize_total/main.nf \
-params-file params.yaml
Note
Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
Argument group
Arguments
Name | Description | Attributes |
---|---|---|
--input | Input h5mu file. | file, required, example: "input.h5mu" |
--modality | | string, default: "rna" |
--input_layer | Input layer to use. By default, X is normalized. | string |
--output | Output h5mu file. | file, required, default: "output.h5mu" |
--output_compression | The compression format to be used on the output h5mu object. | string, example: "gzip" |
--output_layer | Output layer to use. By default, use X. | string |
--target_sum | If None, after normalization, each observation (cell) has a total count equal to the median of total counts for observations (cells) before normalization. | integer, default: 10000 |
--exclude_highly_expressed | Exclude (very) highly expressed genes from the computation of the normalization factor (size factor) for each cell. A gene is considered highly expressed if it has more than max_fraction of the total counts in at least one cell. The genes that are not excluded will sum up to target_sum. | boolean_true |
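The exclusion rule and the median fallback for target_sum can be illustrated with a small plain-Python sketch. It is not the component's code: the function name `normalize_excluding_sketch` and the toy data are hypothetical, max_fraction is not listed as an argument in the table above (the 0.05 default used here is an assumption), and taking the median over the totals of the non-excluded genes is likewise an assumption.

```python
from statistics import median


def normalize_excluding_sketch(counts, target_sum=None, max_fraction=0.05):
    """Total-count normalization that leaves highly expressed genes
    out of the size factor (hypothetical helper, for illustration)."""
    totals = [sum(cell) for cell in counts]
    n_genes = len(counts[0])

    # A gene is flagged if it exceeds max_fraction of the total counts
    # in at least one cell.
    excluded = [
        any(cell[g] > max_fraction * total for cell, total in zip(counts, totals))
        for g in range(n_genes)
    ]

    # Per-cell totals over the remaining (non-excluded) genes only.
    kept_totals = [
        sum(value for value, skip in zip(cell, excluded) if not skip)
        for cell in counts
    ]

    # target_sum=None: fall back to the median of the positive totals
    # (assumed here to be taken over the non-excluded genes).
    if target_sum is None:
        positive = [t for t in kept_totals if t > 0]
        target_sum = median(positive) if positive else 0.0

    # Every gene, excluded or not, is scaled by the same per-cell factor.
    normalized = []
    for cell, kept_total in zip(counts, kept_totals):
        scale = target_sum / kept_total if kept_total > 0 else 0.0
        normalized.append([value * scale for value in cell])
    return excluded, normalized


# Toy data (hypothetical): gene 0 dominates the first cell.
counts = [
    [90, 5, 5],
    [10, 15, 15],
]
excluded, normalized = normalize_excluding_sketch(counts, max_fraction=0.5)
```

In this toy example only the first gene is flagged, and the two remaining genes sum to the same value in both cells after normalization, while the flagged gene is still rescaled by the same per-cell factor.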