Tfidf

Perform TF-IDF normalization of the data (typically, ATAC).

Info

ID: tfidf
Namespace: transform

TF-IDF stands for “term frequency - inverse document frequency”. It is a technique from natural language processing. In the context of ATAC data, “terms” are the features (genomic regions, typically peaks) and “documents” are the observations (cells). TF-IDF normalization is applied to single-cell ATAC-seq data to up-weight regions that are accessible in only a subset of cells while down-weighting regions that are commonly accessible across many cells.
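The idea can be illustrated with a minimal NumPy sketch. This is not the component's implementation: the function name `tfidf_sketch`, the exact TF and IDF definitions, and the point at which the scale factor is applied are assumptions chosen for illustration. Only the parameter names (`scale_factor`, `log_tf`, `log_idf`, `log_tfidf`) and their defaults are taken from the argument list on this page.

```python
import numpy as np


def tfidf_sketch(counts, scale_factor=10000, log_tf=True, log_idf=True, log_tfidf=False):
    """Illustrative TF-IDF on a dense cells x peaks count matrix (hypothetical helper)."""
    # Mirrors the documented constraint: log_tfidf requires log_tf and log_idf to be False.
    if log_tfidf and (log_tf or log_idf):
        raise ValueError("log_tfidf can only be used when log_tf and log_idf are False")

    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[0]

    # TF: each cell's counts divided by that cell's total, then scaled.
    # Assumption: the scale factor is applied to TF before any log transform.
    cell_totals = counts.sum(axis=1, keepdims=True)
    cell_totals = np.where(cell_totals == 0, 1.0, cell_totals)  # avoid division by zero
    tf = counts / cell_totals * scale_factor

    # IDF: number of cells divided by each peak's total count across cells.
    peak_totals = counts.sum(axis=0)
    peak_totals = np.where(peak_totals == 0, 1.0, peak_totals)  # avoid division by zero
    idf = n_cells / peak_totals

    if log_tf:
        tf = np.log1p(tf)
    if log_idf:
        idf = np.log1p(idf)

    result = tf * idf  # idf broadcasts across cells
    if log_tfidf:
        result = np.log1p(result)
    return result


# Toy data: 2 cells x 3 peaks. Peak 0 is open in both cells, peaks 1 and 2 in one cell each.
counts = np.array([[1, 1, 0],
                   [1, 0, 1]])
weights = tfidf_sketch(counts)
print(weights)
```

In this toy matrix, the peak shared by both cells receives a lower weight than the cell-specific peaks, and zero counts stay zero.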

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.0 -latest \
  -main-script target/nextflow/transform/tfidf/main.nf \
  --help

Run command

Example of params.yaml
# Arguments
input: # please fill in - example: "input.h5mu"
modality: "atac"
# input_layer: "foo"
# output: "$id.$key.output"
# output_compression: "gzip"
output_layer: "tfidf"
scale_factor: 10000
log_idf: true
log_tf: true
log_tfidf: false

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.0 -latest \
  -profile docker \
  -main-script target/nextflow/transform/tfidf/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument group

Arguments

| Name | Description | Attributes |
|---|---|---|
| --input | Input h5mu file | file, required, example: "input.h5mu" |
| --modality | | string, default: "atac" |
| --input_layer | Input layer to use. By default, X is normalized | string |
| --output | Output h5mu file. | file, required |
| --output_compression | The compression format to be used on the output h5mu object. | string, example: "gzip" |
| --output_layer | Output layer to use. | string, default: "tfidf" |
| --scale_factor | Scale factor to multiply the TF-IDF matrix by. | integer, default: 10000 |
| --log_idf | Whether to log-transform IDF term. | boolean, default: TRUE |
| --log_tf | Whether to log-transform TF term. | boolean, default: TRUE |
| --log_tfidf | Whether to log-transform TF*IDF term (False by default). Can only be used when log_tf and log_idf are False. | boolean, default: FALSE |

Authors

  • Vladimir Shitov (maintainer)