Tsne

t-SNE (t-Distributed Stochastic Neighbor Embedding) is a dimensionality reduction technique used to visualize high-dimensional data in a low-dimensional space. It reveals patterns and clusters by preserving local similarities between data points.

Info

ID: tsne
Namespace: dimred

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 1.0.2 -latest \
  -main-script target/nextflow/dimred/tsne/main.nf \
  --help

Run command

Example of params.yaml:

# Inputs
input: # please fill in - example: "input.h5mu"
modality: # please fill in - example: "rna"
use_rep: # please fill in - example: "X_pca"

# Outputs
# output: "$id.$key.output.h5mu"
# output_compression: "gzip"
obsm_output: "X_tsne"

# Arguments
n_pcs: 50
perplexity: 30.0
min_dist: 0.5
metric: "euclidean"
early_exaggeration: 12.0
learning_rate: 1000.0
random_state: 0

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

Run the pipeline with this parameter file:

nextflow run openpipelines-bio/openpipeline \
  -r 1.0.2 -latest \
  -profile docker \
  -main-script target/nextflow/dimred/tsne/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
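
The run command above can also be assembled from a script, which is convenient when sweeping over parameters or switching container backends. The following is a minimal sketch, not part of the pipeline itself: the helper name `build_command` and the file name `params.json` are hypothetical, the input and output paths are the placeholder examples from this page, and it assumes that Nextflow's -params-file option accepts JSON as well as YAML. The command is only printed, not executed.

```python
import json
import shlex
import tempfile
from pathlib import Path

# Parameter values mirror the example params.yaml; paths are placeholders.
params = {
    "input": "input.h5mu",
    "modality": "rna",
    "use_rep": "X_pca",
    "obsm_output": "X_tsne",
    "perplexity": 30.0,
    "random_state": 0,
    "publish_dir": "output/",
}

# Container backends named in the note above.
SUPPORTED_PROFILES = {"docker", "podman", "singularity"}


def build_command(params_file, profile="docker"):
    """Return the nextflow invocation as a list of arguments (hypothetical helper)."""
    if profile not in SUPPORTED_PROFILES:
        raise ValueError(f"unsupported profile: {profile}")
    return [
        "nextflow", "run", "openpipelines-bio/openpipeline",
        "-r", "1.0.2", "-latest",
        "-profile", profile,
        "-main-script", "target/nextflow/dimred/tsne/main.nf",
        "-params-file", str(params_file),
    ]


# Write the parameters to a scratch directory (assumption: JSON is accepted).
workdir = Path(tempfile.mkdtemp())
params_file = workdir / "params.json"
params_file.write_text(json.dumps(params, indent=2))

command = build_command(params_file, profile="singularity")
print(shlex.join(command))
# To actually launch the pipeline: subprocess.run(command, check=True)
```

Keeping the command as a list rather than a single string avoids shell quoting issues if it is later passed to `subprocess.run`.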

Argument groups

Inputs

| Name | Description | Attributes |
|---|---|---|
| --input | Input h5mu file. | file, required, example: "input.h5mu" |
| --modality | | string, required, default: "rna" |
| --use_rep | The .obsm slot to use as input for the t-SNE computation. | string, required, example: "X_pca" |

Outputs

| Name | Description | Attributes |
|---|---|---|
| --output | Output h5mu file. | file, required, example: "output.h5mu" |
| --output_compression | The compression format to be used on the output h5mu object. | string, example: "gzip" |
| --obsm_output | The .obsm key to use for storing the t-SNE results. | string, default: "X_tsne" |

Arguments

| Name | Description | Attributes |
|---|---|---|
| --n_pcs | The number of principal components to use for the t-SNE computation. | integer, default: 50 |
| --perplexity | The perplexity is related to the number of nearest neighbors used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50. Different values can produce significantly different embeddings. | double, default: 30 |
| --min_dist | The effective minimum distance between embedded points. Smaller values will result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values will result in a more even dispersal of points. The value should be set relative to the spread value, which determines the scale at which embedded points will be spread out. | double, default: 0.5 |
| --metric | Distance metric to calculate neighbors on. | string, default: "euclidean" |
| --early_exaggeration | Controls how tight natural clusters in the original space are in the embedded space and how much space will be between them. For larger values, the space between natural clusters will be larger in the embedded space. The choice of this parameter is not very critical. If the cost function increases during initial optimization, the early exaggeration factor or the learning rate might be too high. | double, default: 12 |
| --learning_rate | The learning rate for t-SNE optimization. Typical values range between 10.0 and 1000.0. | double, default: 1000 |
| --random_state | The random seed to use for the t-SNE computation. | integer, default: 0 |
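
To make the link between --perplexity and a neighbor count concrete: in the standard t-SNE formulation, the perplexity of a point's neighbor distribution P is 2^H(P), where H is the Shannon entropy in bits, and the Gaussian bandwidth of each point is tuned until that value matches the requested perplexity. The sketch below illustrates this with the standard library only. It is an illustration of the general technique, not the code this component runs; the function names and the toy distances are made up for the example.

```python
import math


def perplexity(probabilities):
    """Return 2**H(P), where H is the Shannon entropy of P in bits."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return 2 ** entropy


def conditional_probabilities(sq_distances, sigma):
    """Gaussian affinities of one point to its neighbors, normalized to sum to 1."""
    nearest = min(sq_distances)  # shift by the minimum for numerical stability
    weights = [math.exp(-(d - nearest) / (2 * sigma ** 2)) for d in sq_distances]
    total = sum(weights)
    return [w / total for w in weights]


def find_sigma(sq_distances, target, lo=1e-3, hi=1e3, steps=100):
    """Bisect on the bandwidth sigma until the perplexity matches the target.

    Perplexity grows with sigma, because a wider Gaussian spreads the
    probability mass over more neighbors.
    """
    for _ in range(steps):
        mid = (lo + hi) / 2
        if perplexity(conditional_probabilities(sq_distances, mid)) > target:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# A uniform distribution over k neighbors has perplexity k (up to rounding),
# which is why perplexity reads as an "effective number of neighbors".
print(perplexity([1 / 30] * 30))

# Toy example: squared distances from one point to ten neighbors.
sq_distances = [float(d * d) for d in range(1, 11)]
sigma = find_sigma(sq_distances, target=5.0)
print(sigma, perplexity(conditional_probabilities(sq_distances, sigma)))
```

Because the perplexity can never exceed the number of available neighbors, a requested value close to or above the number of observations leaves no room for this search, which is one reason small datasets call for smaller values.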

Authors

  • Jakub Majercik (maintainer)