Umap

UMAP (Uniform Manifold Approximation and Projection) is a manifold learning technique suitable for visualizing high-dimensional data.

Info

ID: umap
Namespace: dimred

Links

Source

Besides tending to be faster than tSNE, it optimizes the embedding such that it best reflects the topology of the data, which we represent throughout Scanpy using a neighborhood graph. tSNE, by contrast, optimizes the distribution of nearest-neighbor distances in the embedding such that these best match the distribution of distances in the high-dimensional space. We use the implementation of umap-learn [McInnes18]. For a few comparisons of UMAP with tSNE, see this preprint

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -main-script target/nextflow/dimred/umap/main.nf \
  --help

Run command

Example of params.yaml

# Inputs
input: # please fill in - example: "input.h5mu"
modality: "rna"
uns_neighbors: "neighbors"

# Outputs
# output: "$id.$key.output.h5mu"
# output_compression: "gzip"
obsm_output: "umap"

# Arguments
min_dist: 0.5
spread: 1.0
num_components: 2
# max_iter: 123
alpha: 1.0
gamma: 1.0
negative_sample_rate: 5
init_pos: "spectral"

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -profile docker \
  -main-script target/nextflow/dimred/umap/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Inputs

Name	Description	Attributes
`--input`	Input h5mu file	`file`, required, example: `"input.h5mu"`
`--modality`		`string`, default: `"rna"`
`--uns_neighbors`	The `.uns` neighbors slot as output by the `find_neighbors` component.	`string`, default: `"neighbors"`

Outputs

Name	Description	Attributes
`--output`	Output h5mu file.	`file`, required, example: `"output.h5mu"`
`--output_compression`	The compression format to be used on the output h5mu object.	`string`, example: `"gzip"`
`--obsm_output`	The pre/postfix under which to store the UMAP results.	`string`, default: `"umap"`

Arguments

Name	Description	Attributes
`--min_dist`	The effective minimum distance between embedded points. Smaller values will result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values will result on a more even dispersal of points. The value should be set relative to the spread value, which determines the scale at which embedded points will be spread out.	`double`, default: `0.5`
`--spread`	The effective scale of embedded points. In combination with `min_dist` this determines how clustered/clumped the embedded points are.	`double`, default: `1`
`--num_components`	The number of dimensions of the embedding.	`integer`, default: `2`
`--max_iter`	The number of iterations (epochs) of the optimization. Called `n_epochs` in the original UMAP. Default is set to 500 if neighbors[‘connectivities’].shape[0] <= 10000, else 200.	`integer`
`--alpha`	The initial learning rate for the embedding optimization.	`double`, default: `1`
`--gamma`	Weighting applied to negative samples in low dimensional embedding optimization. Values higher than one will result in greater weight being given to negative samples.	`double`, default: `1`
`--negative_sample_rate`	The number of negative edge/1-simplex samples to use per positive edge/1-simplex sample in optimizing the low dimensional embedding.	`integer`, default: `5`
`--init_pos`	How to initialize the low dimensional embedding. Called `init` in the original UMAP. Options are: * Any key from `.obsm` * `'paga'`: positions from `paga()` * `'spectral'`: use a spectral embedding of the graph * `'random'`: assign initial embedding positions at random.	`string`, default: `"spectral"`

Authors

Dries De Maeyer (maintainer)