Umap
UMAP (Uniform Manifold Approximation and Projection) is a manifold learning technique suitable for visualizing high-dimensional data.
Info
ID: umap
Namespace: dimred
Besides tending to be faster than t-SNE, UMAP optimizes the embedding such that it best reflects the topology of the data, which we represent throughout Scanpy using a neighborhood graph. t-SNE, by contrast, optimizes the distribution of nearest-neighbor distances in the embedding such that it best matches the distribution of distances in the high-dimensional space. We use the implementation of umap-learn [McInnes18]. For a few comparisons of UMAP with t-SNE, see this preprint.
Example commands
You can run the pipeline using nextflow run.
View help
You can use --help as a parameter to get an overview of the possible parameters.
nextflow run openpipelines-bio/openpipeline \
-r 1.0.2 -latest \
-main-script target/nextflow/dimred/umap/main.nf \
--help
Run command
Example of params.yaml
# Inputs
input: # please fill in - example: "input.h5mu"
modality: "rna"
uns_neighbors: "neighbors"
# Outputs
# output: "$id.$key.output.h5mu"
# output_compression: "gzip"
obsm_output: "umap"
# Arguments
min_dist: 0.5
spread: 1.0
num_components: 2
# max_iter: 123
alpha: 1.0
gamma: 1.0
negative_sample_rate: 5
init_pos: "spectral"
# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
nextflow run openpipelines-bio/openpipeline \
-r 1.0.2 -latest \
-profile docker \
-main-script target/nextflow/dimred/umap/main.nf \
-params-file params.yaml
Note
Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
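When the run command is launched from a script, the backend swap described in the note can be made explicit. The following is a minimal sketch rather than part of the pipeline: build_umap_command is a hypothetical helper, and it only assembles the argument list shown above without executing anything.

```python
import shlex

# Backends named in the note above; other profiles are not assumed here.
PROFILES = ("docker", "podman", "singularity")


def build_umap_command(profile="docker", params_file="params.yaml"):
    """Return the argv list for the run command (hypothetical helper)."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile {profile!r}; expected one of {PROFILES}")
    return [
        "nextflow", "run", "openpipelines-bio/openpipeline",
        "-r", "1.0.2", "-latest",
        "-profile", profile,
        "-main-script", "target/nextflow/dimred/umap/main.nf",
        "-params-file", params_file,
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command line for the singularity backend.
    print(shlex.join(build_umap_command("singularity")))
```

The returned list can be handed to subprocess.run if the command should be executed directly; keeping it as a list avoids shell quoting issues with file paths.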
Argument groups
Inputs
Name | Description | Attributes |
---|---|---|
--input | Input h5mu file. | file, required, example: "input.h5mu" |
--modality | | string, default: "rna" |
--uns_neighbors | The .uns neighbors slot as output by the find_neighbors component. | string, default: "neighbors" |
Outputs
Name | Description | Attributes |
---|---|---|
--output | Output h5mu file. | file, required, example: "output.h5mu" |
--output_compression | The compression format to be used on the output h5mu object. | string, example: "gzip" |
--obsm_output | The pre/postfix under which to store the UMAP results. | string, default: "umap" |
Arguments
Name | Description | Attributes |
---|---|---|
--min_dist | The effective minimum distance between embedded points. Smaller values will result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values will result in a more even dispersal of points. The value should be set relative to the spread value, which determines the scale at which embedded points will be spread out. | double, default: 0.5 |
--spread | The effective scale of embedded points. In combination with min_dist this determines how clustered/clumped the embedded points are. | double, default: 1 |
--num_components | The number of dimensions of the embedding. | integer, default: 2 |
--max_iter | The number of iterations (epochs) of the optimization. Called n_epochs in the original UMAP. If unset, defaults to 500 if neighbors['connectivities'].shape[0] <= 10000, else 200. | integer |
--alpha | The initial learning rate for the embedding optimization. | double, default: 1 |
--gamma | Weighting applied to negative samples in low dimensional embedding optimization. Values higher than one will result in greater weight being given to negative samples. | double, default: 1 |
--negative_sample_rate | The number of negative edge/1-simplex samples to use per positive edge/1-simplex sample in optimizing the low dimensional embedding. | integer, default: 5 |
--init_pos | How to initialize the low dimensional embedding. Called init in the original UMAP. Options are: any key from .obsm; 'paga' (positions from paga()); 'spectral' (use a spectral embedding of the graph); 'random' (assign initial embedding positions at random). | string, default: "spectral" |
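The arguments above closely mirror the keywords of scanpy.tl.umap. The sketch below illustrates two points from the table: the default rule stated for --max_iter, and one plausible way the component arguments could be translated into Scanpy keywords. The rename table (num_components to n_components, max_iter to maxiter, uns_neighbors to neighbors_key) is an assumption made for illustration, not the component's confirmed implementation, and both function names are hypothetical.

```python
def default_max_iter(n_obs):
    """Epoch count described for --max_iter when the argument is left unset."""
    return 500 if n_obs <= 10000 else 200


# Assumed renames from component arguments to scanpy.tl.umap keywords.
RENAMES = {
    "num_components": "n_components",
    "max_iter": "maxiter",
    "uns_neighbors": "neighbors_key",
}

# Arguments assumed to be forwarded under the same name.
PASS_THROUGH = (
    "min_dist", "spread", "alpha", "gamma", "negative_sample_rate", "init_pos",
)


def to_scanpy_kwargs(params):
    """Translate a params dict into keyword arguments (hypothetical helper)."""
    kwargs = {}
    for name, value in params.items():
        if value is None:
            continue  # unset arguments are left to the library defaults
        if name in RENAMES:
            kwargs[RENAMES[name]] = value
        elif name in PASS_THROUGH:
            kwargs[name] = value
        # Anything else (input, output, publish_dir, ...) is not a UMAP keyword.
    return kwargs


if __name__ == "__main__":
    params = {
        "input": "input.h5mu", "uns_neighbors": "neighbors",
        "min_dist": 0.5, "spread": 1.0, "num_components": 2,
        "max_iter": None, "init_pos": "spectral",
    }
    print(to_scanpy_kwargs(params))
    print(default_max_iter(8000), default_max_iter(50000))
```

With an AnnData object for the chosen modality, the resulting dictionary could then be passed as scanpy.tl.umap(adata, **kwargs), again assuming the mapping above holds.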