Find neighbors

Compute a neighborhood graph of observations [McInnes18].

Info

ID: find_neighbors
Namespace: neighbors

The neighbor search efficiency of this heavily relies on UMAP [McInnes18], which also provides a method for estimating connectivities of data points - the connectivity of the manifold (method==‘umap’). If method==‘gauss’, connectivities are computed according to [Coifman05], in the adaption of [Haghverdi16].

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 1.0.1 -latest \
  -main-script target/nextflow/neighbors/find_neighbors/main.nf \
  --help

Run command

Example of params.yaml
# Arguments
input: # please fill in - example: "input.h5mu"
modality: "rna"
obsm_input: "X_pca"
# output: "$id.$key.output.h5mu"
# output_compression: "gzip"
uns_output: "neighbors"
obsp_distances: "distances"
obsp_connectivities: "connectivities"
metric: "euclidean"
num_neighbors: 15
seed: 0

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
nextflow run openpipelines-bio/openpipeline \
  -r 1.0.1 -latest \
  -profile docker \
  -main-script target/nextflow/neighbors/find_neighbors/main.nf \
  -params-file params.yaml
Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument group

Arguments

Name Description Attributes
--input Input h5mu file file, required, example: "input.h5mu"
--modality string, default: "rna"
--obsm_input Which .obsm slot to use as a starting PCA embedding. string, default: "X_pca"
--output Output h5mu file containing the found neighbors. file, example: "output.h5mu"
--output_compression The compression format to be used on the output h5mu object. string, example: "gzip"
--uns_output Mandatory .uns slot to store various neighbor output objects. string, default: "neighbors"
--obsp_distances In which .obsp slot to store the distance matrix between the resulting neighbors. string, default: "distances"
--obsp_connectivities In which .obsp slot to store the connectivities matrix between the resulting neighbors. string, default: "connectivities"
--metric The distance metric to be used in the generation of the nearest neighborhood network. string, default: "euclidean"
--num_neighbors The size of local neighborhood (in terms of number of neighboring data points) used for manifold approximation. Larger values result in more global views of the manifold, while smaller values result in more local data being preserved. In general values should be in the range 2 to 100. If knn is True, number of nearest neighbors to be searched. If knn is False, a Gaussian kernel width is set to the distance of the n_neighbors neighbor. integer, default: 15
--seed A random seed. integer, default: 0

Authors

  • Dries De Maeyer (maintainer)

  • Robrecht Cannoodt (contributor)