Knn

This component performs label transfer from reference to query using a K-Neirest Neighbors classifier

Info

ID: knn
Namespace: labels_transfer

Links

Source

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -main-script target/nextflow/labels_transfer/knn/main.nf \
  --help

Run command

Example of params.yaml

# Input dataset (query) arguments
input: # please fill in - example: "path/to/file"
modality: "rna"
# input_obsm_features: "X_scvi"

# Reference dataset arguments
# reference: "reference.h5mu"
# reference_obsm_features: "X_scvi"
reference_obs_targets: ["ann_level_1", "ann_level_2", "ann_level_3", "ann_level_4", "ann_level_5", "ann_finest_level"]

# Outputs
# output: "$id.$key.output"
# output_obs_predictions: ["foo"]
# output_obs_probability: ["foo"]
# output_compression: "gzip"

# Input dataset (query) arguments
# input_obsm_distances: "bbknn_distances"

# Reference dataset arguments
# reference_obsm_distances: "bbknn_distances"

# KNN label transfer arguments
weights: "uniform"
n_neighbors: 15

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

# Arguments

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -profile docker \
  -main-script target/nextflow/labels_transfer/knn/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Input dataset (query) arguments

Name	Description	Attributes
`--input`	The query data to transfer the labels to. Should be a .h5mu file.	`file`, required
`--modality`	Which modality to use.	`string`, default: `"rna"`
`--input_obsm_features`	The `.obsm` key of the embedding to use for the classifier’s inference. If not provided, the `.X` slot will be used instead. Make sure that embedding was obtained in the same way as the reference embedding (e.g. by the same model or preprocessing).	`string`, example: `"X_scvi"`

Reference dataset arguments

Name	Description	Attributes
`--reference`	The reference data to train classifiers on.	`file`, example: `"reference.h5mu"`
`--reference_obsm_features`	The `.obsm` key of the embedding to use for the classifier’s training. If not provided, the `.X` slot will be used instead. Make sure that embedding was obtained in the same way as the query embedding (e.g. by the same model or preprocessing).	`string`, example: `"X_scvi"`
`--reference_obs_targets`	The `.obs` key(s) of the target labels to tranfer.	List of `string`, default: `"ann_level_1", "ann_level_2", "ann_level_3", "ann_level_4", "ann_level_5", "ann_finest_level"`, multiple_sep: `";"`

Outputs

Name	Description	Attributes
`--output`	The query data in .h5mu format with predicted labels transfered from the reference.	`file`, required
`--output_obs_predictions`	In which `.obs` slots to store the predicted information. If provided, must have the same length as `--reference_obs_targets`. If empty, will default to the `reference_obs_targets` combined with the `"_pred"` suffix.	List of `string`, multiple_sep: `";"`
`--output_obs_probability`	In which `.obs` slots to store the probability of the predictions. If provided, must have the same length as `--reference_obs_targets`. If empty, will default to the `reference_obs_targets` combined with the `"_probability"` suffix.	List of `string`, multiple_sep: `";"`
`--output_compression`	The compression format to be used on the output h5mu object.	`string`, example: `"gzip"`

Input dataset (query) arguments

Name	Description	Attributes
`--input_obsm_distances`	The `.obsm` key of the input (query) dataset containing pre-calculated distances. If not provided, the distances will be calculated using PyNNDescent. Make sure the distance matrix contains distances relative to the reference dataset and were obtained in the same way as the reference embedding.	`string`, example: `"bbknn_distances"`

Reference dataset arguments

Name	Description	Attributes
`--reference_obsm_distances`	The `.obsm` key of the reference dataset containing pre-calculated distances. If not provided, the distances will be calculated using PyNNDescent.	`string`, example: `"bbknn_distances"`

KNN label transfer arguments

Name	Description	Attributes
`--weights`	Weight function used in prediction. Possible values are: - `uniform` - all points in each neighborhood are weighted equally - `distance` - weight points by the inverse of their distance - `gaussian` - weight points by the sum of their Gaussian kernel similarities to each sample	`string`, default: `"uniform"`
`--n_neighbors`	The number of neighbors to use in k-neighbor graph structure used for fast approximate nearest neighbor search with PyNNDescent. Larger values will result in more accurate search results at the cost of computation time.	`integer`, default: `15`

Authors

Dorien Roosen (maintainer, author)
Vladimir Shitov (author)