Harmony integration followed by KNN label transfer

Cell type annotation workflow by performing harmony integration of reference and query dataset followed by KNN label transfer.

Info

ID: harmony_knn
Namespace: workflows/annotation

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.0 -latest \
  -main-script target/nextflow/workflows/annotation/harmony_knn/main.nf \
  --help

Run command

Example of params.yaml
# Query Input
id: # please fill in - example: "foo"
input: # please fill in - example: "input.h5mu"
modality: "rna"
# input_layer: "foo"
input_obs_batch_label: # please fill in - example: "sample"
# input_var_gene_names: "foo"
input_reference_gene_overlap: 100
overwrite_existing_key: false

# Reference input
reference: # please fill in - example: "reference.h5mu"
# reference_layer: "foo"
reference_obs_target: # please fill in - example: "cell_type"
# reference_var_gene_names: "foo"
reference_obs_batch_label: # please fill in - example: "sample"

# HVG subset arguments
n_hvg: 2000

# PCA options
# pca_num_components: 25

# Harmony integration options
harmony_theta: [2.0]

# Leiden clustering options
leiden_resolution: [1.0]

# Neighbor classifier arguments
knn_weights: "uniform"
knn_n_neighbors: 15

# Outputs
# output: "$id.$key.output.h5mu"
# output_obs_predictions: ["foo"]
# output_obs_probability: ["foo"]
output_obsm_integrated: "X_integrated_harmony"
# output_compression: "gzip"

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

# Arguments
nextflow run openpipelines-bio/openpipeline \
  -r 2.1.0 -latest \
  -profile docker \
  -main-script target/nextflow/workflows/annotation/harmony_knn/main.nf \
  -params-file params.yaml
Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Query Input

Name Description Attributes
--id ID of the sample. string, required, example: "foo"
--input Input dataset consisting of the (unlabeled) query observations. The dataset is expected to be pre-processed in the same way as –reference. file, required, example: "input.h5mu"
--modality Which modality to process. Should match the modality of the –reference dataset. string, default: "rna"
--input_layer The layer of the input dataset to process if .X is not to be used. Should contain log normalized counts. string
--input_obs_batch_label The .obs field in the input (query) dataset containing the batch labels. string, required, example: "sample"
--input_var_gene_names The .var field in the input (query) dataset containing gene names; if not provided, the .var index will be used. string
--input_reference_gene_overlap The minimum number of genes present in both the reference and query datasets. integer, default: 100
--overwrite_existing_key If provided, will overwrite existing fields in the input dataset when data are copied during the reference alignment process. boolean_true

Reference input

Name Description Attributes
--reference Reference dataset consisting of the labeled observations to train the KNN classifier on. The dataset is expected to be pre-processed in the same way as the –input query dataset. file, required, example: "reference.h5mu"
--reference_layer The layer of the reference dataset to process if .X is not to be used. Should contain log normalized counts. string
--reference_obs_target The .obs key of the target cell type labels to transfer. string, required, example: "cell_type"
--reference_var_gene_names The .var field in the reference dataset containing gene names; if not provided, the .var index will be used. string
--reference_obs_batch_label The .obs field in the reference dataset containing the batch labels. string, required, example: "sample"

HVG subset arguments

Name Description Attributes
--n_hvg Number of highly variable genes to subset for. integer, default: 2000

PCA options

Name Description Attributes
--pca_num_components Number of principal components to compute. Defaults to 50, or 1 - minimum dimension size of selected representation. integer, example: 25

Harmony integration options

Name Description Attributes
--harmony_theta Diversity clustering penalty parameter. Can be set as a single value for all batch observations or as multiple values, one for each observation in the batches defined by –input_obs_batch_label. theta=0 does not encourage any diversity. Larger values of theta result in more diverse clusters.” List of double, default: 2, multiple_sep: ";"

Leiden clustering options

Name Description Attributes
--leiden_resolution Control the coarseness of the clustering. Higher values lead to more clusters. List of double, default: 1, multiple_sep: ";"

Neighbor classifier arguments

Name Description Attributes
--knn_weights Weight function used in prediction. Possible values are: uniform (all points in each neighborhood are weighted equally) or distance (weight points by the inverse of their distance) string, default: "uniform"
--knn_n_neighbors The number of neighbors to use in k-neighbor graph structure used for fast approximate nearest neighbor search with PyNNDescent. Larger values will result in more accurate search results at the cost of computation time. integer, default: 15

Outputs

Name Description Attributes
--output The query data in .h5mu format with predicted labels predicted from the classifier trained on the reference. file, required, example: "output.h5mu"
--output_obs_predictions In which .obs slots to store the predicted cell labels. If provided, must have the same length as --reference_obs_targets. If empty, will default to the reference_obs_targets combined with the "_pred" suffix. List of string, multiple_sep: ";"
--output_obs_probability In which .obs slots to store the probability of the predictions. If provided, must have the same length as --reference_obs_targets. If empty, will default to the reference_obs_targets combined with the "_probability" suffix. List of string, multiple_sep: ";"
--output_obsm_integrated In which .obsm slot to store the integrated embedding. string, default: "X_integrated_harmony"
--output_compression The compression format to be used on the output h5mu object. string, example: "gzip"

Authors

  • Dorien Roosen (author, maintainer)

  • Weiwei Schultz (contributor)

Visualisation

flowchart TB
    v0(Channel.fromList)
    v2(filter)
    v10(filter)
    v18(align_query_reference)
    v25(cross)
    v35(cross)
    v41(filter)
    v52(filter)
    v64(cross)
    v74(cross)
    v84(concat)
    v91(cross)
    v101(cross)
    v107(flatMap)
    v111(filter)
    v119(concatenate_h5mu)
    v126(cross)
    v136(cross)
    v143(filter)
    v151(highly_variable_features_scanpy)
    v158(cross)
    v168(cross)
    v174(filter)
    v182(pca)
    v189(cross)
    v199(cross)
    v205(filter)
    v213(delete_aligned_lognormalized_counts_layer)
    v220(cross)
    v230(cross)
    v236(filter)
    v244(filter)
    v259(cross)
    v269(cross)
    v275(filter)
    v429(concat)
    v284(filter)
    v299(cross)
    v309(cross)
    v318(branch)
    v345(concat)
    v330(cross)
    v340(cross)
    v349(branch)
    v376(concat)
    v361(cross)
    v371(cross)
    v380(branch)
    v407(concat)
    v392(cross)
    v402(cross)
    v414(cross)
    v424(cross)
    v436(cross)
    v446(cross)
    v453(filter)
    v464(filter)
    v476(cross)
    v486(cross)
    v496(concat)
    v503(cross)
    v513(cross)
    v522(filter)
    v530(knn)
    v537(cross)
    v547(cross)
    v555(mix)
    v558(filter)
    v588(concat)
    v566(merge)
    v573(cross)
    v583(cross)
    v595(cross)
    v602(cross)
    v614(cross)
    v621(cross)
    v625(Output)
    subgraph group_split_modalities [split_modalities]
        v57(split_modalities_component)
    end
    subgraph group_harmony_leiden_workflow [harmony_leiden_workflow]
        v252(harmonypy)
        v292(find_neighbors)
        v323(leiden)
        v354(move_obsm_to_obs)
        v385(umap)
    end
    subgraph group_split_h5mu [split_h5mu]
        v469(split_h5mu_component)
    end
    v318-->v345
    v349-->v376
    v380-->v407
    v0-->v2
    v2-->v10
    v10-->v18
    v18-->v25
    v10-->v25
    v10-->v35
    v52-->v57
    v57-->v64
    v52-->v64
    v52-->v74
    v84-->v91
    v41-->v91
    v41-->v101
    v111-->v119
    v119-->v126
    v111-->v126
    v111-->v136
    v143-->v151
    v151-->v158
    v143-->v158
    v143-->v168
    v174-->v182
    v182-->v189
    v174-->v189
    v174-->v199
    v205-->v213
    v213-->v220
    v205-->v220
    v205-->v230
    v236-->v244
    v244-->v252
    v252-->v259
    v244-->v259
    v244-->v269
    v275-->v284
    v284-->v292
    v292-->v299
    v284-->v299
    v284-->v309
    v318-->v323
    v323-->v330
    v318-->v330
    v318-->v340
    v340-->v345
    v349-->v354
    v354-->v361
    v349-->v361
    v349-->v371
    v371-->v376
    v380-->v385
    v385-->v392
    v380-->v392
    v380-->v402
    v402-->v407
    v407-->v414
    v275-->v414
    v275-->v424
    v424-->v429
    v429-->v436
    v236-->v436
    v236-->v446
    v464-->v469
    v469-->v476
    v464-->v476
    v464-->v486
    v496-->v503
    v453-->v503
    v453-->v513
    v522-->v530
    v530-->v537
    v522-->v537
    v522-->v547
    v558-->v566
    v566-->v573
    v558-->v573
    v558-->v583
    v583-->v588
    v588-->v595
    v2-->v595
    v595-->v602
    v2-->v602
    v2-->v614
    v614-->v621
    v2-->v621
    v621-->v625
    v35-->v41
    v18-->v35
    v101-->v107
    v41-->v52
    v74-->v84
    v57-->v74
    v84-->v101
    v107-->v111
    v136-->v143
    v119-->v136
    v168-->v174
    v151-->v168
    v199-->v205
    v182-->v199
    v230-->v236
    v213-->v230
    v446-->v453
    v269-->v275
    v252-->v269
    v292-->v309
    v309-->v318
    v323-->v340
    v345-->v349
    v354-->v371
    v376-->v380
    v385-->v402
    v407-->v424
    v429-->v446
    v513-->v522
    v453-->v464
    v486-->v496
    v469-->v486
    v496-->v513
    v547-->v555
    v530-->v547
    v107-->v555
    v555-->v558
    v566-->v583
    v588-->v614
    style group_split_modalities fill:#F0F0F0,stroke:#969696;
    style group_harmony_leiden_workflow fill:#F0F0F0,stroke:#969696;
    style group_split_h5mu fill:#F0F0F0,stroke:#969696;
    style v0 fill:#e3dcea,stroke:#7a4baa;
    style v2 fill:#e3dcea,stroke:#7a4baa;
    style v10 fill:#e3dcea,stroke:#7a4baa;
    style v18 fill:#e3dcea,stroke:#7a4baa;
    style v25 fill:#e3dcea,stroke:#7a4baa;
    style v35 fill:#e3dcea,stroke:#7a4baa;
    style v41 fill:#e3dcea,stroke:#7a4baa;
    style v52 fill:#e3dcea,stroke:#7a4baa;
    style v57 fill:#e3dcea,stroke:#7a4baa;
    style v64 fill:#e3dcea,stroke:#7a4baa;
    style v74 fill:#e3dcea,stroke:#7a4baa;
    style v84 fill:#e3dcea,stroke:#7a4baa;
    style v91 fill:#e3dcea,stroke:#7a4baa;
    style v101 fill:#e3dcea,stroke:#7a4baa;
    style v107 fill:#e3dcea,stroke:#7a4baa;
    style v111 fill:#e3dcea,stroke:#7a4baa;
    style v119 fill:#e3dcea,stroke:#7a4baa;
    style v126 fill:#e3dcea,stroke:#7a4baa;
    style v136 fill:#e3dcea,stroke:#7a4baa;
    style v143 fill:#e3dcea,stroke:#7a4baa;
    style v151 fill:#e3dcea,stroke:#7a4baa;
    style v158 fill:#e3dcea,stroke:#7a4baa;
    style v168 fill:#e3dcea,stroke:#7a4baa;
    style v174 fill:#e3dcea,stroke:#7a4baa;
    style v182 fill:#e3dcea,stroke:#7a4baa;
    style v189 fill:#e3dcea,stroke:#7a4baa;
    style v199 fill:#e3dcea,stroke:#7a4baa;
    style v205 fill:#e3dcea,stroke:#7a4baa;
    style v213 fill:#e3dcea,stroke:#7a4baa;
    style v220 fill:#e3dcea,stroke:#7a4baa;
    style v230 fill:#e3dcea,stroke:#7a4baa;
    style v236 fill:#e3dcea,stroke:#7a4baa;
    style v244 fill:#e3dcea,stroke:#7a4baa;
    style v252 fill:#e3dcea,stroke:#7a4baa;
    style v259 fill:#e3dcea,stroke:#7a4baa;
    style v269 fill:#e3dcea,stroke:#7a4baa;
    style v275 fill:#e3dcea,stroke:#7a4baa;
    style v429 fill:#e3dcea,stroke:#7a4baa;
    style v284 fill:#e3dcea,stroke:#7a4baa;
    style v292 fill:#e3dcea,stroke:#7a4baa;
    style v299 fill:#e3dcea,stroke:#7a4baa;
    style v309 fill:#e3dcea,stroke:#7a4baa;
    style v318 fill:#e3dcea,stroke:#7a4baa;
    style v345 fill:#e3dcea,stroke:#7a4baa;
    style v323 fill:#e3dcea,stroke:#7a4baa;
    style v330 fill:#e3dcea,stroke:#7a4baa;
    style v340 fill:#e3dcea,stroke:#7a4baa;
    style v349 fill:#e3dcea,stroke:#7a4baa;
    style v376 fill:#e3dcea,stroke:#7a4baa;
    style v354 fill:#e3dcea,stroke:#7a4baa;
    style v361 fill:#e3dcea,stroke:#7a4baa;
    style v371 fill:#e3dcea,stroke:#7a4baa;
    style v380 fill:#e3dcea,stroke:#7a4baa;
    style v407 fill:#e3dcea,stroke:#7a4baa;
    style v385 fill:#e3dcea,stroke:#7a4baa;
    style v392 fill:#e3dcea,stroke:#7a4baa;
    style v402 fill:#e3dcea,stroke:#7a4baa;
    style v414 fill:#e3dcea,stroke:#7a4baa;
    style v424 fill:#e3dcea,stroke:#7a4baa;
    style v436 fill:#e3dcea,stroke:#7a4baa;
    style v446 fill:#e3dcea,stroke:#7a4baa;
    style v453 fill:#e3dcea,stroke:#7a4baa;
    style v464 fill:#e3dcea,stroke:#7a4baa;
    style v469 fill:#e3dcea,stroke:#7a4baa;
    style v476 fill:#e3dcea,stroke:#7a4baa;
    style v486 fill:#e3dcea,stroke:#7a4baa;
    style v496 fill:#e3dcea,stroke:#7a4baa;
    style v503 fill:#e3dcea,stroke:#7a4baa;
    style v513 fill:#e3dcea,stroke:#7a4baa;
    style v522 fill:#e3dcea,stroke:#7a4baa;
    style v530 fill:#e3dcea,stroke:#7a4baa;
    style v537 fill:#e3dcea,stroke:#7a4baa;
    style v547 fill:#e3dcea,stroke:#7a4baa;
    style v555 fill:#e3dcea,stroke:#7a4baa;
    style v558 fill:#e3dcea,stroke:#7a4baa;
    style v588 fill:#e3dcea,stroke:#7a4baa;
    style v566 fill:#e3dcea,stroke:#7a4baa;
    style v573 fill:#e3dcea,stroke:#7a4baa;
    style v583 fill:#e3dcea,stroke:#7a4baa;
    style v595 fill:#e3dcea,stroke:#7a4baa;
    style v602 fill:#e3dcea,stroke:#7a4baa;
    style v614 fill:#e3dcea,stroke:#7a4baa;
    style v621 fill:#e3dcea,stroke:#7a4baa;
    style v625 fill:#e3dcea,stroke:#7a4baa;