scANVI - scArches workflow

Cell type annotation workflow using scANVI with scArches for reference mapping.

Info

ID: scanvi_scarches
Namespace: workflows/annotation

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.0 -latest \
  -main-script target/nextflow/workflows/annotation/scanvi_scarches/main.nf \
  --help

Run command

Example of params.yaml
# Query Input
id: # please fill in - example: "foo"
input: # please fill in - example: "input.h5mu"
modality: "rna"
# layer: "foo"
input_obs_batch_label: # please fill in - example: "sample"
# input_obs_size_factor: "foo"
# input_var_gene_names: "foo"

# Reference input
reference: # please fill in - example: "reference.h5mu"
reference_obs_target: # please fill in - example: "cell_type"
reference_obs_batch_label: # please fill in - example: "sample"
# reference_obs_size_factor: "foo"
unlabeled_category: "Unknown"
# reference_var_hvg: "foo"
# reference_var_gene_names: "foo"

# scVI, scANVI and scArches training options
# early_stopping: true
early_stopping_monitor: "elbo_validation"
early_stopping_patience: 45
early_stopping_min_delta: 0.0
# max_epochs: 123
reduce_lr_on_plateau: true
lr_factor: 0.6
lr_patience: 30.0

# Leiden clustering options
leiden_resolution: [1.0]

# Neighbor classifier arguments
knn_weights: "uniform"
knn_n_neighbors: 15

# Outputs
# output: "$id.$key.output.h5mu"
output_obs_predictions: "scanvi_pred"
output_obs_probability: "scanvi_probabilities"
output_obsm_integrated: "X_integrated_scanvi"
# output_compression: "gzip"
# output_model: "$id.$key.output_model"

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

# Arguments
nextflow run openpipelines-bio/openpipeline \
  -r 2.1.0 -latest \
  -profile docker \
  -main-script target/nextflow/workflows/annotation/scanvi_scarches/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
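
The parameter file and run command above can also be assembled programmatically. The following is a minimal sketch, not part of the workflow itself: the helper names `render_params_yaml` and `build_command` are hypothetical, and the values are the placeholder examples from the parameter file (they need to be replaced with real paths and column names). It only covers the required parameters and relies on the fact that JSON scalars are also valid YAML scalars.

```python
import json
import shlex

# Required parameters, filled with the placeholder examples from params.yaml.
# These values are illustrative only; substitute your own paths and .obs keys.
required_params = {
    "id": "foo",
    "input": "input.h5mu",
    "input_obs_batch_label": "sample",
    "reference": "reference.h5mu",
    "reference_obs_target": "cell_type",
    "reference_obs_batch_label": "sample",
    "publish_dir": "output/",
}


def render_params_yaml(params: dict) -> str:
    """Render a flat dict as YAML text (hypothetical helper).

    json.dumps is used for the values because JSON scalars are valid YAML.
    """
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in params.items())


def build_command(params_file: str, profile: str = "docker") -> list[str]:
    """Assemble the nextflow invocation shown above as an argument list."""
    return [
        "nextflow", "run", "openpipelines-bio/openpipeline",
        "-r", "2.1.0", "-latest",
        "-profile", profile,
        "-main-script", "target/nextflow/workflows/annotation/scanvi_scarches/main.nf",
        "-params-file", params_file,
    ]


yaml_text = render_params_yaml(required_params)
command = build_command("params.yaml", profile="singularity")

print(yaml_text)
print(shlex.join(command))
```

To actually launch the run, `yaml_text` would be written to `params.yaml` and `command` passed to `subprocess.run`; both steps are left out here so that the sketch has no side effects.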

Argument groups

Query Input

| Name | Description | Attributes |
|---|---|---|
| --id | ID of the sample. | string, required, example: "foo" |
| --input | Input dataset consisting of the (unlabeled) query observations. The dataset is expected to be pre-processed in the same way as --reference. | file, required, example: "input.h5mu" |
| --modality | Which modality to process. Should match the modality of the --reference dataset. | string, default: "rna" |
| --layer | Which layer to use for integration if .X is not to be used. Should match the layer of the --reference dataset. | string |
| --input_obs_batch_label | The .obs field in the input (query) dataset containing the batch labels. | string, required, example: "sample" |
| --input_obs_size_factor | Key in adata.obs for size factor information. Instead of using library size as a size factor, the provided size factor column will be used as offset in the mean of the likelihood. Assumed to be on linear scale. | string |
| --input_var_gene_names | .var column containing gene names. By default, use the index. | string |

Reference input

| Name | Description | Attributes |
|---|---|---|
| --reference | Reference dataset consisting of the labeled observations to train the KNN classifier on. The dataset is expected to be pre-processed in the same way as the --input query dataset. | file, required, example: "reference.h5mu" |
| --reference_obs_target | The .obs key containing the target labels. | string, required, example: "cell_type" |
| --reference_obs_batch_label | The .obs field in the reference dataset containing the batch labels. | string, required, example: "sample" |
| --reference_obs_size_factor | Key in adata.obs for size factor information. Instead of using library size as a size factor, the provided size factor column will be used as offset in the mean of the likelihood. Assumed to be on linear scale. | string |
| --unlabeled_category | Value in the --reference_obs_target field that indicates unlabeled observations. | string, default: "Unknown" |
| --reference_var_hvg | .var column containing highly variable genes. If not provided, genes will not be subset. | string |
| --reference_var_gene_names | .var column containing gene names. By default, use the index. | string |

scVI, scANVI and scArches training options

| Name | Description | Attributes |
|---|---|---|
| --early_stopping | Whether to perform early stopping with respect to the validation set. | boolean |
| --early_stopping_monitor | Metric logged during validation set epoch. | string, default: "elbo_validation" |
| --early_stopping_patience | Number of validation epochs with no improvement after which training will be stopped. | integer, default: 45 |
| --early_stopping_min_delta | Minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta counts as no improvement. | double, default: 0 |
| --max_epochs | Number of passes through the dataset. Defaults to (20000 / number of cells) * 400 or 400, whichever is smaller. | integer |
| --reduce_lr_on_plateau | Whether to monitor the validation loss and reduce the learning rate when the validation set lr_scheduler_metric plateaus. | boolean, default: TRUE |
| --lr_factor | Factor by which to reduce the learning rate. | double, default: 0.6 |
| --lr_patience | Number of epochs with no improvement after which the learning rate will be reduced. | double, default: 30 |
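
The default for --max_epochs and the interaction between --early_stopping_patience and --early_stopping_min_delta can be illustrated with a small sketch. This is a simplified model of the rules described in the table, not the training code used by the workflow: the function names are hypothetical, rounding to the nearest integer is an assumption, and the monitored metric is assumed to be one where lower is better (as for a validation loss).

```python
from typing import Optional, Sequence


def default_max_epochs(n_cells: int, epochs_cap: int = 400) -> int:
    """Default epoch count: (20000 / n_cells) * 400, capped at 400.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return min(round((20000 / n_cells) * epochs_cap), epochs_cap)


def early_stopping_epoch(
    losses: Sequence[float], patience: int = 45, min_delta: float = 0.0
) -> Optional[int]:
    """Return the 1-based epoch at which training would stop, or None.

    An epoch counts as an improvement only if the monitored value drops
    by more than min_delta below the best value seen so far.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None  # patience never exhausted within the given epochs


# Small datasets hit the cap; larger ones get proportionally fewer epochs.
for n in (10_000, 40_000, 100_000):
    print(n, default_max_epochs(n))

# With patience=3, three consecutive non-improving epochs trigger the stop.
print(early_stopping_epoch([10.0, 9.0, 8.0, 8.0, 8.0, 8.0], patience=3))
```

Under these assumptions, a dataset of 20000 cells or fewer trains for at most 400 epochs, while early stopping can end training sooner when the monitored metric stalls.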

Leiden clustering options

| Name | Description | Attributes |
|---|---|---|
| --leiden_resolution | Control the coarseness of the clustering. Higher values lead to more clusters. | List of double, default: 1, multiple_sep: ";" |
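
The multiple_sep attribute indicates that several resolutions can be passed as one string separated by ";" on the command line (for example --leiden_resolution "0.5;1.0;2.0"), whereas the parameter file uses a YAML list such as [0.5, 1.0, 2.0]. The sketch below shows how such a string maps to a list of doubles; `parse_multiple` is a hypothetical helper written for illustration, and the exact parsing performed by the workflow is assumed rather than confirmed here.

```python
def parse_multiple(raw: str, sep: str = ";") -> list[float]:
    """Split a separator-delimited string into floats, skipping empty items."""
    return [float(item) for item in raw.split(sep) if item.strip()]


resolutions = parse_multiple("0.5;1.0;2.0")
print(resolutions)
```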

Neighbor classifier arguments

| Name | Description | Attributes |
|---|---|---|
| --knn_weights | Weight function used in prediction. Possible values are: uniform (all points in each neighborhood are weighted equally) or distance (weight points by the inverse of their distance). | string, default: "uniform" |
| --knn_n_neighbors | The number of neighbors to use in k-neighbor graph structure used for fast approximate nearest neighbor search with PyNNDescent. Larger values will result in more accurate search results at the cost of computation time. | integer, default: 15 |
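
The difference between the two --knn_weights options can be seen in a toy example. The following sketch illustrates the two weighting schemes in general and is not the classifier used by the workflow; `knn_vote` is a hypothetical function, and the guard against zero distances is a simplification chosen for this example.

```python
from collections import defaultdict
from typing import Sequence, Tuple


def knn_vote(
    labels: Sequence[str], distances: Sequence[float], weights: str = "uniform"
) -> Tuple[str, float]:
    """Return the winning label and its share of the total vote weight.

    uniform:  every neighbor contributes a weight of 1.
    distance: every neighbor contributes 1 / distance.
    """
    if weights not in ("uniform", "distance"):
        raise ValueError("weights must be 'uniform' or 'distance'")
    scores = defaultdict(float)
    for label, dist in zip(labels, distances):
        # max(...) avoids division by zero for identical points (simplification).
        scores[label] += 1.0 if weights == "uniform" else 1.0 / max(dist, 1e-12)
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())


neighbor_labels = ["B cell", "T cell", "T cell"]
neighbor_distances = [0.1, 1.0, 1.0]

# Majority wins with uniform weights; the very close neighbor wins with
# distance weights.
print(knn_vote(neighbor_labels, neighbor_distances, "uniform"))
print(knn_vote(neighbor_labels, neighbor_distances, "distance"))
```

With uniform weights the two more distant "T cell" neighbors outvote the single "B cell" neighbor, while inverse-distance weighting lets the much closer "B cell" neighbor dominate.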

Outputs

| Name | Description | Attributes |
|---|---|---|
| --output | The query data in .h5mu format with labels predicted by the classifier trained on the reference. | file, required, example: "output.h5mu" |
| --output_obs_predictions | In which .obs slot to store the predicted labels. | string, default: "scanvi_pred" |
| --output_obs_probability | In which .obs slot to store the probabilities of the predicted labels. | string, default: "scanvi_probabilities" |
| --output_obsm_integrated | In which .obsm slot to store the integrated embedding. | string, default: "X_integrated_scanvi" |
| --output_compression | The compression format to be used on the output h5mu object. | string, example: "gzip" |
| --output_model | Path to the resulting scANVI model that was updated with query data. | file |

Authors

  • Dorien Roosen (author, maintainer)

  • Weiwei Schultz (contributor)

Visualisation

flowchart TB
    v0(Channel.fromList)
    v2(filter)
    v10(filter)
    v18(scvi)
    v25(cross)
    v35(cross)
    v41(filter)
    v49(scanvi)
    v56(cross)
    v66(cross)
    v72(filter)
    v80(scarches)
    v87(cross)
    v97(cross)
    v104(filter)
    v258(concat)
    v113(filter)
    v128(cross)
    v138(cross)
    v147(branch)
    v174(concat)
    v159(cross)
    v169(cross)
    v178(branch)
    v205(concat)
    v190(cross)
    v200(cross)
    v209(branch)
    v236(concat)
    v221(cross)
    v231(cross)
    v243(cross)
    v253(cross)
    v265(cross)
    v272(cross)
    v284(cross)
    v291(cross)
    v295(Output)
    subgraph group_neighbors_leiden_umap [neighbors_leiden_umap]
        v121(find_neighbors)
        v152(leiden)
        v183(move_obsm_to_obs)
        v214(umap)
    end
    v147-->v174
    v178-->v205
    v209-->v236
    v0-->v2
    v2-->v10
    v10-->v18
    v18-->v25
    v10-->v25
    v10-->v35
    v41-->v49
    v49-->v56
    v41-->v56
    v41-->v66
    v72-->v80
    v80-->v87
    v72-->v87
    v72-->v97
    v104-->v113
    v113-->v121
    v121-->v128
    v113-->v128
    v113-->v138
    v147-->v152
    v152-->v159
    v147-->v159
    v147-->v169
    v169-->v174
    v178-->v183
    v183-->v190
    v178-->v190
    v178-->v200
    v200-->v205
    v209-->v214
    v214-->v221
    v209-->v221
    v209-->v231
    v231-->v236
    v236-->v243
    v104-->v243
    v104-->v253
    v253-->v258
    v258-->v265
    v2-->v265
    v265-->v272
    v2-->v272
    v2-->v284
    v284-->v291
    v2-->v291
    v291-->v295
    v35-->v41
    v18-->v35
    v66-->v72
    v49-->v66
    v97-->v104
    v80-->v97
    v121-->v138
    v138-->v147
    v152-->v169
    v174-->v178
    v183-->v200
    v205-->v209
    v214-->v231
    v236-->v253
    v258-->v284
    style group_neighbors_leiden_umap fill:#F0F0F0,stroke:#969696;
    style v0 fill:#e3dcea,stroke:#7a4baa;
    style v2 fill:#e3dcea,stroke:#7a4baa;
    style v10 fill:#e3dcea,stroke:#7a4baa;
    style v18 fill:#e3dcea,stroke:#7a4baa;
    style v25 fill:#e3dcea,stroke:#7a4baa;
    style v35 fill:#e3dcea,stroke:#7a4baa;
    style v41 fill:#e3dcea,stroke:#7a4baa;
    style v49 fill:#e3dcea,stroke:#7a4baa;
    style v56 fill:#e3dcea,stroke:#7a4baa;
    style v66 fill:#e3dcea,stroke:#7a4baa;
    style v72 fill:#e3dcea,stroke:#7a4baa;
    style v80 fill:#e3dcea,stroke:#7a4baa;
    style v87 fill:#e3dcea,stroke:#7a4baa;
    style v97 fill:#e3dcea,stroke:#7a4baa;
    style v104 fill:#e3dcea,stroke:#7a4baa;
    style v258 fill:#e3dcea,stroke:#7a4baa;
    style v113 fill:#e3dcea,stroke:#7a4baa;
    style v121 fill:#e3dcea,stroke:#7a4baa;
    style v128 fill:#e3dcea,stroke:#7a4baa;
    style v138 fill:#e3dcea,stroke:#7a4baa;
    style v147 fill:#e3dcea,stroke:#7a4baa;
    style v174 fill:#e3dcea,stroke:#7a4baa;
    style v152 fill:#e3dcea,stroke:#7a4baa;
    style v159 fill:#e3dcea,stroke:#7a4baa;
    style v169 fill:#e3dcea,stroke:#7a4baa;
    style v178 fill:#e3dcea,stroke:#7a4baa;
    style v205 fill:#e3dcea,stroke:#7a4baa;
    style v183 fill:#e3dcea,stroke:#7a4baa;
    style v190 fill:#e3dcea,stroke:#7a4baa;
    style v200 fill:#e3dcea,stroke:#7a4baa;
    style v209 fill:#e3dcea,stroke:#7a4baa;
    style v236 fill:#e3dcea,stroke:#7a4baa;
    style v214 fill:#e3dcea,stroke:#7a4baa;
    style v221 fill:#e3dcea,stroke:#7a4baa;
    style v231 fill:#e3dcea,stroke:#7a4baa;
    style v243 fill:#e3dcea,stroke:#7a4baa;
    style v253 fill:#e3dcea,stroke:#7a4baa;
    style v265 fill:#e3dcea,stroke:#7a4baa;
    style v272 fill:#e3dcea,stroke:#7a4baa;
    style v284 fill:#e3dcea,stroke:#7a4baa;
    style v291 fill:#e3dcea,stroke:#7a4baa;
    style v295 fill:#e3dcea,stroke:#7a4baa;