Scanvi
scANVI (single-cell ANnotation using Variational Inference) is a semi-supervised model for single-cell transcriptomics data.
Info
ID: scanvi
Namespace: annotate
scANVI is an scVI extension that can leverage cell type knowledge for a subset of the cells in a dataset to infer the states of the remaining cells. This component instantiates a scANVI model from a pre-trained scVI model, integrates the data, and performs label prediction.
Example commands
You can run the pipeline using nextflow run.
View help
You can use --help as a parameter to get an overview of the possible parameters.
nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -main-script target/nextflow/annotate/scanvi/main.nf \
  --help
Run command
Example of params.yaml
# Inputs
input: # please fill in - example: "path/to/file"
modality: "rna"
# input_layer: "foo"
# var_input: "foo"
# var_gene_names: "foo"
obs_labels: # please fill in - example: "foo"
unlabeled_category: "Unknown"
# scVI Model
scvi_model: # please fill in - example: "scvi_model.pt"
# Outputs
# output: "$id.$key.output"
# output_model: "$id.$key.output_model"
# output_compression: "gzip"
obsm_output: "X_scanvi_integrated"
obs_output_predictions: "scanvi_pred"
obs_output_probabilities: "scanvi_proba"
# scANVI training arguments
# early_stopping: true
early_stopping_monitor: "elbo_validation"
early_stopping_patience: 45
early_stopping_min_delta: 0.0
# max_epochs: 123
reduce_lr_on_plateau: true
lr_factor: 0.6
lr_patience: 30.0
# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
# Arguments
nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -profile docker \
  -main-script target/nextflow/annotate/scanvi/main.nf \
  -params-file params.yaml
Note
Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
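For scripted runs, the command above can be assembled programmatically. The following is a minimal sketch, not part of the component itself: the helper name `build_run_command` is hypothetical, and the set of allowed profiles simply mirrors the three backends named in the note.

```python
import shlex


def build_run_command(params_file: str, profile: str = "docker") -> list[str]:
    """Assemble the `nextflow run` invocation from this page as an argument list.

    Hypothetical helper for illustration; it only builds the command,
    it does not execute it.
    """
    allowed = {"docker", "podman", "singularity"}  # backends named in the note
    if profile not in allowed:
        raise ValueError(f"unsupported profile: {profile!r}")
    return [
        "nextflow", "run", "openpipelines-bio/openpipeline",
        "-r", "2.1.1", "-latest",
        "-profile", profile,
        "-main-script", "target/nextflow/annotate/scanvi/main.nf",
        "-params-file", params_file,
    ]


cmd = build_run_command("params.yaml", profile="singularity")
print(shlex.join(cmd))  # shell-quoted form, suitable for logging
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where Nextflow is installed.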
Argument groups
Inputs
| Name | Description | Attributes |
|---|---|---|
| --input | Input h5mu file. This must be the exact same dataset that the --scvi_model was trained on. | file, required |
| --modality | | string, default: "rna" |
| --input_layer | Input layer to use. If None, X is used. | string |
| --var_input | .var column containing the highly variable genes that were used to train the scVI model. By default, genes are not subset. | string |
| --var_gene_names | .var column containing gene names. By default, the index is used. | string |
| --obs_labels | .obs field containing the labels. | string, required |
| --unlabeled_category | Value in the --obs_labels field that indicates unlabeled observations. | string, default: "Unknown" |
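To illustrate how the label column and the unlabeled category relate, here is a minimal sketch in plain Python. The cell labels are made up for illustration and the helper `split_labels` is hypothetical; in the real data the labels would come from the .obs column named by --obs_labels.

```python
from collections import Counter

UNLABELED = "Unknown"  # matches the default of --unlabeled_category


def split_labels(labels, unlabeled_category=UNLABELED):
    """Return (labeled_counts, n_unlabeled) for a list of per-cell labels."""
    counts = Counter(labels)
    # Cells carrying the unlabeled category are the ones whose label is inferred.
    n_unlabeled = counts.pop(unlabeled_category, 0)
    return dict(counts), n_unlabeled


# Hypothetical per-cell labels, as they might appear in the --obs_labels column.
cell_labels = ["T cell", "B cell", "Unknown", "T cell", "Unknown"]
labeled, n_unlabeled = split_labels(cell_labels)
print(labeled, n_unlabeled)
```

A quick check like this before launching the pipeline can catch a mismatch between the value used in the data (for example "unknown" or "NA") and the value passed as --unlabeled_category.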
scVI Model
| Name | Description | Attributes |
|---|---|---|
| --scvi_model | Pretrained scVI reference model to initialize the scANVI model with. | file, required, example: "scvi_model.pt" |
Outputs
| Name | Description | Attributes |
|---|---|---|
| --output | Output h5mu file. | file, required |
| --output_model | Folder in which the state of the trained model will be saved. | file |
| --output_compression | The compression format to use for the output h5mu object. | string, example: "gzip" |
| --obsm_output | The .obsm slot in which to store the resulting integrated embedding. | string, default: "X_scanvi_integrated" |
| --obs_output_predictions | The .obs slot in which to store the predicted labels. | string, default: "scanvi_pred" |
| --obs_output_probabilities | The .obs slot in which to store the probabilities of the predicted labels. | string, default: "scanvi_proba" |
scANVI training arguments
| Name | Description | Attributes |
|---|---|---|
| --early_stopping | Whether to perform early stopping with respect to the validation set. | boolean |
| --early_stopping_monitor | Metric to monitor, as logged at the end of each validation epoch. | string, default: "elbo_validation" |
| --early_stopping_patience | Number of validation epochs with no improvement after which training will be stopped. | integer, default: 45 |
| --early_stopping_min_delta | Minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta will count as no improvement. | double, default: 0 |
| --max_epochs | Number of passes through the dataset. Defaults to (20000 / number of cells) * 400 or 400, whichever is smaller. | integer |
| --reduce_lr_on_plateau | Whether to monitor the validation loss and reduce the learning rate when the validation metric (lr_scheduler_metric) plateaus. | boolean, default: true |
| --lr_factor | Factor by which to reduce the learning rate. | double, default: 0.6 |
| --lr_patience | Number of epochs with no improvement after which the learning rate will be reduced. | double, default: 30 |
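The interaction between these training arguments can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the training loop used by the component: the function names are hypothetical, the default max_epochs is rounded to an integer (the table only gives the formula), the monitored metric is treated as lower-is-better, and both patience counters trigger as soon as they reach their threshold.

```python
def default_max_epochs(n_cells: int, cap: int = 400) -> int:
    """Default from the table: (20000 / n_cells) * 400 or 400, whichever is smaller.

    Rounding to an integer is an assumption of this sketch.
    """
    return min(round(20000 / n_cells * cap), cap)


def simulate_training(val_losses, patience=45, min_delta=0.0,
                      lr=1e-3, lr_factor=0.6, lr_patience=30):
    """Replay a sequence of validation losses; return (stopped_epoch, final_lr)."""
    best = float("inf")
    epochs_since_best = 0     # counter for early stopping
    epochs_since_lr_step = 0  # counter for the learning-rate plateau
    for epoch, loss in enumerate(val_losses, start=1):
        if best - loss > min_delta:
            # Improvement larger than min_delta: reset both counters.
            best = loss
            epochs_since_best = 0
            epochs_since_lr_step = 0
        else:
            epochs_since_best += 1
            epochs_since_lr_step += 1
        if epochs_since_lr_step >= lr_patience:
            lr *= lr_factor  # reduce the learning rate on plateau
            epochs_since_lr_step = 0
        if epochs_since_best >= patience:
            return epoch, lr  # early stop
    return len(val_losses), lr


# Hypothetical loss curve: two improving epochs, then a long plateau.
losses = [1.0, 0.9] + [0.9] * 100
stopped_epoch, final_lr = simulate_training(losses)
print(default_max_epochs(50_000), stopped_epoch, final_lr)
```

With the defaults (lr_patience 30, early_stopping_patience 45), the learning rate is reduced once on a plateau before early stopping ends training, which gives the optimizer a chance to make progress at a smaller step size first.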