Scanvi

scANVI () is a semi-supervised model for single-cell transcriptomics data.

Info

ID: scanvi
Namespace: annotate

Links

scANVI is an scVI extension that can leverage the cell type knowledge for a subset of the cells present in the data sets to infer the states of the rest of the cells. This component will instantiate a scANVI model from a pre-trained scVI model, integrate the data and perform label prediction

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -main-script target/nextflow/annotate/scanvi/main.nf \
  --help

Run command

Example of params.yaml

# Inputs
input: # please fill in - example: "path/to/file"
modality: "rna"
# input_layer: "foo"
# var_input: "foo"
# var_gene_names: "foo"
obs_labels: # please fill in - example: "foo"
unlabeled_category: "Unknown"

# scVI Model
scvi_model: # please fill in - example: "scvi_model.pt"

# Outputs
# output: "$id.$key.output"
# output_model: "$id.$key.output_model"
# output_compression: "gzip"
obsm_output: "X_scanvi_integrated"
obs_output_predictions: "scanvi_pred"
obs_output_probabilities: "scanvi_proba"

# scANVI training arguments
# early_stopping: true
early_stopping_monitor: "elbo_validation"
early_stopping_patience: 45
early_stopping_min_delta: 0.0
# max_epochs: 123
reduce_lr_on_plateau: true
lr_factor: 0.6
lr_patience: 30.0

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

# Arguments

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -profile docker \
  -main-script target/nextflow/annotate/scanvi/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Inputs

Name	Description	Attributes
`--input`	Input h5mu file. Note that this needs to be the exact same dataset as the –scvi_model was trained on.	`file`, required
`--modality`		`string`, default: `"rna"`
`--input_layer`	Input layer to use. If None, X is used	`string`
`--var_input`	.var column containing highly variable genes that were used to train the scVi model. By default, do not subset genes.	`string`
`--var_gene_names`	.var column containing gene names. By default, use the index.	`string`
`--obs_labels`	.obs field containing the labels	`string`, required
`--unlabeled_category`	Value in the –obs_labels field that indicates unlabeled observations	`string`, default: `"Unknown"`

scVI Model

Name	Description	Attributes
`--scvi_model`	Pretrained SCVI reference model to initialize the SCANVI model with.	`file`, required, example: `"scvi_model.pt"`

Outputs

Name	Description	Attributes
`--output`	Output h5mu file.	`file`, required
`--output_model`	Folder where the state of the trained model will be saved to.	`file`
`--output_compression`	The compression format to be used on the output h5mu object.	`string`, example: `"gzip"`
`--obsm_output`	In which .obsm slot to store the resulting integrated embedding.	`string`, default: `"X_scanvi_integrated"`
`--obs_output_predictions`	In which .obs slot to store the predicted labels.	`string`, default: `"scanvi_pred"`
`--obs_output_probabilities`	In which. obs slot to store the probabilities of the predicted labels.	`string`, default: `"scanvi_proba"`

scANVI training arguments

Name	Description	Attributes
`--early_stopping`	Whether to perform early stopping with respect to the validation set.	`boolean`
`--early_stopping_monitor`	Metric logged during validation set epoch.	`string`, default: `"elbo_validation"`
`--early_stopping_patience`	Number of validation epochs with no improvement after which training will be stopped.	`integer`, default: `45`
`--early_stopping_min_delta`	Minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta, will count as no improvement.	`double`, default: `0`
`--max_epochs`	Number of passes through the dataset, defaults to (20000 / number of cells) * 400 or 400; whichever is smallest.	`integer`
`--reduce_lr_on_plateau`	Whether to monitor validation loss and reduce learning rate when validation set `lr_scheduler_metric` plateaus.	`boolean`, default: `TRUE`
`--lr_factor`	Factor to reduce learning rate.	`double`, default: `0.6`
`--lr_patience`	Number of epochs with no improvement after which learning rate will be reduced.	`double`, default: `30`

Authors

Dorien Roosen (maintainer)
Jakub Majercik (author)
Weiwei Schultz (contributor)