Scarches

Performs reference mapping with scArches

Info

ID: scarches
Namespace: integrate

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 1.0.1 -latest \
  -main-script target/nextflow/integrate/scarches/main.nf \
  --help

Run command

Example of params.yaml
# Inputs
input: # please fill in - example: "path/to/file"
modality: "rna"
reference: # please fill in - example: "path/to/file"
dataset_name: "test_dataset"

# Outputs
# output: "$id.$key.output.output"
# output_compression: "gzip"
# model_output: "$id.$key.model_output.model_output"
obsm_output: "X_integrated_scanvi"

# Early stopping arguments
# early_stopping: true
early_stopping_monitor: "elbo_validation"
early_stopping_patience: 45
early_stopping_min_delta: 0.0

# Learning parameters
max_epochs: # please fill in - example: 123
reduce_lr_on_plateau: true
lr_factor: 0.6
lr_patience: 30

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

nextflow run openpipelines-bio/openpipeline \
  -r 1.0.1 -latest \
  -profile docker \
  -main-script target/nextflow/integrate/scarches/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
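
For example, to run the same command with the Singularity backend:

nextflow run openpipelines-bio/openpipeline \
  -r 1.0.1 -latest \
  -profile singularity \
  -main-script target/nextflow/integrate/scarches/main.nf \
  -params-file params.yaml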

Argument groups

Inputs

| Name | Description | Attributes |
|------|-------------|------------|
| --input | Input h5mu file to use as a query. | file, required |
| --modality | | string, default: "rna" |
| --reference | Path to the directory with reference model or a web link. For HLCA use https://zenodo.org/record/6337966/files/HLCA_reference_model.zip | file, required |
| --dataset_name | Name of query dataset to use as a batch name. If not set, the name of the input file is used. | string, default: "test_dataset" |
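
The --reference description above suggests the web link can be passed directly. If you prefer a local copy, a minimal sketch for fetching the HLCA reference model is shown below; the target directory name is hypothetical and the expected layout depends on the model archive.

# Download the HLCA reference model from the URL given in the table above
wget https://zenodo.org/record/6337966/files/HLCA_reference_model.zip
# Unzip into a local directory (hypothetical name); point --reference at it,
# or pass the web link directly as described in the table
unzip HLCA_reference_model.zip -d HLCA_reference_model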

Outputs

| Name | Description | Attributes |
|------|-------------|------------|
| --output | Output h5mu file. | file, required |
| --output_compression | The compression format to be used on the output h5mu object. | string, example: "gzip" |
| --model_output | Output directory for the model. | file, default: "model" |
| --obsm_output | In which .obsm slot to store the resulting integrated embedding. | string, default: "X_integrated_scanvi" |
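
To check that the integrated embedding was written to the configured slot, a quick inspection of the output file could look like this sketch (assumes the mudata Python package is installed; the output file name is hypothetical):

# Print the shape of the integrated embedding stored in the query modality
python -c '
import mudata
mdata = mudata.read_h5mu("output/run.scarches.output.h5mu")  # hypothetical output path
print(mdata.mod["rna"].obsm["X_integrated_scanvi"].shape)    # default --obsm_output slot
'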

Early stopping arguments

| Name | Description | Attributes |
|------|-------------|------------|
| --early_stopping | Whether to perform early stopping with respect to the validation set. | boolean |
| --early_stopping_monitor | Metric logged during validation set epoch. | string, default: "elbo_validation" |
| --early_stopping_patience | Number of validation epochs with no improvement after which training will be stopped. | integer, default: 45 |
| --early_stopping_min_delta | Minimum change in the monitored quantity to qualify as an improvement; an absolute change of less than min_delta counts as no improvement. | double, default: 0 |

Learning parameters

| Name | Description | Attributes |
|------|-------------|------------|
| --max_epochs | Number of passes through the dataset; defaults to (20000 / number of cells) * 400 or 400, whichever is smallest. | integer, required |
| --reduce_lr_on_plateau | Whether to monitor validation loss and reduce the learning rate when the validation set lr_scheduler_metric plateaus. | boolean, default: TRUE |
| --lr_factor | Factor by which to reduce the learning rate. | double, default: 0.6 |
| --lr_patience | Number of epochs with no improvement after which the learning rate will be reduced. | double, default: 30 |

Authors

  • Vladimir Shitov