Leiden scvi

Runs scVI integration, then calculates neighbours, performs Leiden clustering and computes a UMAP on the result.

Info

ID: leiden_scvi
Namespace: multiomics/integration

Example commands

You can run the pipeline using nextflow run.

View help

Pass --help to get an overview of the available parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 0.10.0 -latest \
  -main-script ./workflows/multiomics/integration/scvi_leiden/main.nf \
  --help

Run command

Example of params.yaml

# Inputs
id: # please fill in - example: "foo"
input: # please fill in - example: "dataset.h5mu"
layer: "log_normalized"
modality: "rna"

# Outputs
# output: "$id.$key.output.h5mu"
# output_model: "$id.$key.output_model.output_model"

# Neighbour calculation
uns_neighbors: "scvi_integration_neighbors"
obsp_neighbor_distances: "scvi_integration_distances"
obsp_neighbor_connectivities: "scvi_integration_connectivities"

# Scvi integration options
obs_batch: # please fill in - example: "foo"
obsm_output: "X_scvi_integrated"
# early_stopping: true
early_stopping_monitor: "elbo_validation"
early_stopping_patience: 45
early_stopping_min_delta: 0.0
# max_epochs: 123
reduce_lr_on_plateau: true
lr_factor: 0.6
lr_patience: 30

# Clustering options
obs_cluster: "scvi_integration_leiden"
leiden_resolution: [1]

# Umap options
obsm_umap: "X_scvi_umap"

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

nextflow run openpipelines-bio/openpipeline \
  -r 0.10.0 -latest \
  -profile docker \
  -main-script ./workflows/multiomics/integration/scvi_leiden/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
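
The run command above can also be assembled programmatically. The following is a minimal sketch, not part of the pipeline: the helper names `missing_required` and `nextflow_command` and the batch column `"batch"` are hypothetical. It checks that the parameters marked "please fill in" are present, writes them to a JSON file (Nextflow's -params-file accepts JSON as well as YAML) and prints the command without executing it.

```python
import json
import shlex
import tempfile
from pathlib import Path

# Parameters marked "please fill in" in the example params.yaml.
REQUIRED = ("id", "input", "obs_batch", "publish_dir")


def missing_required(params: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED if params.get(key) in (None, "")]


def nextflow_command(params_file: str, profile: str = "docker") -> list:
    """Build the argv of the run command shown above (it is not executed)."""
    return [
        "nextflow", "run", "openpipelines-bio/openpipeline",
        "-r", "0.10.0", "-latest",
        "-profile", profile,
        "-main-script", "./workflows/multiomics/integration/scvi_leiden/main.nf",
        "-params-file", params_file,
    ]


params = {
    "id": "foo",
    "input": "dataset.h5mu",
    "obs_batch": "batch",  # hypothetical .obs column name
    "publish_dir": "output/",
    "leiden_resolution": [1],
}

missing = missing_required(params)
if missing:
    raise SystemExit(f"Missing required parameters: {missing}")

# Write the parameters to a temporary JSON file and print the command.
params_path = Path(tempfile.mkdtemp()) / "params.json"
params_path.write_text(json.dumps(params, indent=2))
command = nextflow_command(str(params_path))
print(shlex.join(command))
```

Passing `profile="podman"` or `profile="singularity"` to `nextflow_command` switches the backend in the same way as the note describes.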

Argument groups

Inputs

Name	Description	Attributes
`--id`	ID of the sample.	`string`, required, example: `"foo"`
`--input`	Path to the sample.	`file`, required, example: `"dataset.h5mu"`
`--layer`	Use the specified layer for expression values instead of the .X object from the modality.	`string`, default: `"log_normalized"`
`--modality`	Which modality to process.	`string`, default: `"rna"`

Outputs

Name	Description	Attributes
`--output`	Destination path to the output.	`file`, required, example: `"output.h5mu"`
`--output_model`	Folder where the state of the trained model will be saved to.	`file`, required, example: `"output_dir"`

Neighbour calculation

Name	Description	Attributes
`--uns_neighbors`	In which .uns slot to store various neighbor output objects.	`string`, default: `"scvi_integration_neighbors"`
`--obsp_neighbor_distances`	In which .obsp slot to store the distance matrix between the resulting neighbors.	`string`, default: `"scvi_integration_distances"`
`--obsp_neighbor_connectivities`	In which .obsp slot to store the connectivities matrix between the resulting neighbors.	`string`, default: `"scvi_integration_connectivities"`

Scvi integration options

Name	Description	Attributes
`--obs_batch`	Column name discriminating between your batches.	`string`, required
`--obsm_output`	In which .obsm slot to store the resulting integrated embedding.	`string`, default: `"X_scvi_integrated"`
`--early_stopping`	Whether to perform early stopping with respect to the validation set.	`boolean`
`--early_stopping_monitor`	Metric logged during validation set epoch.	`string`, default: `"elbo_validation"`
`--early_stopping_patience`	Number of validation epochs with no improvement after which training will be stopped.	`integer`, default: `45`
`--early_stopping_min_delta`	Minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta will count as no improvement.	`double`, default: `0`
`--max_epochs`	Number of passes through the dataset. Defaults to (20000 / number of cells) * 400 or 400, whichever is smaller.	`integer`
`--reduce_lr_on_plateau`	Whether to monitor validation loss and reduce learning rate when validation set `lr_scheduler_metric` plateaus.	`boolean`, default: `TRUE`
`--lr_factor`	Factor to reduce learning rate.	`double`, default: `0.6`
`--lr_patience`	Number of epochs with no improvement after which learning rate will be reduced.	`double`, default: `30`
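
The epoch budget and the early-stopping parameters in the table above can be illustrated with a small pure-Python sketch. This is not the scvi-tools implementation. The function names are hypothetical, the rounding in `default_max_epochs` is an assumption (the table does not say how fractional values are handled), and the monitored metric is assumed to be one where lower is better.

```python
from typing import Optional, Sequence


def default_max_epochs(n_cells: int) -> int:
    """Default epoch budget: (20000 / n_cells) * 400, capped at 400."""
    # Rounding to the nearest integer is an assumption of this sketch.
    return min(round(20000 / n_cells * 400), 400)


def early_stopping_epoch(
    metric: Sequence[float],
    patience: int = 45,
    min_delta: float = 0.0,
) -> Optional[int]:
    """Return the 0-based epoch at which training would stop, or None.

    An epoch counts as an improvement only if the metric drops by more
    than min_delta below the best value seen so far. Training stops after
    `patience` consecutive epochs without such an improvement.
    """
    best = float("inf")
    waited = 0
    for epoch, value in enumerate(metric):
        if best - value > min_delta:
            best = value
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return None


print(default_max_epochs(5_000))   # 400 (1600 is capped at 400)
print(default_max_epochs(40_000))  # 200
print(early_stopping_epoch([10.0, 9.0, 9.0, 9.0, 9.0], patience=2))  # 3
```

With the pipeline defaults (`early_stopping_patience: 45`, `early_stopping_min_delta: 0.0`), any decrease of the monitored metric resets the counter, so training only stops early after 45 validation epochs without a new best value.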

Clustering options

Name	Description	Attributes
`--obs_cluster`	Prefix for the .obs keys under which to add the cluster labels. New .obs columns are named using the value of `--obs_cluster`, suffixed with an underscore and one of the resolutions specified in `--leiden_resolution`.	`string`, default: `"scvi_integration_leiden"`
`--leiden_resolution`	Control the coarseness of the clustering. Higher values lead to more clusters.	List of `double`, default: `1`, multiple_sep: `":"`
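
The naming rule for the cluster columns can be sketched as follows. The helper names are hypothetical, and how the pipeline formats each resolution inside the column name (for example `1` versus `1.0`) is an assumption: this sketch reuses the text of each resolution exactly as it was passed on the command line, split on the `:` separator.

```python
from typing import List


def parse_resolutions(raw: str, sep: str = ":") -> List[str]:
    """Split a value such as "0.5:1" into its individual resolutions."""
    tokens = [token.strip() for token in raw.split(sep) if token.strip()]
    for token in tokens:
        float(token)  # raises ValueError if a token is not numeric
    return tokens


def cluster_columns(prefix: str, resolutions: List[str]) -> List[str]:
    """Prefix, underscore, resolution: one .obs column per resolution."""
    return [f"{prefix}_{resolution}" for resolution in resolutions]


resolutions = parse_resolutions("0.5:1")
print(cluster_columns("scvi_integration_leiden", resolutions))
# ['scvi_integration_leiden_0.5', 'scvi_integration_leiden_1']
```

Passing several resolutions therefore yields one clustering per resolution, each stored under its own column.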

Umap options

Name	Description	Attributes
`--obsm_umap`	In which .obsm slot to store the resulting UMAP embedding.	`string`, default: `"X_scvi_umap"`
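
As a summary of the slot arguments documented above, the sketch below collects where the results would be expected in the output file, given the default values from the tables. The function `output_slots` is hypothetical and only rearranges parameter values. It does not read an h5mu file.

```python
# Default slot names, copied from the argument tables above.
DEFAULTS = {
    "modality": "rna",
    "obsm_output": "X_scvi_integrated",
    "uns_neighbors": "scvi_integration_neighbors",
    "obsp_neighbor_distances": "scvi_integration_distances",
    "obsp_neighbor_connectivities": "scvi_integration_connectivities",
    "obs_cluster": "scvi_integration_leiden",
    "obsm_umap": "X_scvi_umap",
}


def output_slots(overrides=None):
    """Group the configured slot names by the container they are stored in."""
    params = {**DEFAULTS, **(overrides or {})}
    return {
        "modality": params["modality"],
        "obsm": [params["obsm_output"], params["obsm_umap"]],
        "obsp": [
            params["obsp_neighbor_distances"],
            params["obsp_neighbor_connectivities"],
        ],
        "uns": [params["uns_neighbors"]],
        # Cluster columns in .obs start with this prefix.
        "obs_prefix": params["obs_cluster"],
    }


slots = output_slots({"obsm_umap": "X_custom_umap"})
print(slots["obsm"])  # ['X_scvi_integrated', 'X_custom_umap']
```

Overriding any of these arguments in params.yaml changes the corresponding key in the processed modality, which is useful when several integration methods write to the same file.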

Authors

Dries Schaumont (author)

Visualisation

flowchart LR
    p0(Input)
    p3(toSortedList)
    p5(flatMap)
    p12(scvi)
    p14(join)
    p23(find_neighbors)
    p25(join)
    p33(leiden)
    p35(join)
    p43(umap)
    p45(join)
    p53(move_obsm_to_obs)
    p55(join)
    p62(Output)
    p0-->p3
    p3-->p5
    p5-->p14
    p5-->p12
    p12-->p14
    p14-->p25
    p14-->p23
    p23-->p25
    p25-->p35
    p25-->p33
    p33-->p35
    p35-->p45
    p35-->p43
    p43-->p45
    p45-->p55
    p45-->p53
    p53-->p55
    p55-->p62