Totalvi leiden

Run totalVI integration, followed by neighbour calculation, Leiden clustering and UMAP on the result.

Info

ID: totalvi_leiden
Namespace: multiomics/integration

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 0.12.0 -latest \
  -main-script ./workflows/multiomics/integration/totalvi_leiden/main.nf \
  --help

Run command

Example of params.yaml
# Inputs
id: # please fill in - example: "foo"
input: # please fill in - example: "dataset.h5mu"
layer: "log_normalized"
modality: "rna"
prot_modality: "prot"
reference: # please fill in - example: "path/to/file"

# Outputs
# output: "$id.$key.output.h5mu"
# reference_model_path: "$id.$key.reference_model_path.reference_model_path"
# query_model_path: "$id.$key.query_model_path.query_model_path"

# General TotalVI Options
obs_batch: "sample_id"
max_epochs: 400
max_query_epochs: 200
weight_decay: 0.0
force_retrain: false
# var_input: "foo"

# TotalVI integration options RNA
rna_reference_modality: "rna"
rna_obsm_output: "X_totalvi"

# TotalVI integration options ADT
prot_reference_modality: "prot"
prot_obsm_output: "X_totalvi"

# Neighbour calculation RNA
rna_uns_neighbors: "totalvi_integration_neighbors"
rna_obsp_neighbor_distances: "totalvi_integration_distances"
rna_obsp_neighbor_connectivities: "totalvi_integration_connectivities"

# Neighbour calculation ADT
prot_uns_neighbors: "totalvi_integration_neighbors"
prot_obsp_neighbor_distances: "totalvi_integration_distances"
prot_obsp_neighbor_connectivities: "totalvi_integration_connectivities"

# Clustering options RNA
rna_obs_cluster: "totalvi_integration_leiden"
rna_leiden_resolution: [1]

# Clustering options ADT
prot_obs_cluster: "totalvi_integration_leiden"
prot_leiden_resolution: [1]

# Umap options
obsm_umap: "X_totalvi_umap"

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

nextflow run openpipelines-bio/openpipeline \
  -r 0.12.0 -latest \
  -profile docker \
  -main-script ./workflows/multiomics/integration/totalvi_leiden/main.nf \
  -params-file params.yaml
Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
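
As a minimal sketch of the note above, the run command can be assembled programmatically so that only the profile changes between backends. The function name build_command and its keyword arguments are hypothetical; the flags and paths are the ones shown in the example command on this page.

```python
import shlex

PIPELINE = "openpipelines-bio/openpipeline"
MAIN_SCRIPT = "./workflows/multiomics/integration/totalvi_leiden/main.nf"
# Backends named in the note above.
PROFILES = {"docker", "podman", "singularity"}


def build_command(params_file, profile="docker", revision="0.12.0"):
    """Return the `nextflow run` invocation from this page as an argument list.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    if profile not in PROFILES:
        raise ValueError(f"unsupported profile: {profile}")
    return [
        "nextflow", "run", PIPELINE,
        "-r", revision, "-latest",
        "-profile", profile,
        "-main-script", MAIN_SCRIPT,
        "-params-file", str(params_file),
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command line for the Singularity backend.
    print(shlex.join(build_command("params.yaml", profile="singularity")))
```

The argument list can be passed to subprocess.run as-is if you prefer to launch the workflow from Python.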

Argument groups

Inputs

Name Description Attributes
--id ID of the sample. string, required, example: "foo"
--input Path to the sample. file, required, example: "dataset.h5mu"
--layer Use the specified layer for expression values instead of .X from the modality. string, default: "log_normalized"
--modality Which modality to process. string, default: "rna"
--prot_modality Which modality to process. string, default: "prot"
--reference Input h5mu file with reference data to train the totalVI model. file, required
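
Three of the inputs above are marked as required (--id, --input and --reference). A small pre-flight check along the following lines can catch an incomplete parameter set before the workflow is launched. This is only a sketch: the helper name missing_required is hypothetical, and it assumes the parameters have already been loaded into a Python dict.

```python
# Inputs marked "required" in the table above.
REQUIRED_INPUTS = ("id", "input", "reference")


def missing_required(params):
    """Return the required input names that are absent or left empty."""
    return [key for key in REQUIRED_INPUTS if params.get(key) in (None, "")]


if __name__ == "__main__":
    params = {"id": "foo", "input": "dataset.h5mu", "layer": "log_normalized"}
    missing = missing_required(params)
    if missing:
        print("Please fill in:", ", ".join(missing))
```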

Outputs

Name Description Attributes
--output Destination path to the output. file, required, example: "output.h5mu"
--reference_model_path Directory with the reference model. If it does not exist, the trained model will be saved there. file, default: "totalvi_model_reference"
--query_model_path Directory where the query model will be saved. file, default: "totalvi_model_query"

General TotalVI Options

Name Description Attributes
--obs_batch Name of the .obs column that discriminates between your batches. string, default: "sample_id"
--max_epochs Number of passes through the dataset. integer, default: 400
--max_query_epochs Number of passes through the dataset when fine-tuning the model for the query. integer, default: 200
--weight_decay Weight decay when fine-tuning the model for the query. double, default: 0
--force_retrain If true, retrain the model and save it to reference_model_path. boolean_true
--var_input Boolean .var column to subset data with (e.g. containing highly variable genes). By default, do not subset genes. string

TotalVI integration options RNA

Name Description Attributes
--rna_reference_modality string, default: "rna"
--rna_obsm_output In which .obsm slot to store the normalized RNA from totalVI. string, default: "X_totalvi"

TotalVI integration options ADT

Name Description Attributes
--prot_reference_modality Name of the modality containing proteins in the reference. string, default: "prot"
--prot_obsm_output In which .obsm slot to store the normalized protein data from totalVI. string, default: "X_totalvi"

Neighbour calculation RNA

Name Description Attributes
--rna_uns_neighbors In which .uns slot to store various neighbor output objects. string, default: "totalvi_integration_neighbors"
--rna_obsp_neighbor_distances In which .obsp slot to store the distance matrix between the resulting neighbors. string, default: "totalvi_integration_distances"
--rna_obsp_neighbor_connectivities In which .obsp slot to store the connectivities matrix between the resulting neighbors. string, default: "totalvi_integration_connectivities"

Neighbour calculation ADT

Name Description Attributes
--prot_uns_neighbors In which .uns slot to store various neighbor output objects. string, default: "totalvi_integration_neighbors"
--prot_obsp_neighbor_distances In which .obsp slot to store the distance matrix between the resulting neighbors. string, default: "totalvi_integration_distances"
--prot_obsp_neighbor_connectivities In which .obsp slot to store the connectivities matrix between the resulting neighbors. string, default: "totalvi_integration_connectivities"

Clustering options RNA

Name Description Attributes
--rna_obs_cluster Prefix for the .obs keys under which to add the cluster labels. New .obs columns are named after the value of '--rna_obs_cluster', suffixed with an underscore and one of the resolutions specified in '--rna_leiden_resolution'. string, default: "totalvi_integration_leiden"
--rna_leiden_resolution Control the coarseness of the clustering. Higher values lead to more clusters. List of double, default: 1, multiple_sep: ":"
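
The two clustering arguments above interact: several resolutions can be passed in one value separated by ":" (the multiple_sep attribute), and each resolution yields its own .obs column named from the prefix. The sketch below illustrates that naming scheme. Both function names are hypothetical, and the exact way the workflow formats the number in the suffix (for example 1 versus 1.0) is an assumption, so check the column names in your own output.

```python
def parse_resolutions(raw, sep=":"):
    """Split a command-line value such as "0.5:1:2" into a list of floats."""
    return [float(part) for part in raw.split(sep) if part]


def cluster_columns(prefix, resolutions):
    """Build one column name per resolution: <prefix>_<resolution>.

    Assumption: the suffix is the plain string form of the float.
    """
    return [f"{prefix}_{res}" for res in resolutions]


if __name__ == "__main__":
    resolutions = parse_resolutions("0.5:1:2")
    for name in cluster_columns("totalvi_integration_leiden", resolutions):
        print(name)
```

The same scheme applies to the ADT arguments, --prot_obs_cluster and --prot_leiden_resolution.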

Clustering options ADT

Name Description Attributes
--prot_obs_cluster Prefix for the .obs keys under which to add the cluster labels. New .obs columns are named after the value of '--prot_obs_cluster', suffixed with an underscore and one of the resolutions specified in '--prot_leiden_resolution'. string, default: "totalvi_integration_leiden"
--prot_leiden_resolution Control the coarseness of the clustering. Higher values lead to more clusters. List of double, default: 1, multiple_sep: ":"

Umap options

Name Description Attributes
--obsm_umap In which .obsm slot to store the resulting UMAP embedding. string, default: "X_totalvi_umap"

Authors

  • Dries Schaumont (author)

Visualisation

flowchart LR
    v0(Input)
    v3(toSortedList)
    v5(flatMap)
    v12(totalvi)
    v14(join)
    v23(find_neighbors)
    v25(join)
    v28(filter)
    v34(leiden)
    v36(join)
    v44(move_obsm_to_obs)
    v46(join)
    v50(mix)
    v49(filter)
    v56(umap)
    v58(join)
    v67(test_wf:run_wf:test_wf:run_wf:neighbors_leiden_umap:find_neighbors:find_neighbors_process1)
    v69(join)
    v72(filter)
    v78(test_wf:run_wf:test_wf:run_wf:neighbors_leiden_umap:leiden:leiden_process1)
    v80(join)
    v88(test_wf:run_wf:test_wf:run_wf:neighbors_leiden_umap:move_obsm_to_obs:move_obsm_to_obs_process1)
    v90(join)
    v94(mix)
    v93(filter)
    v100(test_wf:run_wf:test_wf:run_wf:neighbors_leiden_umap:umap:umap_process1)
    v102(join)
    v110(publish)
    v112(join)
    v118(Output)
    v49-->v50
    v93-->v94
    v0-->v3
    v3-->v5
    v5-->v14
    v5-->v12
    v12-->v14
    v14-->v25
    v14-->v23
    v23-->v25
    v25-->v28
    v25-->v49
    v28-->v36
    v28-->v34
    v34-->v36
    v36-->v46
    v36-->v44
    v44-->v46
    v46-->v50
    v50-->v58
    v50-->v56
    v56-->v58
    v58-->v69
    v58-->v67
    v67-->v69
    v69-->v72
    v69-->v93
    v72-->v80
    v72-->v78
    v78-->v80
    v80-->v90
    v80-->v88
    v88-->v90
    v90-->v94
    v94-->v102
    v94-->v100
    v100-->v102
    v102-->v112
    v102-->v110
    v110-->v112
    v112-->v118