Totalvi leiden

Run totalVI integration, followed by neighbour calculation, Leiden clustering and UMAP on the result.

Info

ID: totalvi_leiden
Namespace: multiomics/integration

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 0.10.0 -latest \
  -main-script ./workflows/multiomics/integration/totalvi_leiden/main.nf \
  --help

Run command

Example of params.yaml

# Inputs
id: # please fill in - example: "foo"
input: # please fill in - example: "dataset.h5mu"
layer: "log_normalized"
modality: "rna"
prot_modality: "prot"
reference: # please fill in - example: "path/to/file"

# Outputs
# output: "$id.$key.output.h5mu"
# reference_model_path: "$id.$key.reference_model_path.reference_model_path"
# query_model_path: "$id.$key.query_model_path.query_model_path"

# General TotalVI Options
obs_batch: "sample_id"
max_epochs: 400
max_query_epochs: 200
weight_decay: 0.0
force_retrain: false
# var_input: "foo"

# TotalVI integration options RNA
rna_reference_modality: "rna"
rna_obsm_output: "X_totalvi"

# TotalVI integration options ADT
prot_reference_modality: "prot"
prot_obsm_output: "X_totalvi"

# Neighbour calculation RNA
rna_uns_neighbors: "totalvi_integration_neighbors"
rna_obsp_neighbor_distances: "totalvi_integration_distances"
rna_obsp_neighbor_connectivities: "totalvi_integration_connectivities"

# Neighbour calculation ADT
prot_uns_neighbors: "totalvi_integration_neighbors"
prot_obsp_neighbor_distances: "totalvi_integration_distances"
prot_obsp_neighbor_connectivities: "totalvi_integration_connectivities"

# Clustering options RNA
rna_obs_cluster: "totalvi_integration_leiden"
rna_leiden_resolution: [1]

# Clustering options ADT
prot_obs_cluster: "totalvi_integration_leiden"
prot_leiden_resolution: [1]

# Umap options
obsm_umap: "X_totalvi_umap"

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

nextflow run openpipelines-bio/openpipeline \
  -r 0.10.0 -latest \
  -profile docker \
  -main-script ./workflows/multiomics/integration/totalvi_leiden/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Inputs

Name	Description	Attributes
`--id`	ID of the sample.	`string`, required, example: `"foo"`
`--input`	Path to the sample.	`file`, required, example: `"dataset.h5mu"`
`--layer`	Use the specified layer for expression values instead of the .X object from the modality.	`string`, default: `"log_normalized"`
`--modality`	Which modality to process.	`string`, default: `"rna"`
`--prot_modality`	Name of the modality containing the protein data.	`string`, default: `"prot"`
`--reference`	Input h5mu file with reference data to train the totalVI model.	`file`, required

Outputs

Name	Description	Attributes
`--output`	Destination path to the output.	`file`, required, example: `"output.h5mu"`
`--reference_model_path`	Directory containing the reference model. If it does not exist, the trained model is saved there.	`file`, default: `"totalvi_model_reference"`
`--query_model_path`	Directory where the query model will be saved.	`file`, default: `"totalvi_model_query"`

General TotalVI Options

Name	Description	Attributes
`--obs_batch`	Name of the .obs column that distinguishes between batches.	`string`, default: `"sample_id"`
`--max_epochs`	Number of passes through the dataset.	`integer`, default: `400`
`--max_query_epochs`	Number of passes through the dataset when fine-tuning the model for the query.	`integer`, default: `200`
`--weight_decay`	Weight decay to use when fine-tuning the model for the query.	`double`, default: `0`
`--force_retrain`	If true, retrain the model and save it to reference_model_path.	`boolean_true`
`--var_input`	Boolean .var column to subset data with (e.g. containing highly variable genes). By default, do not subset genes.	`string`

TotalVI integration options RNA

Name	Description	Attributes
`--rna_reference_modality`	Name of the modality containing RNA in the reference	`string`, default: `"rna"`
`--rna_obsm_output`	In which .obsm slot to store the normalized RNA from totalVI.	`string`, default: `"X_totalvi"`

TotalVI integration options ADT

Name	Description	Attributes
`--prot_reference_modality`	Name of the modality containing proteins in the reference	`string`, default: `"prot"`
`--prot_obsm_output`	In which .obsm slot to store the normalized protein data from totalVI.	`string`, default: `"X_totalvi"`

Neighbour calculation RNA

Name	Description	Attributes
`--rna_uns_neighbors`	In which .uns slot to store various neighbor output objects.	`string`, default: `"totalvi_integration_neighbors"`
`--rna_obsp_neighbor_distances`	In which .obsp slot to store the distance matrix between the resulting neighbors.	`string`, default: `"totalvi_integration_distances"`
`--rna_obsp_neighbor_connectivities`	In which .obsp slot to store the connectivities matrix between the resulting neighbors.	`string`, default: `"totalvi_integration_connectivities"`

Neighbour calculation ADT

Name	Description	Attributes
`--prot_uns_neighbors`	In which .uns slot to store various neighbor output objects.	`string`, default: `"totalvi_integration_neighbors"`
`--prot_obsp_neighbor_distances`	In which .obsp slot to store the distance matrix between the resulting neighbors.	`string`, default: `"totalvi_integration_distances"`
`--prot_obsp_neighbor_connectivities`	In which .obsp slot to store the connectivities matrix between the resulting neighbors.	`string`, default: `"totalvi_integration_connectivities"`

Clustering options RNA

Name	Description	Attributes
`--rna_obs_cluster`	Prefix for the .obs keys under which to add the cluster labels. New columns in .obs are named after this prefix, suffixed with an underscore and one of the resolutions specified in `--rna_leiden_resolution`.	`string`, default: `"totalvi_integration_leiden"`
`--rna_leiden_resolution`	Control the coarseness of the clustering. Higher values lead to more clusters.	List of `double`, default: `1`, multiple_sep: `":"`
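
The naming rule for the cluster columns can be illustrated with a small Python sketch. It only mirrors the description above (prefix, underscore, resolution); `cluster_column_names` is a hypothetical helper and not part of the pipeline, and the exact way a resolution is rendered in the real column name (for example `1` versus `1.0`) is an assumption here.

```python
def cluster_column_names(prefix, resolutions):
    """Build one .obs column name per resolution: <prefix>_<resolution>."""
    return [f"{prefix}_{resolution}" for resolution in resolutions]


# Default prefix from the table above, with two hypothetical resolutions.
# Note: the number formatting in the real output may differ (assumption).
columns = cluster_column_names("totalvi_integration_leiden", [0.5, 1.0])
print(columns)
```

When inspecting the output file, check the actual .obs column names rather than relying on this formatting.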

Clustering options ADT

Name	Description	Attributes
`--prot_obs_cluster`	Prefix for the .obs keys under which to add the cluster labels. New columns in .obs are named after this prefix, suffixed with an underscore and one of the resolutions specified in `--prot_leiden_resolution`.	`string`, default: `"totalvi_integration_leiden"`
`--prot_leiden_resolution`	Control the coarseness of the clustering. Higher values lead to more clusters.	List of `double`, default: `1`, multiple_sep: `":"`
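
The `multiple_sep: ":"` attribute indicates that several resolutions can be supplied as a single colon-separated value, for example `0.5:1:1.5`. The Python sketch below shows how such a value maps to a list of numbers. It is an illustration only: `parse_resolutions` is a hypothetical helper, and the pipeline's own argument parsing may differ in detail.

```python
def parse_resolutions(value, sep=":"):
    """Split a separator-delimited string into a list of floats."""
    return [float(item) for item in value.split(sep) if item.strip()]


# Hypothetical command-line value requesting three clusterings.
resolutions = parse_resolutions("0.5:1:1.5")
print(resolutions)
```

In a params.yaml file the same request would normally be written as a YAML list, as in the `[1]` default shown in the example above.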

Umap options

Name	Description	Attributes
`--obsm_umap`	In which .obsm slot to store the resulting UMAP embedding.	`string`, default: `"X_totalvi_umap"`

Authors

Dries Schaumont (author)

Visualisation

flowchart LR
    p0(Input)
    p3(toSortedList)
    p5(flatMap)
    p12(totalvi)
    p14(join)
    p23(find_neighbors)
    p25(join)
    p33(leiden)
    p35(join)
    p43(umap)
    p45(join)
    p53(move_obsm_to_obs)
    p55(join)
    p64(test_wf:run_wf:test_wf:run_wf:neighbors_leiden_umap:find_neighbors:find_neighbors_process1)
    p66(join)
    p74(test_wf:run_wf:test_wf:run_wf:neighbors_leiden_umap:leiden:leiden_process1)
    p76(join)
    p84(test_wf:run_wf:test_wf:run_wf:neighbors_leiden_umap:umap:umap_process1)
    p86(join)
    p94(test_wf:run_wf:test_wf:run_wf:neighbors_leiden_umap:move_obsm_to_obs:move_obsm_to_obs_process1)
    p96(join)
    p104(publish)
    p106(join)
    p112(Output)
    p0-->p3
    p3-->p5
    p5-->p14
    p5-->p12
    p12-->p14
    p14-->p25
    p14-->p23
    p23-->p25
    p25-->p35
    p25-->p33
    p33-->p35
    p35-->p45
    p35-->p43
    p43-->p45
    p45-->p55
    p45-->p53
    p53-->p55
    p55-->p66
    p55-->p64
    p64-->p66
    p66-->p76
    p66-->p74
    p74-->p76
    p76-->p86
    p76-->p84
    p84-->p86
    p86-->p96
    p86-->p94
    p94-->p96
    p96-->p106
    p96-->p104
    p104-->p106
    p106-->p112