Scanorama leiden
Run scanorama integration followed by neighbour calculations, leiden clustering and run umap on the result.
Info
ID: scanorama_leiden
Namespace: multiomics/integration
Links
Example commands
You can run the pipeline using nextflow run.
View help
You can use --help as a parameter to get an overview of the possible parameters.
nextflow run openpipelines-bio/openpipeline \
-r 0.10.0 -latest \
-main-script ./workflows/multiomics/integration/scanorama_leiden/main.nf \
--helpRun command
Example of params.yaml
# Inputs
id: # please fill in - example: "foo"
input: # please fill in - example: "dataset.h5mu"
layer: "log_normalized"
modality: "rna"
# Outputs
# output: "$id.$key.output.h5mu"
# Neighbour calculation
uns_neighbors: "scanorama_integration_neighbors"
obsp_neighbor_distances: "scanorama_integration_distances"
obsp_neighbor_connectivities: "scanorama_integration_connectivities"
# Scanorama integration options
obs_batch: "sample_id"
obsm_input: "X_pca"
obsm_output: "X_scanorama"
knn: 20
batch_size: 5000
sigma: 15
approx: true
alpha: 0.1
# Clustering options
obs_cluster: "scanorama_integration_leiden"
leiden_resolution: [1]
# Umap options
obsm_umap: "X_leiden_scanorama_umap"
# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"nextflow run openpipelines-bio/openpipeline \
-r 0.10.0 -latest \
-profile docker \
-main-script ./workflows/multiomics/integration/scanorama_leiden/main.nf \
-params-file params.yaml
Note
Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
Argument groups
Inputs
| Name | Description | Attributes |
|---|---|---|
--id |
ID of the sample. | string, required, example: "foo" |
--input |
Path to the sample. | file, required, example: "dataset.h5mu" |
--layer |
use specified layer for expression values instead of the .X object from the modality. | string, default: "log_normalized" |
--modality |
Which modality to process. | string, default: "rna" |
Outputs
| Name | Description | Attributes |
|---|---|---|
--output |
Destination path to the output. | file, required, example: "output.h5mu" |
Neighbour calculation
| Name | Description | Attributes |
|---|---|---|
--uns_neighbors |
In which .uns slot to store various neighbor output objects. | string, default: "scanorama_integration_neighbors" |
--obsp_neighbor_distances |
In which .obsp slot to store the distance matrix between the resulting neighbors. | string, default: "scanorama_integration_distances" |
--obsp_neighbor_connectivities |
In which .obsp slot to store the connectivities matrix between the resulting neighbors. | string, default: "scanorama_integration_connectivities" |
Scanorama integration options
| Name | Description | Attributes |
|---|---|---|
--obs_batch |
Column name discriminating between your batches. | string, default: "sample_id" |
--obsm_input |
.obsm slot that points to embedding to run scanorama on. | string, default: "X_pca" |
--obsm_output |
The name of the field in adata.obsm where the integrated embeddings will be stored after running this function. Defaults to X_scanorama. | string, default: "X_scanorama" |
--knn |
Number of nearest neighbors to use for matching. | integer, default: 20 |
--batch_size |
The batch size used in the alignment vector computation. Useful when integrating very large (>100k samples) datasets. Set to large value that runs within available memory. | integer, default: 5000 |
--sigma |
Correction smoothing parameter on Gaussian kernel. | double, default: 15 |
--approx |
Use approximate nearest neighbors with Python annoy; greatly speeds up matching runtime. | boolean, default: TRUE |
--alpha |
Alignment score minimum cutoff | double, default: 0.1 |
Clustering options
| Name | Description | Attributes |
|---|---|---|
--obs_cluster |
Prefix for the .obs keys under which to add the cluster labels. Newly created columns in .obs will be created from the specified value for ‘–obs_cluster’ suffixed with an underscore and one of the resolutions specified in ‘–leiden_resolution’. | string, default: "scanorama_integration_leiden" |
--leiden_resolution |
Control the coarseness of the clustering. Higher values lead to more clusters. | List of double, default: 1, multiple_sep: ":" |
Umap options
| Name | Description | Attributes |
|---|---|---|
--obsm_umap |
In which .obsm slot to store the resulting UMAP embedding. | string, default: "X_leiden_scanorama_umap" |
Visualisation
flowchart LR
p0(Input)
p2(toSortedList)
p4(flatMap)
p11(scanorama)
p13(join)
p21(find_neighbors)
p23(join)
p31(leiden)
p33(join)
p41(umap)
p43(join)
p51(move_obsm_to_obs)
p53(join)
p60(Output)
p0-->p2
p2-->p4
p4-->p13
p4-->p11
p11-->p13
p13-->p23
p13-->p21
p21-->p23
p23-->p33
p23-->p31
p31-->p33
p33-->p43
p33-->p41
p41-->p43
p43-->p53
p43-->p51
p51-->p53
p53-->p60