Scanorama leiden

Run scanorama integration followed by neighbour calculations, leiden clustering and run umap on the result.

Info

ID: scanorama_leiden
Namespace: workflows/integration

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 1.0.1 -latest \
  -main-script target/nextflow/workflows/integration/scanorama_leiden/main.nf \
  --help

Run command

Example of params.yaml
# Inputs
id: # please fill in - example: "foo"
input: # please fill in - example: "dataset.h5mu"
layer: "log_normalized"
modality: "rna"

# Outputs
# output: "$id.$key.output.h5mu"

# Neighbour calculation
uns_neighbors: "scanorama_integration_neighbors"
obsp_neighbor_distances: "scanorama_integration_distances"
obsp_neighbor_connectivities: "scanorama_integration_connectivities"

# Scanorama integration options
obs_batch: "sample_id"
obsm_input: "X_pca"
obsm_output: "X_scanorama"
knn: 20
batch_size: 5000
sigma: 15
approx: true
alpha: 0.1

# Clustering options
obs_cluster: "scanorama_integration_leiden"
leiden_resolution: [1]

# Umap options
obsm_umap: "X_leiden_scanorama_umap"

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
nextflow run openpipelines-bio/openpipeline \
  -r 1.0.1 -latest \
  -profile docker \
  -main-script target/nextflow/workflows/integration/scanorama_leiden/main.nf \
  -params-file params.yaml
Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Inputs

Name Description Attributes
--id ID of the sample. string, required, example: "foo"
--input Path to the sample. file, required, example: "dataset.h5mu"
--layer use specified layer for expression values instead of the .X object from the modality. string, default: "log_normalized"
--modality Which modality to process. string, default: "rna"

Outputs

Name Description Attributes
--output Destination path to the output. file, required, example: "output.h5mu"

Neighbour calculation

Name Description Attributes
--uns_neighbors In which .uns slot to store various neighbor output objects. string, default: "scanorama_integration_neighbors"
--obsp_neighbor_distances In which .obsp slot to store the distance matrix between the resulting neighbors. string, default: "scanorama_integration_distances"
--obsp_neighbor_connectivities In which .obsp slot to store the connectivities matrix between the resulting neighbors. string, default: "scanorama_integration_connectivities"

Scanorama integration options

Name Description Attributes
--obs_batch Column name discriminating between your batches. string, default: "sample_id"
--obsm_input .obsm slot that points to embedding to run scanorama on. string, default: "X_pca"
--obsm_output The name of the field in adata.obsm where the integrated embeddings will be stored after running this function. Defaults to X_scanorama. string, default: "X_scanorama"
--knn Number of nearest neighbors to use for matching. integer, default: 20
--batch_size The batch size used in the alignment vector computation. Useful when integrating very large (>100k samples) datasets. Set to large value that runs within available memory. integer, default: 5000
--sigma Correction smoothing parameter on Gaussian kernel. double, default: 15
--approx Use approximate nearest neighbors with Python annoy; greatly speeds up matching runtime. boolean, default: TRUE
--alpha Alignment score minimum cutoff double, default: 0.1

Clustering options

Name Description Attributes
--obs_cluster Prefix for the .obs keys under which to add the cluster labels. Newly created columns in .obs will be created from the specified value for ‘–obs_cluster’ suffixed with an underscore and one of the resolutions specified in ‘–leiden_resolution’. string, default: "scanorama_integration_leiden"
--leiden_resolution Control the coarseness of the clustering. Higher values lead to more clusters. List of double, default: 1, multiple_sep: ";"

Umap options

Name Description Attributes
--obsm_umap In which .obsm slot to store the resulting UMAP embedding. string, default: "X_leiden_scanorama_umap"

Authors

  • Mauro Saporita (author)

  • Povilas Gibas (author)

Visualisation

flowchart TB
    v0(Channel.fromList)
    v18(scanorama)
    v49(concat)
    v38(find_neighbors)
    v59(leiden)
    v79(move_obsm_to_obs)
    v92(mix)
    v101(umap)
    v128(Output)
    v0-->v18
    v18-->v38
    v38-->v49
    v49-->v59
    v59-->v79
    v79-->v92
    v49-->v92
    v92-->v101
    v101-->v128
    style v0 fill:#e3dcea,stroke:#7a4baa;
    style v18 fill:#e3dcea,stroke:#7a4baa;
    style v49 fill:#e3dcea,stroke:#7a4baa;
    style v38 fill:#e3dcea,stroke:#7a4baa;
    style v59 fill:#e3dcea,stroke:#7a4baa;
    style v79 fill:#e3dcea,stroke:#7a4baa;
    style v92 fill:#e3dcea,stroke:#7a4baa;
    style v101 fill:#e3dcea,stroke:#7a4baa;
    style v128 fill:#e3dcea,stroke:#7a4baa;