Initialize integration

Run calculations that output information required for most integration methods: PCA, nearest neighbour and UMAP.

Info

ID: initialize_integration
Namespace: multiomics/integration

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 0.12.6 -latest \
  -main-script ./workflows/multiomics/integration/initialize_integration/main.nf \
  --help

Run command

Example of params.yaml
# Inputs
id: # please fill in - example: "foo"
input: # please fill in - example: "dataset.h5mu"
layer: "log_normalized"
modality: "rna"

# Outputs
# output: "$id.$key.output.h5mu"

# PCA options
obsm_pca: "X_pca"
# var_pca_feature_selection: "foo"
pca_overwrite: false

# Neighbour calculation
uns_neighbors: "neighbors"
obsp_neighbor_distances: "distances"
obsp_neighbor_connectivities: "connectivities"

# Umap options
obsm_umap: "X_umap"

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
nextflow run openpipelines-bio/openpipeline \
  -r 0.12.6 -latest \
  -profile docker \
  -main-script ./workflows/multiomics/integration/initialize_integration/main.nf \
  -params-file params.yaml
Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Inputs

Name Description Attributes
--id ID of the sample. string, required, example: "foo"
--input Path to the sample. file, required, example: "dataset.h5mu"
--layer use specified layer for expression values instead of the .X object from the modality. string, default: "log_normalized"
--modality Which modality to process. string, default: "rna"

Outputs

Name Description Attributes
--output Destination path to the output. file, required, example: "output.h5mu"

PCA options

Name Description Attributes
--obsm_pca In which .obsm slot to store the resulting PCA embedding. string, default: "X_pca"
--var_pca_feature_selection Column name in .var matrix that will be used to select which genes to run the PCA on. string
--pca_overwrite Allow overwriting slots for PCA output. boolean_true

Neighbour calculation

Name Description Attributes
--uns_neighbors In which .uns slot to store various neighbor output objects. string, default: "neighbors"
--obsp_neighbor_distances In which .obsp slot to store the distance matrix between the resulting neighbors. string, default: "distances"
--obsp_neighbor_connectivities In which .obsp slot to store the connectivities matrix between the resulting neighbors. string, default: "connectivities"

Umap options

Name Description Attributes
--obsm_umap In which .obsm slot to store the resulting UMAP embedding. string, default: "X_umap"

Authors

  • Dries Schaumont (author)

Visualisation

flowchart LR
    v0(Input)
    v2(toSortedList)
    v4(flatMap)
    v11(pca)
    v13(join)
    v22(find_neighbors)
    v24(join)
    v33(umap)
    v35(join)
    v40(toSortedList)
    v42(Output)
    v0-->v2
    v2-->v4
    v4-->v13
    v4-->v11
    v11-->v13
    v13-->v24
    v13-->v22
    v22-->v24
    v24-->v35
    v24-->v33
    v33-->v35
    v35-->v40
    v40-->v42