scGPT Leiden

Runs scGPT integration (cell embedding generation), followed by neighbour calculation, Leiden clustering and UMAP on the result.

Info

ID: scgpt_leiden
Namespace: workflows/integration

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.0 -latest \
  -main-script target/nextflow/workflows/integration/scgpt_leiden/main.nf \
  --help

Run command

Example of params.yaml
# Inputs
id: # please fill in - example: "foo"
input: # please fill in - example: "input.h5mu"
modality: "rna"
# input_layer: "foo"
# var_gene_names: "foo"
# obs_batch_label: "foo"

# Model
model: # please fill in - example: "resources_test/scgpt/best_model.pt"
model_vocab: # please fill in - example: "resources_test/scgpt/vocab.json"
model_config: # please fill in - example: "args.json"
# finetuned_checkpoints_key: "model_state_dict"

# Outputs
# output: "$id.$key.output.h5mu"
obsm_integrated: "X_scgpt"

# Padding arguments
pad_token: "<pad>"
pad_value: -2

# HVG subset arguments
n_hvg: 1200
hvg_flavor: "cell_ranger"

# Tokenization arguments
# max_seq_len: 123

# Embedding arguments
dsbn: true
batch_size: 64

# Binning arguments
n_input_bins: 51
# seed: 123

# Clustering arguments
leiden_resolution: [1.0]

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

# Arguments
nextflow run openpipelines-bio/openpipeline \
  -r 2.1.0 -latest \
  -profile docker \
  -main-script target/nextflow/workflows/integration/scgpt_leiden/main.nf \
  -params-file params.yaml
Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
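
The run command above can also be assembled from a script, for example to check that the entries marked "please fill in" in params.yaml are present before launching. The sketch below is only an illustration: the helper names `missing_required` and `build_command` are hypothetical and not part of the pipeline, and the list of required keys is taken from the example params.yaml above.

```python
# Keys marked "please fill in" in the example params.yaml above.
REQUIRED = ("id", "input", "model", "model_vocab", "model_config", "publish_dir")


def missing_required(params):
    """Return the required keys that are absent or empty in `params`."""
    return [key for key in REQUIRED if params.get(key) in (None, "")]


def build_command(params_file="params.yaml", profile="docker", revision="2.1.0"):
    """Assemble the `nextflow run` invocation shown above as an argument list."""
    return [
        "nextflow", "run", "openpipelines-bio/openpipeline",
        "-r", revision, "-latest",
        "-profile", profile,
        "-main-script", "target/nextflow/workflows/integration/scgpt_leiden/main.nf",
        "-params-file", params_file,
    ]


params = {
    "id": "foo",
    "input": "input.h5mu",
    "model": "resources_test/scgpt/best_model.pt",
    "model_vocab": "resources_test/scgpt/vocab.json",
    "model_config": "args.json",
    "publish_dir": "output/",
}

if missing_required(params):
    raise SystemExit(f"Missing required parameters: {missing_required(params)}")

command = build_command(profile="singularity")
print(" ".join(command))
# To launch for real (needs Nextflow on PATH):
# import subprocess; subprocess.run(command, check=True)
```

Passing the command as a list avoids shell quoting issues if a path contains spaces.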

Argument groups

Inputs

| Name | Description | Attributes |
|---|---|---|
| --id | ID of the sample. | string, required, example: "foo" |
| --input | Path to the input file. | file, required, example: "input.h5mu" |
| --modality | | string, default: "rna" |
| --input_layer | The layer of the input dataset to process if .X is not to be used. Should contain log-normalized counts. | string |
| --var_gene_names | The name of the adata var column containing gene names. When not provided, the var index is used. | string |
| --obs_batch_label | The name of the adata obs column containing the batch labels. | string |

Model

| Name | Description | Attributes |
|---|---|---|
| --model | Path to scGPT model file. | file, required, example: "resources_test/scgpt/best_model.pt" |
| --model_vocab | Path to scGPT model vocabulary file. | file, required, example: "resources_test/scgpt/vocab.json" |
| --model_config | Path to scGPT model config file. | file, required, example: "args.json" |
| --finetuned_checkpoints_key | Key in the model file containing the pretrained checkpoints. Only relevant for fine-tuned models. | string, example: "model_state_dict" |

Outputs

| Name | Description | Attributes |
|---|---|---|
| --output | Output file path. | file, required, example: "output.h5mu" |
| --obsm_integrated | In which .obsm slot to store the resulting integrated embedding. | string, default: "X_scgpt" |

Padding arguments

| Name | Description | Attributes |
|---|---|---|
| --pad_token | Token used for padding. | string, default: "<pad>" |
| --pad_value | The value of the padding token. | integer, default: -2 |

HVG subset arguments

| Name | Description | Attributes |
|---|---|---|
| --n_hvg | Number of highly variable genes to subset for. | integer, default: 1200 |
| --hvg_flavor | Method to be used for identifying highly variable genes. Note that the default for this workflow (cell_ranger) is not the default method for scanpy HVG detection (seurat). | string, default: "cell_ranger" |

Tokenization arguments

| Name | Description | Attributes |
|---|---|---|
| --max_seq_len | The maximum sequence length of the tokenized data. Defaults to the number of features if not provided. | integer |
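
To illustrate how `--pad_token`, `--pad_value` and `--max_seq_len` relate, here is a minimal sketch of padding one cell's gene tokens and expression values to a fixed length. It is not the workflow's tokenizer: the function name `pad_or_truncate` is hypothetical, and cutting off the tail of over-long sequences is an assumption made for simplicity (the actual component may select genes differently).

```python
def pad_or_truncate(genes, values, max_seq_len=None, pad_token="<pad>", pad_value=-2):
    """Bring one cell's gene tokens and values to exactly `max_seq_len` entries.

    When `max_seq_len` is None it defaults to the number of features,
    mirroring the documented default of --max_seq_len.
    """
    if len(genes) != len(values):
        raise ValueError("genes and values must have the same length")
    if max_seq_len is None:
        max_seq_len = len(genes)

    # Assumption: over-long sequences are simply cut at max_seq_len.
    genes = list(genes[:max_seq_len])
    values = list(values[:max_seq_len])

    # Short sequences are filled up with the padding token and value.
    n_pad = max_seq_len - len(genes)
    return genes + [pad_token] * n_pad, values + [pad_value] * n_pad


tokens, expr = pad_or_truncate(["GeneA", "GeneB"], [3, 1], max_seq_len=4)
print(tokens)  # ['GeneA', 'GeneB', '<pad>', '<pad>']
print(expr)    # [3, 1, -2, -2]
```

Using a negative padding value keeps padded positions distinguishable from genuine expression values, which are non-negative.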

Embedding arguments

| Name | Description | Attributes |
|---|---|---|
| --dsbn | Apply domain-specific batch normalization. | boolean, default: TRUE |
| --batch_size | The batch size to be used for embedding inference. | integer, default: 64 |

Binning arguments

| Name | Description | Attributes |
|---|---|---|
| --n_input_bins | The number of bins to discretize the data into. When no value is provided, data won’t be binned. | integer, default: 51 |
| --seed | Seed for random number generation used for binning. If not set, no seed is used. | integer |
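
As a rough picture of what discretizing into `--n_input_bins` bins means, the sketch below bins one cell's expression values by rank. This is a simplified, deterministic stand-in and not the workflow's binning code: the function name `bin_cell`, the choice to keep zeros in bin 0, and the rank-based assignment are all assumptions made for illustration, and no random seed is involved here.

```python
from bisect import bisect_left


def bin_cell(values, n_input_bins=51):
    """Map one cell's expression values to integer bins.

    Zeros stay in bin 0; non-zero values are spread over bins
    1 .. n_input_bins - 1 according to their rank among the cell's
    non-zero values, so tied values share a bin.
    """
    if n_input_bins < 2:
        raise ValueError("n_input_bins must be at least 2")

    nonzero = sorted(v for v in values if v > 0)
    binned = []
    for v in values:
        if v <= 0:
            binned.append(0)
            continue
        rank = bisect_left(nonzero, v)  # number of non-zero values below v
        binned.append(1 + rank * (n_input_bins - 1) // len(nonzero))
    return binned


print(bin_cell([0.0, 1.0, 2.0, 3.0, 4.0], n_input_bins=5))  # [0, 1, 2, 3, 4]
```

Binning per cell makes the resulting values comparable across cells with different sequencing depth, since only the relative ordering within a cell matters.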

Clustering arguments

| Name | Description | Attributes |
|---|---|---|
| --leiden_resolution | Control the coarseness of the clustering. Higher values lead to more clusters. | List of double, default: 1, multiple_sep: ";" |
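
Because `--leiden_resolution` accepts a list with `;` as separator, several resolutions can be requested in one run, for example `--leiden_resolution "0.5;1.0;2.0"` on the command line or `leiden_resolution: [0.5, 1.0, 2.0]` in params.yaml. The sketch below only illustrates the separator format; `parse_resolutions` is a hypothetical helper, not how the pipeline itself parses the argument.

```python
def parse_resolutions(raw, sep=";"):
    """Split a separator-delimited string into a list of float resolutions."""
    return [float(part) for part in raw.split(sep) if part.strip()]


print(parse_resolutions("0.5;1.0;2.0"))  # [0.5, 1.0, 2.0]
print(parse_resolutions("1"))            # [1.0]
```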

Authors

  • Dorien Roosen (maintainer, author)

  • Elizabeth Mlynarski (author)

  • Weiwei Schultz (contributor)

Visualisation

flowchart TB
    v0(Channel.fromList)
    v2(filter)
    v10(filter)
    v18(highly_variable_features_scanpy)
    v25(cross)
    v35(cross)
    v41(filter)
    v49(cross_check_genes)
    v56(cross)
    v66(cross)
    v72(filter)
    v80(binning)
    v87(cross)
    v97(cross)
    v103(filter)
    v111(pad_tokenize)
    v118(cross)
    v128(cross)
    v134(filter)
    v142(embedding)
    v149(cross)
    v159(cross)
    v165(filter)
    v319(concat)
    v174(filter)
    v189(cross)
    v199(cross)
    v208(branch)
    v235(concat)
    v220(cross)
    v230(cross)
    v239(branch)
    v266(concat)
    v251(cross)
    v261(cross)
    v270(branch)
    v297(concat)
    v282(cross)
    v292(cross)
    v304(cross)
    v314(cross)
    v326(cross)
    v333(cross)
    v345(cross)
    v352(cross)
    v356(Output)
    subgraph group_neighbors_leiden_umap [neighbors_leiden_umap]
        v182(find_neighbors)
        v213(leiden)
        v244(move_obsm_to_obs)
        v275(umap)
    end
    v208-->v235
    v239-->v266
    v270-->v297
    v0-->v2
    v2-->v10
    v10-->v18
    v18-->v25
    v10-->v25
    v10-->v35
    v41-->v49
    v49-->v56
    v41-->v56
    v41-->v66
    v72-->v80
    v80-->v87
    v72-->v87
    v72-->v97
    v103-->v111
    v111-->v118
    v103-->v118
    v103-->v128
    v134-->v142
    v142-->v149
    v134-->v149
    v134-->v159
    v165-->v174
    v174-->v182
    v182-->v189
    v174-->v189
    v174-->v199
    v208-->v213
    v213-->v220
    v208-->v220
    v208-->v230
    v230-->v235
    v239-->v244
    v244-->v251
    v239-->v251
    v239-->v261
    v261-->v266
    v270-->v275
    v275-->v282
    v270-->v282
    v270-->v292
    v292-->v297
    v297-->v304
    v165-->v304
    v165-->v314
    v314-->v319
    v319-->v326
    v2-->v326
    v326-->v333
    v2-->v333
    v2-->v345
    v345-->v352
    v2-->v352
    v352-->v356
    v35-->v41
    v18-->v35
    v66-->v72
    v49-->v66
    v97-->v103
    v80-->v97
    v128-->v134
    v111-->v128
    v159-->v165
    v142-->v159
    v182-->v199
    v199-->v208
    v213-->v230
    v235-->v239
    v244-->v261
    v266-->v270
    v275-->v292
    v297-->v314
    v319-->v345
    style group_neighbors_leiden_umap fill:#F0F0F0,stroke:#969696;
    style v0 fill:#e3dcea,stroke:#7a4baa;
    style v2 fill:#e3dcea,stroke:#7a4baa;
    style v10 fill:#e3dcea,stroke:#7a4baa;
    style v18 fill:#e3dcea,stroke:#7a4baa;
    style v25 fill:#e3dcea,stroke:#7a4baa;
    style v35 fill:#e3dcea,stroke:#7a4baa;
    style v41 fill:#e3dcea,stroke:#7a4baa;
    style v49 fill:#e3dcea,stroke:#7a4baa;
    style v56 fill:#e3dcea,stroke:#7a4baa;
    style v66 fill:#e3dcea,stroke:#7a4baa;
    style v72 fill:#e3dcea,stroke:#7a4baa;
    style v80 fill:#e3dcea,stroke:#7a4baa;
    style v87 fill:#e3dcea,stroke:#7a4baa;
    style v97 fill:#e3dcea,stroke:#7a4baa;
    style v103 fill:#e3dcea,stroke:#7a4baa;
    style v111 fill:#e3dcea,stroke:#7a4baa;
    style v118 fill:#e3dcea,stroke:#7a4baa;
    style v128 fill:#e3dcea,stroke:#7a4baa;
    style v134 fill:#e3dcea,stroke:#7a4baa;
    style v142 fill:#e3dcea,stroke:#7a4baa;
    style v149 fill:#e3dcea,stroke:#7a4baa;
    style v159 fill:#e3dcea,stroke:#7a4baa;
    style v165 fill:#e3dcea,stroke:#7a4baa;
    style v319 fill:#e3dcea,stroke:#7a4baa;
    style v174 fill:#e3dcea,stroke:#7a4baa;
    style v182 fill:#e3dcea,stroke:#7a4baa;
    style v189 fill:#e3dcea,stroke:#7a4baa;
    style v199 fill:#e3dcea,stroke:#7a4baa;
    style v208 fill:#e3dcea,stroke:#7a4baa;
    style v235 fill:#e3dcea,stroke:#7a4baa;
    style v213 fill:#e3dcea,stroke:#7a4baa;
    style v220 fill:#e3dcea,stroke:#7a4baa;
    style v230 fill:#e3dcea,stroke:#7a4baa;
    style v239 fill:#e3dcea,stroke:#7a4baa;
    style v266 fill:#e3dcea,stroke:#7a4baa;
    style v244 fill:#e3dcea,stroke:#7a4baa;
    style v251 fill:#e3dcea,stroke:#7a4baa;
    style v261 fill:#e3dcea,stroke:#7a4baa;
    style v270 fill:#e3dcea,stroke:#7a4baa;
    style v297 fill:#e3dcea,stroke:#7a4baa;
    style v275 fill:#e3dcea,stroke:#7a4baa;
    style v282 fill:#e3dcea,stroke:#7a4baa;
    style v292 fill:#e3dcea,stroke:#7a4baa;
    style v304 fill:#e3dcea,stroke:#7a4baa;
    style v314 fill:#e3dcea,stroke:#7a4baa;
    style v326 fill:#e3dcea,stroke:#7a4baa;
    style v333 fill:#e3dcea,stroke:#7a4baa;
    style v345 fill:#e3dcea,stroke:#7a4baa;
    style v352 fill:#e3dcea,stroke:#7a4baa;
    style v356 fill:#e3dcea,stroke:#7a4baa;