Scgpt leiden

Run scGPT integration (cell embedding generation) followed by neighbour calculations, leiden clustering and run umap on the result.

Info

ID: scgpt_leiden
Namespace: workflows/integration

Links

Source

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -main-script target/nextflow/workflows/integration/scgpt_leiden/main.nf \
  --help

Run command

Example of params.yaml

# Inputs
id: # please fill in - example: "foo"
input: # please fill in - example: "input.h5mu"
modality: "rna"
# input_layer: "foo"
# var_gene_names: "foo"
# obs_batch_label: "foo"

# Model
model: # please fill in - example: "resources_test/scgpt/best_model.pt"
model_vocab: # please fill in - example: "resources_test/scgpt/vocab.json"
model_config: # please fill in - example: "args.json"
# finetuned_checkpoints_key: "model_state_dict"

# Outputs
# output: "$id.$key.output.h5mu"
obsm_integrated: "X_scgpt"

# Padding arguments
pad_token: "<pad>"
pad_value: -2

# HVG subset arguments
n_hvg: 1200
hvg_flavor: "cell_ranger"

# Tokenization arguments
# max_seq_len: 123

# Embedding arguments
dsbn: true
batch_size: 64

# Binning arguments
n_input_bins: 51
# seed: 123

# Clustering arguments
leiden_resolution: [1.0]

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

# Arguments

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -profile docker \
  -main-script target/nextflow/workflows/integration/scgpt_leiden/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Inputs

Name	Description	Attributes
`--id`	ID of the sample.	`string`, required, example: `"foo"`
`--input`	Path to the input file.	`file`, required, example: `"input.h5mu"`
`--modality`		`string`, default: `"rna"`
`--input_layer`	The layer of the input dataset to process if .X is not to be used. Should contain log normalized counts.	`string`
`--var_gene_names`	The name of the adata var column containing gene names; when no gene_name_layer is provided, the var index will be used.	`string`
`--obs_batch_label`	The name of the adata obs column containing the batch labels.	`string`

Model

Name	Description	Attributes
`--model`	Path to scGPT model file.	`file`, required, example: `"resources_test/scgpt/best_model.pt"`
`--model_vocab`	Path to scGPT model vocabulary file.	`file`, required, example: `"resources_test/scgpt/vocab.json"`
`--model_config`	Path to scGPT model config file.	`file`, required, example: `"args.json"`
`--finetuned_checkpoints_key`	Key in the model file containing the pretrained checkpoints. Only relevant for fine-tuned models.	`string`, example: `"model_state_dict"`

Outputs

Name	Description	Attributes
`--output`	Output file path	`file`, required, example: `"output.h5mu"`
`--obsm_integrated`	In which .obsm slot to store the resulting integrated embedding.	`string`, default: `"X_scgpt"`

Padding arguments

Name	Description	Attributes
`--pad_token`	Token used for padding.	`string`, default: `"<pad>"`
`--pad_value`	The value of the padding token.	`integer`, default: `-2`

HVG subset arguments

Name	Description	Attributes
`--n_hvg`	Number of highly variable genes to subset for.	`integer`, default: `1200`
`--hvg_flavor`	Method to be used for identifying highly variable genes. Note that the default for this workflow (`cell_ranger`) is not the default method for scanpy hvg detection (`seurat`).	`string`, default: `"cell_ranger"`

Tokenization arguments

Name	Description	Attributes
`--max_seq_len`	The maximum sequence length of the tokenized data. Defaults to the number of features if not provided.	`integer`

Embedding arguments

Name	Description	Attributes
`--dsbn`	Apply domain-specific batch normalization	`boolean`, default: `TRUE`
`--batch_size`	The batch size to be used for embedding inference.	`integer`, default: `64`

Binning arguments

Name	Description	Attributes
`--n_input_bins`	The number of bins to discretize the data into; When no value is provided, data won’t be binned.	`integer`, default: `51`
`--seed`	Seed for random number generation used for binning. If not set, no seed is used.	`integer`

Clustering arguments

Name	Description	Attributes
`--leiden_resolution`	Control the coarseness of the clustering. Higher values lead to more clusters.	List of `double`, default: `1`, multiple_sep: `";"`

Authors

Dorien Roosen (maintainer, author)
Elizabeth Mlynarski (author)
Weiwei Schultz (contributor)

Visualisation

flowchart TB
    v0(Channel.fromList)
    v2(filter)
    v10(filter)
    v18(highly_variable_features_scanpy)
    v25(cross)
    v35(cross)
    v41(filter)
    v49(cross_check_genes)
    v56(cross)
    v66(cross)
    v72(filter)
    v80(binning)
    v87(cross)
    v97(cross)
    v103(filter)
    v111(pad_tokenize)
    v118(cross)
    v128(cross)
    v134(filter)
    v142(embedding)
    v149(cross)
    v159(cross)
    v165(filter)
    v319(concat)
    v174(filter)
    v189(cross)
    v199(cross)
    v208(branch)
    v235(concat)
    v220(cross)
    v230(cross)
    v239(branch)
    v266(concat)
    v251(cross)
    v261(cross)
    v270(branch)
    v297(concat)
    v282(cross)
    v292(cross)
    v304(cross)
    v314(cross)
    v326(cross)
    v333(cross)
    v345(cross)
    v352(cross)
    v356(Output)
    subgraph group_neighbors_leiden_umap [neighbors_leiden_umap]
        v182(find_neighbors)
        v213(leiden)
        v244(move_obsm_to_obs)
        v275(umap)
    end
    v208-->v235
    v239-->v266
    v270-->v297
    v0-->v2
    v2-->v10
    v10-->v18
    v18-->v25
    v10-->v25
    v10-->v35
    v41-->v49
    v49-->v56
    v41-->v56
    v41-->v66
    v72-->v80
    v80-->v87
    v72-->v87
    v72-->v97
    v103-->v111
    v111-->v118
    v103-->v118
    v103-->v128
    v134-->v142
    v142-->v149
    v134-->v149
    v134-->v159
    v165-->v174
    v174-->v182
    v182-->v189
    v174-->v189
    v174-->v199
    v208-->v213
    v213-->v220
    v208-->v220
    v208-->v230
    v230-->v235
    v239-->v244
    v244-->v251
    v239-->v251
    v239-->v261
    v261-->v266
    v270-->v275
    v275-->v282
    v270-->v282
    v270-->v292
    v292-->v297
    v297-->v304
    v165-->v304
    v165-->v314
    v314-->v319
    v319-->v326
    v2-->v326
    v326-->v333
    v2-->v333
    v2-->v345
    v345-->v352
    v2-->v352
    v352-->v356
    v35-->v41
    v18-->v35
    v66-->v72
    v49-->v66
    v97-->v103
    v80-->v97
    v128-->v134
    v111-->v128
    v159-->v165
    v142-->v159
    v182-->v199
    v199-->v208
    v213-->v230
    v235-->v239
    v244-->v261
    v266-->v270
    v275-->v292
    v297-->v314
    v319-->v345
    style group_neighbors_leiden_umap fill:#F0F0F0,stroke:#969696;
    style v0 fill:#e3dcea,stroke:#7a4baa;
    style v2 fill:#e3dcea,stroke:#7a4baa;
    style v10 fill:#e3dcea,stroke:#7a4baa;
    style v18 fill:#e3dcea,stroke:#7a4baa;
    style v25 fill:#e3dcea,stroke:#7a4baa;
    style v35 fill:#e3dcea,stroke:#7a4baa;
    style v41 fill:#e3dcea,stroke:#7a4baa;
    style v49 fill:#e3dcea,stroke:#7a4baa;
    style v56 fill:#e3dcea,stroke:#7a4baa;
    style v66 fill:#e3dcea,stroke:#7a4baa;
    style v72 fill:#e3dcea,stroke:#7a4baa;
    style v80 fill:#e3dcea,stroke:#7a4baa;
    style v87 fill:#e3dcea,stroke:#7a4baa;
    style v97 fill:#e3dcea,stroke:#7a4baa;
    style v103 fill:#e3dcea,stroke:#7a4baa;
    style v111 fill:#e3dcea,stroke:#7a4baa;
    style v118 fill:#e3dcea,stroke:#7a4baa;
    style v128 fill:#e3dcea,stroke:#7a4baa;
    style v134 fill:#e3dcea,stroke:#7a4baa;
    style v142 fill:#e3dcea,stroke:#7a4baa;
    style v149 fill:#e3dcea,stroke:#7a4baa;
    style v159 fill:#e3dcea,stroke:#7a4baa;
    style v165 fill:#e3dcea,stroke:#7a4baa;
    style v319 fill:#e3dcea,stroke:#7a4baa;
    style v174 fill:#e3dcea,stroke:#7a4baa;
    style v182 fill:#e3dcea,stroke:#7a4baa;
    style v189 fill:#e3dcea,stroke:#7a4baa;
    style v199 fill:#e3dcea,stroke:#7a4baa;
    style v208 fill:#e3dcea,stroke:#7a4baa;
    style v235 fill:#e3dcea,stroke:#7a4baa;
    style v213 fill:#e3dcea,stroke:#7a4baa;
    style v220 fill:#e3dcea,stroke:#7a4baa;
    style v230 fill:#e3dcea,stroke:#7a4baa;
    style v239 fill:#e3dcea,stroke:#7a4baa;
    style v266 fill:#e3dcea,stroke:#7a4baa;
    style v244 fill:#e3dcea,stroke:#7a4baa;
    style v251 fill:#e3dcea,stroke:#7a4baa;
    style v261 fill:#e3dcea,stroke:#7a4baa;
    style v270 fill:#e3dcea,stroke:#7a4baa;
    style v297 fill:#e3dcea,stroke:#7a4baa;
    style v275 fill:#e3dcea,stroke:#7a4baa;
    style v282 fill:#e3dcea,stroke:#7a4baa;
    style v292 fill:#e3dcea,stroke:#7a4baa;
    style v304 fill:#e3dcea,stroke:#7a4baa;
    style v314 fill:#e3dcea,stroke:#7a4baa;
    style v326 fill:#e3dcea,stroke:#7a4baa;
    style v333 fill:#e3dcea,stroke:#7a4baa;
    style v345 fill:#e3dcea,stroke:#7a4baa;
    style v352 fill:#e3dcea,stroke:#7a4baa;
    style v356 fill:#e3dcea,stroke:#7a4baa;