flowchart TB
v0(Channel.fromList)
v2(filter)
v10(filter)
v18(highly_variable_features_scanpy)
v25(cross)
v35(cross)
v41(filter)
v49(cross_check_genes)
v56(cross)
v66(cross)
v72(filter)
v80(binning)
v87(cross)
v97(cross)
v103(filter)
v111(pad_tokenize)
v118(cross)
v128(cross)
v134(filter)
v142(embedding)
v149(cross)
v159(cross)
v165(filter)
v319(concat)
v174(filter)
v189(cross)
v199(cross)
v208(branch)
v235(concat)
v220(cross)
v230(cross)
v239(branch)
v266(concat)
v251(cross)
v261(cross)
v270(branch)
v297(concat)
v282(cross)
v292(cross)
v304(cross)
v314(cross)
v326(cross)
v333(cross)
v345(cross)
v352(cross)
v356(Output)
subgraph group_neighbors_leiden_umap [neighbors_leiden_umap]
v182(find_neighbors)
v213(leiden)
v244(move_obsm_to_obs)
v275(umap)
end
v208-->v235
v239-->v266
v270-->v297
v0-->v2
v2-->v10
v10-->v18
v18-->v25
v10-->v25
v10-->v35
v41-->v49
v49-->v56
v41-->v56
v41-->v66
v72-->v80
v80-->v87
v72-->v87
v72-->v97
v103-->v111
v111-->v118
v103-->v118
v103-->v128
v134-->v142
v142-->v149
v134-->v149
v134-->v159
v165-->v174
v174-->v182
v182-->v189
v174-->v189
v174-->v199
v208-->v213
v213-->v220
v208-->v220
v208-->v230
v230-->v235
v239-->v244
v244-->v251
v239-->v251
v239-->v261
v261-->v266
v270-->v275
v275-->v282
v270-->v282
v270-->v292
v292-->v297
v297-->v304
v165-->v304
v165-->v314
v314-->v319
v319-->v326
v2-->v326
v326-->v333
v2-->v333
v2-->v345
v345-->v352
v2-->v352
v352-->v356
v35-->v41
v18-->v35
v66-->v72
v49-->v66
v97-->v103
v80-->v97
v128-->v134
v111-->v128
v159-->v165
v142-->v159
v182-->v199
v199-->v208
v213-->v230
v235-->v239
v244-->v261
v266-->v270
v275-->v292
v297-->v314
v319-->v345
style group_neighbors_leiden_umap fill:#F0F0F0,stroke:#969696;
style v0 fill:#e3dcea,stroke:#7a4baa;
style v2 fill:#e3dcea,stroke:#7a4baa;
style v10 fill:#e3dcea,stroke:#7a4baa;
style v18 fill:#e3dcea,stroke:#7a4baa;
style v25 fill:#e3dcea,stroke:#7a4baa;
style v35 fill:#e3dcea,stroke:#7a4baa;
style v41 fill:#e3dcea,stroke:#7a4baa;
style v49 fill:#e3dcea,stroke:#7a4baa;
style v56 fill:#e3dcea,stroke:#7a4baa;
style v66 fill:#e3dcea,stroke:#7a4baa;
style v72 fill:#e3dcea,stroke:#7a4baa;
style v80 fill:#e3dcea,stroke:#7a4baa;
style v87 fill:#e3dcea,stroke:#7a4baa;
style v97 fill:#e3dcea,stroke:#7a4baa;
style v103 fill:#e3dcea,stroke:#7a4baa;
style v111 fill:#e3dcea,stroke:#7a4baa;
style v118 fill:#e3dcea,stroke:#7a4baa;
style v128 fill:#e3dcea,stroke:#7a4baa;
style v134 fill:#e3dcea,stroke:#7a4baa;
style v142 fill:#e3dcea,stroke:#7a4baa;
style v149 fill:#e3dcea,stroke:#7a4baa;
style v159 fill:#e3dcea,stroke:#7a4baa;
style v165 fill:#e3dcea,stroke:#7a4baa;
style v319 fill:#e3dcea,stroke:#7a4baa;
style v174 fill:#e3dcea,stroke:#7a4baa;
style v182 fill:#e3dcea,stroke:#7a4baa;
style v189 fill:#e3dcea,stroke:#7a4baa;
style v199 fill:#e3dcea,stroke:#7a4baa;
style v208 fill:#e3dcea,stroke:#7a4baa;
style v235 fill:#e3dcea,stroke:#7a4baa;
style v213 fill:#e3dcea,stroke:#7a4baa;
style v220 fill:#e3dcea,stroke:#7a4baa;
style v230 fill:#e3dcea,stroke:#7a4baa;
style v239 fill:#e3dcea,stroke:#7a4baa;
style v266 fill:#e3dcea,stroke:#7a4baa;
style v244 fill:#e3dcea,stroke:#7a4baa;
style v251 fill:#e3dcea,stroke:#7a4baa;
style v261 fill:#e3dcea,stroke:#7a4baa;
style v270 fill:#e3dcea,stroke:#7a4baa;
style v297 fill:#e3dcea,stroke:#7a4baa;
style v275 fill:#e3dcea,stroke:#7a4baa;
style v282 fill:#e3dcea,stroke:#7a4baa;
style v292 fill:#e3dcea,stroke:#7a4baa;
style v304 fill:#e3dcea,stroke:#7a4baa;
style v314 fill:#e3dcea,stroke:#7a4baa;
style v326 fill:#e3dcea,stroke:#7a4baa;
style v333 fill:#e3dcea,stroke:#7a4baa;
style v345 fill:#e3dcea,stroke:#7a4baa;
style v352 fill:#e3dcea,stroke:#7a4baa;
style v356 fill:#e3dcea,stroke:#7a4baa;
Scgpt leiden
Runs scGPT integration (cell embedding generation), followed by neighbour calculation, Leiden clustering and UMAP on the resulting embedding.
Info
ID: scgpt_leiden
Namespace: workflows/integration
Example commands
You can run the pipeline using nextflow run.
View help
You can use --help as a parameter to get an overview of the possible parameters.
nextflow run openpipelines-bio/openpipeline \
-r 2.1.1 -latest \
-main-script target/nextflow/workflows/integration/scgpt_leiden/main.nf \
--help
Run command
Example of params.yaml
# Inputs
id: # please fill in - example: "foo"
input: # please fill in - example: "input.h5mu"
modality: "rna"
# input_layer: "foo"
# var_gene_names: "foo"
# obs_batch_label: "foo"
# Model
model: # please fill in - example: "resources_test/scgpt/best_model.pt"
model_vocab: # please fill in - example: "resources_test/scgpt/vocab.json"
model_config: # please fill in - example: "args.json"
# finetuned_checkpoints_key: "model_state_dict"
# Outputs
# output: "$id.$key.output.h5mu"
obsm_integrated: "X_scgpt"
# Padding arguments
pad_token: "<pad>"
pad_value: -2
# HVG subset arguments
n_hvg: 1200
hvg_flavor: "cell_ranger"
# Tokenization arguments
# max_seq_len: 123
# Embedding arguments
dsbn: true
batch_size: 64
# Binning arguments
n_input_bins: 51
# seed: 123
# Clustering arguments
leiden_resolution: [1.0]
# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
# Arguments
nextflow run openpipelines-bio/openpipeline \
-r 2.1.1 -latest \
-profile docker \
-main-script target/nextflow/workflows/integration/scgpt_leiden/main.nf \
-params-file params.yaml
Note
Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
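The run command above can also be assembled from a script, which helps when the same workflow is launched for many inputs. The following is a minimal sketch, not part of the pipeline: `build_command` is a hypothetical helper, the parameter values are the placeholders from the example params.yaml, and the parameters are written as JSON on the assumption that Nextflow's -params-file option accepts JSON as well as YAML. The command is only printed, not executed.

```python
import json
import shlex
import tempfile
from pathlib import Path

# Placeholder values copied from the example params.yaml; replace with real paths.
params = {
    "id": "foo",
    "input": "input.h5mu",
    "modality": "rna",
    "model": "resources_test/scgpt/best_model.pt",
    "model_vocab": "resources_test/scgpt/vocab.json",
    "model_config": "args.json",
    "n_hvg": 1200,
    "leiden_resolution": [1.0],
    "publish_dir": "output/",
}


def build_command(params_file, profile="docker"):
    """Hypothetical helper: return the nextflow invocation as an argument list."""
    return [
        "nextflow", "run", "openpipelines-bio/openpipeline",
        "-r", "2.1.1", "-latest",
        "-profile", profile,
        "-main-script", "target/nextflow/workflows/integration/scgpt_leiden/main.nf",
        "-params-file", str(params_file),
    ]


# Write the parameters to a temporary JSON file and show the resulting command.
params_file = Path(tempfile.mkdtemp()) / "params.json"
params_file.write_text(json.dumps(params, indent=2))
command = build_command(params_file)
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would launch the workflow; switching the backend only requires a different `profile` argument, for example `build_command(params_file, profile="singularity")`.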
Argument groups
Inputs
| Name | Description | Attributes |
|---|---|---|
| --id | ID of the sample. | string, required, example: "foo" |
| --input | Path to the input file. | file, required, example: "input.h5mu" |
| --modality | | string, default: "rna" |
| --input_layer | The layer of the input dataset to process if .X is not to be used. Should contain log normalized counts. | string |
| --var_gene_names | The name of the adata var column containing gene names; when no gene_name_layer is provided, the var index will be used. | string |
| --obs_batch_label | The name of the adata obs column containing the batch labels. | string |
Model
| Name | Description | Attributes |
|---|---|---|
| --model | Path to scGPT model file. | file, required, example: "resources_test/scgpt/best_model.pt" |
| --model_vocab | Path to scGPT model vocabulary file. | file, required, example: "resources_test/scgpt/vocab.json" |
| --model_config | Path to scGPT model config file. | file, required, example: "args.json" |
| --finetuned_checkpoints_key | Key in the model file containing the pretrained checkpoints. Only relevant for fine-tuned models. | string, example: "model_state_dict" |
Outputs
| Name | Description | Attributes |
|---|---|---|
| --output | Output file path | file, required, example: "output.h5mu" |
| --obsm_integrated | In which .obsm slot to store the resulting integrated embedding. | string, default: "X_scgpt" |
Padding arguments
| Name | Description | Attributes |
|---|---|---|
| --pad_token | Token used for padding. | string, default: "<pad>" |
| --pad_value | The value of the padding token. | integer, default: -2 |
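To illustrate how the two padding arguments relate, here is a minimal sketch of padding one cell's gene tokens and expression values to a fixed length. It is an illustration only and not the code of the pad_tokenize component: the function `pad_sequence`, the toy vocabulary and the simple truncation of over-long inputs are all assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple


def pad_sequence(
    genes: Sequence[str],
    values: Sequence[float],
    vocab: Dict[str, int],
    max_seq_len: int,
    pad_token: str = "<pad>",
    pad_value: float = -2,
) -> Tuple[List[int], List[float]]:
    """Map genes to token ids, then pad ids and values up to max_seq_len."""
    # Truncation is a simplification chosen for this sketch.
    token_ids = [vocab[g] for g in genes][:max_seq_len]
    padded_values = list(values)[:max_seq_len]
    n_missing = max_seq_len - len(token_ids)
    # Padding positions get the id of pad_token and the expression value pad_value.
    token_ids += [vocab[pad_token]] * n_missing
    padded_values += [pad_value] * n_missing
    return token_ids, padded_values


# Toy vocabulary (hypothetical); a real one would come from the model_vocab file.
vocab = {"<pad>": 0, "GeneA": 1, "GeneB": 2, "GeneC": 3}
ids, vals = pad_sequence(["GeneB", "GeneC"], [4, 7], vocab, max_seq_len=4)
print(ids)   # [2, 3, 0, 0]
print(vals)  # [4, 7, -2, -2]
```

The pad token marks which positions are padding on the gene side, while the pad value fills the matching positions on the expression side with a number that cannot be mistaken for a real value.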
HVG subset arguments
| Name | Description | Attributes |
|---|---|---|
| --n_hvg | Number of highly variable genes to subset to. | integer, default: 1200 |
| --hvg_flavor | Method to be used for identifying highly variable genes. Note that the default for this workflow (cell_ranger) is not the default method for scanpy hvg detection (seurat). | string, default: "cell_ranger" |
Tokenization arguments
| Name | Description | Attributes |
|---|---|---|
| --max_seq_len | The maximum sequence length of the tokenized data. Defaults to the number of features if not provided. | integer |
Embedding arguments
| Name | Description | Attributes |
|---|---|---|
| --dsbn | Apply domain-specific batch normalization | boolean, default: TRUE |
| --batch_size | The batch size to be used for embedding inference. | integer, default: 64 |
Binning arguments
| Name | Description | Attributes |
|---|---|---|
| --n_input_bins | The number of bins to discretize the data into; when no value is provided, data won’t be binned. | integer, default: 51 |
| --seed | Seed for random number generation used for binning. If not set, no seed is used. | integer |
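The exact scheme used by the binning component is not described on this page. As a rough intuition for what discretizing into n_input_bins means, the sketch below assumes a per-cell, rank-based scheme in which zeros stay in bin 0 and non-zero values are spread over bins 1 to n_input_bins - 1. The function `bin_expression` is hypothetical, and the sketch is deterministic, so it does not model whatever role --seed plays in the real component.

```python
from bisect import bisect_right
from typing import List, Sequence


def bin_expression(values: Sequence[float], n_input_bins: int = 51) -> List[int]:
    """Assumed scheme: zeros -> bin 0, non-zero values -> rank-based bins 1..n_input_bins-1."""
    nonzero = sorted(v for v in values if v > 0)
    if not nonzero:
        return [0] * len(values)
    n = len(nonzero)
    top_bin = n_input_bins - 1
    binned = []
    for v in values:
        if v <= 0:
            binned.append(0)
        else:
            # k = number of non-zero values <= v, so 1 <= k <= n.
            k = bisect_right(nonzero, v)
            # Integer ceiling of k * top_bin / n, which lies in 1..top_bin.
            binned.append((k * top_bin + n - 1) // n)
    return binned


print(bin_expression([0, 1.0, 2.0, 0, 3.0, 4.0], n_input_bins=5))  # [0, 1, 2, 0, 3, 4]
```

Under this assumed scheme the largest value in a cell always lands in the top bin, so binned values are comparable across cells with different sequencing depths.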
Clustering arguments
| Name | Description | Attributes |
|---|---|---|
| --leiden_resolution | Control the coarseness of the clustering. Higher values lead to more clusters. | List of double, default: 1, multiple_sep: ";" |
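Because --leiden_resolution is a list with multiple_sep ";", several resolutions can be given as one separated string on the command line (for example "0.5;1.0;1.5"), or as a YAML list in params.yaml as in the example above. The sketch below shows how such a string maps to a list of numbers; `parse_multiple` is a hypothetical helper written for illustration, not the pipeline's own parser.

```python
from typing import List


def parse_multiple(raw: str, sep: str = ";") -> List[float]:
    """Split a separator-delimited string into floats, skipping empty items."""
    return [float(item) for item in raw.split(sep) if item.strip()]


resolutions = parse_multiple("0.5;1.0;1.5")
print(resolutions)  # [0.5, 1.0, 1.5]
```

Each resolution would correspond to one Leiden clustering run, with higher values yielding more clusters as stated in the table.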