flowchart TB
v0(Channel.fromList)
v2(filter)
v10(filter)
v18(normalize_total)
v25(cross)
v35(cross)
v41(filter)
v49(log1p)
v56(cross)
v66(cross)
v72(filter)
v80(delete_layer)
v87(cross)
v97(cross)
v106(branch)
v133(concat)
v111(scale)
v118(cross)
v128(cross)
v134(filter)
v142(highly_variable_features_scanpy)
v149(cross)
v159(cross)
v165(filter)
v288(concat)
v177(branch)
v204(concat)
v189(cross)
v199(cross)
v208(branch)
v235(concat)
v220(cross)
v230(cross)
v236(filter)
v266(concat)
v251(cross)
v261(cross)
v273(cross)
v283(cross)
v295(cross)
v302(cross)
v314(cross)
v321(cross)
v325(Output)
subgraph group_rna_qc [rna_qc]
v182(grep_mitochondrial_genes)
v213(grep_ribosomal_genes)
v244(calculate_qc_metrics)
end
v106-->v133
v133-->v134
v177-->v204
v208-->v235
v235-->v236
v0-->v2
v2-->v10
v10-->v18
v18-->v25
v10-->v25
v10-->v35
v41-->v49
v49-->v56
v41-->v56
v41-->v66
v72-->v80
v80-->v87
v72-->v87
v72-->v97
v106-->v111
v111-->v118
v106-->v118
v106-->v128
v128-->v133
v134-->v142
v142-->v149
v134-->v149
v134-->v159
v177-->v182
v182-->v189
v177-->v189
v177-->v199
v199-->v204
v208-->v213
v213-->v220
v208-->v220
v208-->v230
v230-->v235
v236-->v244
v244-->v251
v236-->v251
v236-->v261
v261-->v266
v266-->v273
v165-->v273
v165-->v283
v283-->v288
v288-->v295
v2-->v295
v295-->v302
v2-->v302
v2-->v314
v314-->v321
v2-->v321
v321-->v325
v35-->v41
v18-->v35
v66-->v72
v49-->v66
v80-->v97
v97-->v106
v111-->v128
v159-->v165
v142-->v159
v165-->v177
v182-->v199
v204-->v208
v213-->v230
v244-->v261
v266-->v283
v288-->v314
style group_rna_qc fill:#F0F0F0,stroke:#969696;
style v0 fill:#e3dcea,stroke:#7a4baa;
style v2 fill:#e3dcea,stroke:#7a4baa;
style v10 fill:#e3dcea,stroke:#7a4baa;
style v18 fill:#e3dcea,stroke:#7a4baa;
style v25 fill:#e3dcea,stroke:#7a4baa;
style v35 fill:#e3dcea,stroke:#7a4baa;
style v41 fill:#e3dcea,stroke:#7a4baa;
style v49 fill:#e3dcea,stroke:#7a4baa;
style v56 fill:#e3dcea,stroke:#7a4baa;
style v66 fill:#e3dcea,stroke:#7a4baa;
style v72 fill:#e3dcea,stroke:#7a4baa;
style v80 fill:#e3dcea,stroke:#7a4baa;
style v87 fill:#e3dcea,stroke:#7a4baa;
style v97 fill:#e3dcea,stroke:#7a4baa;
style v106 fill:#e3dcea,stroke:#7a4baa;
style v133 fill:#e3dcea,stroke:#7a4baa;
style v111 fill:#e3dcea,stroke:#7a4baa;
style v118 fill:#e3dcea,stroke:#7a4baa;
style v128 fill:#e3dcea,stroke:#7a4baa;
style v134 fill:#e3dcea,stroke:#7a4baa;
style v142 fill:#e3dcea,stroke:#7a4baa;
style v149 fill:#e3dcea,stroke:#7a4baa;
style v159 fill:#e3dcea,stroke:#7a4baa;
style v165 fill:#e3dcea,stroke:#7a4baa;
style v288 fill:#e3dcea,stroke:#7a4baa;
style v177 fill:#e3dcea,stroke:#7a4baa;
style v204 fill:#e3dcea,stroke:#7a4baa;
style v182 fill:#e3dcea,stroke:#7a4baa;
style v189 fill:#e3dcea,stroke:#7a4baa;
style v199 fill:#e3dcea,stroke:#7a4baa;
style v208 fill:#e3dcea,stroke:#7a4baa;
style v235 fill:#e3dcea,stroke:#7a4baa;
style v213 fill:#e3dcea,stroke:#7a4baa;
style v220 fill:#e3dcea,stroke:#7a4baa;
style v230 fill:#e3dcea,stroke:#7a4baa;
style v236 fill:#e3dcea,stroke:#7a4baa;
style v266 fill:#e3dcea,stroke:#7a4baa;
style v244 fill:#e3dcea,stroke:#7a4baa;
style v251 fill:#e3dcea,stroke:#7a4baa;
style v261 fill:#e3dcea,stroke:#7a4baa;
style v273 fill:#e3dcea,stroke:#7a4baa;
style v283 fill:#e3dcea,stroke:#7a4baa;
style v295 fill:#e3dcea,stroke:#7a4baa;
style v302 fill:#e3dcea,stroke:#7a4baa;
style v314 fill:#e3dcea,stroke:#7a4baa;
style v321 fill:#e3dcea,stroke:#7a4baa;
style v325 fill:#e3dcea,stroke:#7a4baa;
Rna multisample
Processing unimodal multi-sample RNA transcriptomics data.
Info
ID: rna_multisample
Namespace: workflows/rna
Links
Example commands
You can run the pipeline using nextflow run.
View help
You can use --help as a parameter to get an overview of the possible parameters.
nextflow run openpipelines-bio/openpipeline \
-r 2.1.1 -latest \
-main-script target/nextflow/workflows/rna/rna_multisample/main.nf \
--helpRun command
Example of params.yaml
# Inputs
id: # please fill in - example: "concatenated"
input: # please fill in - example: "dataset.h5mu"
modality: "rna"
# layer: "foo"
# Output
# output: "$id.$key.output.h5mu"
# Filtering highly variable features
highly_variable_features_var_output: "filter_with_hvg"
highly_variable_features_obs_batch_key: "sample_id"
highly_variable_features_flavor: "seurat"
# highly_variable_features_n_top_features: 123
# QC metrics calculation options
var_qc_metrics: ["filter_with_hvg"]
top_n_vars: [50, 100, 200, 500]
output_obs_num_nonzero_vars: "num_nonzero_vars"
output_obs_total_counts_vars: "total_counts"
output_var_num_nonzero_obs: "num_nonzero_obs"
output_var_total_counts_obs: "total_counts"
output_var_obs_mean: "obs_mean"
output_var_pct_dropout: "pct_dropout"
# RNA Scaling options
enable_scaling: false
scaling_output_layer: "scaled"
# scaling_max_value: 123.0
scaling_zero_center: true
# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
# Argumentsnextflow run openpipelines-bio/openpipeline \
-r 2.1.1 -latest \
-profile docker \
-main-script target/nextflow/workflows/rna/rna_multisample/main.nf \
-params-file params.yaml
Note
Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
Argument groups
Inputs
| Name | Description | Attributes |
|---|---|---|
--id |
ID of the concatenated file | string, required, example: "concatenated" |
--input |
Path to the samples. | file, required, example: "dataset.h5mu" |
--modality |
Modality to process. | string, default: "rna" |
--layer |
Input layer to use. If not specified, .X is used. | string |
Output
| Name | Description | Attributes |
|---|---|---|
--output |
Destination path to the output. | file, required, example: "output.h5mu" |
Filtering highly variable features
| Name | Description | Attributes |
|---|---|---|
--highly_variable_features_var_output |
In which .var slot to store a boolean array corresponding to the highly variable features. | string, default: "filter_with_hvg" |
--highly_variable_features_obs_batch_key |
If specified, highly-variable features are selected within each batch separately and merged. This simple process avoids the selection of batch-specific features and acts as a lightweight batch correction method. For all flavors, featues are first sorted by how many batches they are highly variable. For dispersion-based flavors ties are broken by normalized dispersion. If flavor = ‘seurat_v3’, ties are broken by the median (across batches) rank based on within-batch normalized variance. | string, default: "sample_id" |
--highly_variable_features_flavor |
Choose the flavor for identifying highly variable features. For the dispersion based methods in their default workflows, Seurat passes the cutoffs whereas Cell Ranger passes n_top_features. | string, default: "seurat" |
--highly_variable_features_n_top_features |
Number of highly-variable features to keep. Mandatory if filter_with_hvg_flavor is set to ‘seurat_v3’. | integer |
QC metrics calculation options
| Name | Description | Attributes |
|---|---|---|
--var_qc_metrics |
Keys to select a boolean (containing only True or False) column from .var. For each cell, calculate the proportion of total values for genes which are labeled ‘True’, compared to the total sum of the values for all genes. | List of string, default: "filter_with_hvg", example: "ercc,highly_variable", multiple_sep: "," |
--top_n_vars |
Number of top vars to be used to calculate cumulative proportions. If not specified, proportions are not calculated. --top_n_vars 20,50 finds cumulative proportion to the 20th and 50th most expressed vars. |
List of integer, default: 50, 100, 200, 500, multiple_sep: "," |
--output_obs_num_nonzero_vars |
Name of column in .obs describing, for each observation, the number of stored values (including explicit zeroes). In other words, the name of the column that counts for each row the number of columns that contain data. | string, default: "num_nonzero_vars" |
--output_obs_total_counts_vars |
Name of the column for .obs describing, for each observation (row), the sum of the stored values in the columns. | string, default: "total_counts" |
--output_var_num_nonzero_obs |
Name of column describing, for each feature, the number of stored values (including explicit zeroes). In other words, the name of the column that counts for each column the number of rows that contain data. | string, default: "num_nonzero_obs" |
--output_var_total_counts_obs |
Name of the column in .var describing, for each feature (column), the sum of the stored values in the rows. | string, default: "total_counts" |
--output_var_obs_mean |
Name of the column in .obs providing the mean of the values in each row. | string, default: "obs_mean" |
--output_var_pct_dropout |
Name of the column in .obs providing for each feature the percentage of observations the feature does not appear on (i.e. is missing). Same as --num_nonzero_obs but percentage based. |
string, default: "pct_dropout" |
RNA Scaling options
Options for enabling scaling of the log-normalized data to unit variance and zero mean. The scaled data will be output a different layer and representation with reduced dimensions will be created and stored in addition to the non-scaled data.
| Name | Description | Attributes |
|---|---|---|
--enable_scaling |
Enable scaling for the RNA modality. | boolean_true |
--scaling_output_layer |
Output layer where the scaled log-normalized data will be stored. | string, default: "scaled" |
--scaling_max_value |
Clip (truncate) data to this value after scaling. If not specified, do not clip. | double |
--scaling_zero_center |
If set, omit zero-centering variables, which allows to handle sparse input efficiently.” | boolean_false |