flowchart TB v0(Channel.fromList) v2(filter) v10(filter) v18(normalize_total) v25(cross) v35(cross) v41(filter) v49(log1p) v56(cross) v66(cross) v72(filter) v80(delete_layer) v87(cross) v97(cross) v106(branch) v133(concat) v111(scale) v118(cross) v128(cross) v134(filter) v142(highly_variable_features_scanpy) v149(cross) v159(cross) v165(filter) v288(concat) v177(branch) v204(concat) v189(cross) v199(cross) v208(branch) v235(concat) v220(cross) v230(cross) v236(filter) v266(concat) v251(cross) v261(cross) v273(cross) v283(cross) v295(cross) v302(cross) v314(cross) v321(cross) v325(Output) subgraph group_rna_qc [rna_qc] v182(grep_mitochondrial_genes) v213(grep_ribosomal_genes) v244(calculate_qc_metrics) end v106-->v133 v133-->v134 v177-->v204 v208-->v235 v235-->v236 v0-->v2 v2-->v10 v10-->v18 v18-->v25 v10-->v25 v10-->v35 v41-->v49 v49-->v56 v41-->v56 v41-->v66 v72-->v80 v80-->v87 v72-->v87 v72-->v97 v106-->v111 v111-->v118 v106-->v118 v106-->v128 v128-->v133 v134-->v142 v142-->v149 v134-->v149 v134-->v159 v177-->v182 v182-->v189 v177-->v189 v177-->v199 v199-->v204 v208-->v213 v213-->v220 v208-->v220 v208-->v230 v230-->v235 v236-->v244 v244-->v251 v236-->v251 v236-->v261 v261-->v266 v266-->v273 v165-->v273 v165-->v283 v283-->v288 v288-->v295 v2-->v295 v295-->v302 v2-->v302 v2-->v314 v314-->v321 v2-->v321 v321-->v325 v35-->v41 v18-->v35 v66-->v72 v49-->v66 v80-->v97 v97-->v106 v111-->v128 v159-->v165 v142-->v159 v165-->v177 v182-->v199 v204-->v208 v213-->v230 v244-->v261 v266-->v283 v288-->v314 style group_rna_qc fill:#F0F0F0,stroke:#969696; style v0 fill:#e3dcea,stroke:#7a4baa; style v2 fill:#e3dcea,stroke:#7a4baa; style v10 fill:#e3dcea,stroke:#7a4baa; style v18 fill:#e3dcea,stroke:#7a4baa; style v25 fill:#e3dcea,stroke:#7a4baa; style v35 fill:#e3dcea,stroke:#7a4baa; style v41 fill:#e3dcea,stroke:#7a4baa; style v49 fill:#e3dcea,stroke:#7a4baa; style v56 fill:#e3dcea,stroke:#7a4baa; style v66 fill:#e3dcea,stroke:#7a4baa; style v72 fill:#e3dcea,stroke:#7a4baa; style v80 fill:#e3dcea,stroke:#7a4baa; style v87 fill:#e3dcea,stroke:#7a4baa; style v97 fill:#e3dcea,stroke:#7a4baa; style v106 fill:#e3dcea,stroke:#7a4baa; style v133 fill:#e3dcea,stroke:#7a4baa; style v111 fill:#e3dcea,stroke:#7a4baa; style v118 fill:#e3dcea,stroke:#7a4baa; style v128 fill:#e3dcea,stroke:#7a4baa; style v134 fill:#e3dcea,stroke:#7a4baa; style v142 fill:#e3dcea,stroke:#7a4baa; style v149 fill:#e3dcea,stroke:#7a4baa; style v159 fill:#e3dcea,stroke:#7a4baa; style v165 fill:#e3dcea,stroke:#7a4baa; style v288 fill:#e3dcea,stroke:#7a4baa; style v177 fill:#e3dcea,stroke:#7a4baa; style v204 fill:#e3dcea,stroke:#7a4baa; style v182 fill:#e3dcea,stroke:#7a4baa; style v189 fill:#e3dcea,stroke:#7a4baa; style v199 fill:#e3dcea,stroke:#7a4baa; style v208 fill:#e3dcea,stroke:#7a4baa; style v235 fill:#e3dcea,stroke:#7a4baa; style v213 fill:#e3dcea,stroke:#7a4baa; style v220 fill:#e3dcea,stroke:#7a4baa; style v230 fill:#e3dcea,stroke:#7a4baa; style v236 fill:#e3dcea,stroke:#7a4baa; style v266 fill:#e3dcea,stroke:#7a4baa; style v244 fill:#e3dcea,stroke:#7a4baa; style v251 fill:#e3dcea,stroke:#7a4baa; style v261 fill:#e3dcea,stroke:#7a4baa; style v273 fill:#e3dcea,stroke:#7a4baa; style v283 fill:#e3dcea,stroke:#7a4baa; style v295 fill:#e3dcea,stroke:#7a4baa; style v302 fill:#e3dcea,stroke:#7a4baa; style v314 fill:#e3dcea,stroke:#7a4baa; style v321 fill:#e3dcea,stroke:#7a4baa; style v325 fill:#e3dcea,stroke:#7a4baa;
Rna multisample
Processing unimodal multi-sample RNA transcriptomics data.
Info
ID: rna_multisample
Namespace: workflows/rna
Links
Example commands
You can run the pipeline using nextflow run
.
View help
You can use --help
as a parameter to get an overview of the possible parameters.
nextflow run openpipelines-bio/openpipeline \
-r 2.1.0 -latest \
-main-script target/nextflow/workflows/rna/rna_multisample/main.nf \
--help
Run command
Example of params.yaml
# Inputs
id: # please fill in - example: "concatenated"
input: # please fill in - example: "dataset.h5mu"
modality: "rna"
# layer: "foo"
# Output
# output: "$id.$key.output.h5mu"
# Filtering highly variable features
highly_variable_features_var_output: "filter_with_hvg"
highly_variable_features_obs_batch_key: "sample_id"
highly_variable_features_flavor: "seurat"
# highly_variable_features_n_top_features: 123
# QC metrics calculation options
var_qc_metrics: ["filter_with_hvg"]
top_n_vars: [50, 100, 200, 500]
output_obs_num_nonzero_vars: "num_nonzero_vars"
output_obs_total_counts_vars: "total_counts"
output_var_num_nonzero_obs: "num_nonzero_obs"
output_var_total_counts_obs: "total_counts"
output_var_obs_mean: "obs_mean"
output_var_pct_dropout: "pct_dropout"
# RNA Scaling options
enable_scaling: false
scaling_output_layer: "scaled"
# scaling_max_value: 123.0
scaling_zero_center: true
# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
# Arguments
nextflow run openpipelines-bio/openpipeline \
-r 2.1.0 -latest \
-profile docker \
-main-script target/nextflow/workflows/rna/rna_multisample/main.nf \
-params-file params.yaml
Note
Replace -profile docker
with -profile podman
or -profile singularity
depending on the desired backend.
Argument groups
Inputs
Name | Description | Attributes |
---|---|---|
--id |
ID of the concatenated file | string , required, example: "concatenated" |
--input |
Path to the samples. | file , required, example: "dataset.h5mu" |
--modality |
Modality to process. | string , default: "rna" |
--layer |
Input layer to use. If not specified, .X is used. | string |
Output
Name | Description | Attributes |
---|---|---|
--output |
Destination path to the output. | file , required, example: "output.h5mu" |
Filtering highly variable features
Name | Description | Attributes |
---|---|---|
--highly_variable_features_var_output |
In which .var slot to store a boolean array corresponding to the highly variable features. | string , default: "filter_with_hvg" |
--highly_variable_features_obs_batch_key |
If specified, highly-variable features are selected within each batch separately and merged. This simple process avoids the selection of batch-specific features and acts as a lightweight batch correction method. For all flavors, featues are first sorted by how many batches they are highly variable. For dispersion-based flavors ties are broken by normalized dispersion. If flavor = ‘seurat_v3’, ties are broken by the median (across batches) rank based on within-batch normalized variance. | string , default: "sample_id" |
--highly_variable_features_flavor |
Choose the flavor for identifying highly variable features. For the dispersion based methods in their default workflows, Seurat passes the cutoffs whereas Cell Ranger passes n_top_features. | string , default: "seurat" |
--highly_variable_features_n_top_features |
Number of highly-variable features to keep. Mandatory if filter_with_hvg_flavor is set to ‘seurat_v3’. | integer |
QC metrics calculation options
Name | Description | Attributes |
---|---|---|
--var_qc_metrics |
Keys to select a boolean (containing only True or False) column from .var. For each cell, calculate the proportion of total values for genes which are labeled ‘True’, compared to the total sum of the values for all genes. | List of string , default: "filter_with_hvg" , example: "ercc,highly_variable" , multiple_sep: "," |
--top_n_vars |
Number of top vars to be used to calculate cumulative proportions. If not specified, proportions are not calculated. --top_n_vars 20,50 finds cumulative proportion to the 20th and 50th most expressed vars. |
List of integer , default: 50, 100, 200, 500 , multiple_sep: "," |
--output_obs_num_nonzero_vars |
Name of column in .obs describing, for each observation, the number of stored values (including explicit zeroes). In other words, the name of the column that counts for each row the number of columns that contain data. | string , default: "num_nonzero_vars" |
--output_obs_total_counts_vars |
Name of the column for .obs describing, for each observation (row), the sum of the stored values in the columns. | string , default: "total_counts" |
--output_var_num_nonzero_obs |
Name of column describing, for each feature, the number of stored values (including explicit zeroes). In other words, the name of the column that counts for each column the number of rows that contain data. | string , default: "num_nonzero_obs" |
--output_var_total_counts_obs |
Name of the column in .var describing, for each feature (column), the sum of the stored values in the rows. | string , default: "total_counts" |
--output_var_obs_mean |
Name of the column in .obs providing the mean of the values in each row. | string , default: "obs_mean" |
--output_var_pct_dropout |
Name of the column in .obs providing for each feature the percentage of observations the feature does not appear on (i.e. is missing). Same as --num_nonzero_obs but percentage based. |
string , default: "pct_dropout" |
RNA Scaling options
Options for enabling scaling of the log-normalized data to unit variance and zero mean. The scaled data will be output a different layer and representation with reduced dimensions will be created and stored in addition to the non-scaled data.
Name | Description | Attributes |
---|---|---|
--enable_scaling |
Enable scaling for the RNA modality. | boolean_true |
--scaling_output_layer |
Output layer where the scaled log-normalized data will be stored. | string , default: "scaled" |
--scaling_max_value |
Clip (truncate) data to this value after scaling. If not specified, do not clip. | double |
--scaling_zero_center |
If set, omit zero-centering variables, which allows to handle sparse input efficiently.” | boolean_false |