Score genes cell cycle scanpy

Calculates the scores associated with the S phase and the G2M phase and annotates the cell cycle phase of each cell, as implemented by scanpy.

Info

ID: score_genes_cell_cycle_scanpy
Namespace: feature_annotation

The score is the average expression of a set of genes minus the average expression of a reference set of genes.
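The definition above can be illustrated with a minimal sketch. It is not scanpy's implementation: the function names, the toy expression values, and the fixed reference set are assumptions made for illustration, whereas scanpy samples the reference set from expression-matched bins. The phase rule shown (G1 when both scores are negative, otherwise the phase with the higher score) follows our understanding of scanpy's `score_genes_cell_cycle`.

```python
from statistics import mean


def gene_set_score(expression, gene_set, reference_set):
    """Mean expression of gene_set minus mean expression of reference_set (one cell)."""
    return mean(expression[g] for g in gene_set) - mean(expression[g] for g in reference_set)


def assign_phase(s_score, g2m_score):
    """G1 if both scores are negative, otherwise the phase with the higher score."""
    if s_score < 0 and g2m_score < 0:
        return "G1"
    return "G2M" if g2m_score > s_score else "S"


# Toy normalized expression values for a single cell (illustrative only).
cell = {"MCM5": 2.0, "PCNA": 3.0, "CDK1": 0.5, "TOP2A": 0.5, "ACTB": 1.0, "GAPDH": 1.0}
reference = ["ACTB", "GAPDH"]  # hypothetical fixed reference set

s_score = gene_set_score(cell, ["MCM5", "PCNA"], reference)
g2m_score = gene_set_score(cell, ["CDK1", "TOP2A"], reference)
phase = assign_phase(s_score, g2m_score)
print(s_score, g2m_score, phase)
```

In the component, the two scores and the phase are written to the obs columns named by `obs_s_score`, `obs_g2m_score`, and `obs_phase`.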

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.0 -latest \
  -main-script target/nextflow/feature_annotation/score_genes_cell_cycle_scanpy/main.nf \
  --help

Run command

Example of params.yaml

# Inputs
input: # please fill in - example: "input_file.h5mu"
modality: "rna"
# input_layer: "log_normalized"
# var_gene_names: "gene_names"

# Gene list inputs
# s_genes: ["gene1", "gene2", "gene3"]
# s_genes_file: "s_gene_list.txt"
# g2m_genes: ["gene1", "gene2", "gene3"]
# g2m_genes_file: "g2m_gene_list.txt"
# gene_pool: ["gene1", "gene2", "gene3"]
# gene_pool_file: "gene_pool.txt"

# Outputs
# output: "$id.$key.output.h5mu"
# output_compression: "gzip"
obs_phase: "phase"
obs_s_score: "S_score"
obs_g2m_score: "G2M_score"

# Arguments
n_bins: 25
random_state: 0
allow_missing_genes: false

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.0 -latest \
  -profile docker \
  -main-script target/nextflow/feature_annotation/score_genes_cell_cycle_scanpy/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Inputs

Name | Description | Attributes
--input | Input h5mu file | file, required, example: "input_file.h5mu"
--modality | | string, default: "rna"
--input_layer | The layer of the adata object containing normalized expression values. If not provided, the X attribute of the adata object will be used. | string, example: "log_normalized"
--var_gene_names | The name of the column in the var attribute of the adata object that contains the gene names (symbols). If not provided, the index of the var attribute will be used. | string, example: "gene_names"

Gene list inputs

The gene list inputs can be provided as a list of gene symbols or as a file containing a list of gene symbols. The gene list file should be formatted as a single column with gene symbols.

Make sure that the gene list inputs are consistent with the gene names in the adata object as provided by the --var_gene_names argument.

Name | Description | Attributes
--s_genes | List of gene symbols for scoring S phase genes. | List of string, example: "gene1", "gene2", "gene3", multiple_sep: ";"
--s_genes_file | Path to a .txt file containing the gene list of S phase genes to be scored. The gene list file should be formatted as a single column with gene symbols. | file, example: "s_gene_list.txt"
--g2m_genes | List of gene symbols for scoring G2M phase genes. | List of string, example: "gene1", "gene2", "gene3", multiple_sep: ";"
--g2m_genes_file | Path to a .txt file containing the gene list of G2M phase genes to be scored. The gene list file should be formatted as a single column with gene symbols. | file, example: "g2m_gene_list.txt"
--gene_pool | List of gene symbols for sampling the reference set. Default is all genes. | List of string, example: "gene1", "gene2", "gene3", multiple_sep: ";"
--gene_pool_file | File with genes for sampling the reference set. Default is all genes. The gene pool file should be formatted as a single column with gene symbols. | file, example: "gene_pool.txt"
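
Before running the component, it can help to check a gene list file against the gene names in the dataset. The sketch below reads a single-column file and reports symbols that are absent. The helper names `read_gene_list` and `find_missing_genes` are hypothetical, the component's own parsing may differ, and `var_names` stands in for the gene names taken from the var index or from the column given by `--var_gene_names`.

```python
import tempfile
from pathlib import Path


def read_gene_list(path):
    """Read a single-column gene list file, skipping blank lines."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def find_missing_genes(gene_list, var_names):
    """Return the genes from gene_list that do not occur in var_names."""
    known = set(var_names)
    return [gene for gene in gene_list if gene not in known]


# Write a small example file, then read it back (illustrative data only).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "s_gene_list.txt"
    path.write_text("MCM5\nPCNA\n\nTYMS\n")
    s_genes = read_gene_list(path)

# Stand-in for the gene names found in the dataset.
var_names = ["MCM5", "PCNA", "ACTB"]
missing = find_missing_genes(s_genes, var_names)
print(s_genes, missing)
```

Any genes reported as missing would either need to be corrected in the list or tolerated by setting `allow_missing_genes: true`.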

Outputs

Name | Description | Attributes
--output | Output h5mu file | file, required, example: "output_file.h5mu"
--output_compression | The compression format to be used on the output h5mu object. | string, example: "gzip"
--obs_phase | The name of the column in the obs attribute of the adata object that will store the cell cycle phase annotation. | string, default: "phase"
--obs_s_score | The name of the column in the obs attribute of the adata object that will store the S phase score. | string, default: "S_score"
--obs_g2m_score | The name of the column in the obs attribute of the adata object that will store the G2M phase score. | string, default: "G2M_score"

Arguments

Name | Description | Attributes
--n_bins | Number of expression level bins for sampling. | integer, default: 25
--random_state | The random seed for sampling. | integer, default: 0
--allow_missing_genes | If true, missing genes in the gene list will be ignored. | boolean, default: FALSE
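
To show how `n_bins` and `random_state` interact, here is a rough sketch of expression-binned reference sampling. It is a simplified illustration, not scanpy's code: genes in the pool are ranked by average expression and split into `n_bins` groups, and reference genes are drawn only from bins that contain a gene of interest. The function name `sample_reference_genes` and the `ctrl_size` argument (reference genes drawn per bin) are assumptions for this example and are not parameters of this component.

```python
import random


def sample_reference_genes(avg_expression, gene_list, n_bins=25, ctrl_size=2, random_state=0):
    """Sample reference genes from the expression bins occupied by gene_list."""
    rng = random.Random(random_state)  # seeded, so the sampling is reproducible
    ranked = sorted(avg_expression, key=avg_expression.get)  # low to high expression
    n_bins = min(n_bins, len(ranked))
    # Assign each gene to a bin according to its expression rank.
    bin_of = {gene: rank * n_bins // len(ranked) for rank, gene in enumerate(ranked)}
    targets = set(gene_list)
    reference = set()
    for b in sorted({bin_of[g] for g in targets if g in bin_of}):
        # Candidates share a bin with a target gene but are not targets themselves.
        candidates = [g for g in ranked if bin_of[g] == b and g not in targets]
        reference.update(rng.sample(candidates, min(ctrl_size, len(candidates))))
    return sorted(reference)


# Twelve toy genes with increasing average expression (illustrative only).
avg = {f"g{i}": float(i) for i in range(12)}
reference = sample_reference_genes(avg, ["g1", "g9"], n_bins=3, ctrl_size=2, random_state=0)
print(reference)
```

With three bins of four genes each, the two target genes fall in the lowest and highest bins, so the reference set is drawn from those bins only. Repeating the call with the same `random_state` returns the same genes.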

Authors

  • Robrecht Cannoodt (author)

  • Dorien Roosen (author)

  • Weiwei Schultz (contributor)