Cellxgene census

Query the CellxGene Census or a user-specified TileDB-SOMA object, and optionally fetch cell and gene metadata and/or expression counts.


ID: cellxgene_census
Namespace: query

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 0.12.0 -latest \
  -main-script target/nextflow/query/cellxgene_census/main.nf \
  --help

Run command

Example of params.yaml
# Inputs
input_database: "CellxGene"
modality: "rna"
cellxgene_release: "2023-05-15"

# Outputs
# output: "$id.$key.output.h5mu"
# output_compression: "gzip"

# Query
species: "homo_sapiens"
# cell_query: "is_primary_data == True and cell_type_ontology_term_id in ['CL:0000136', 'CL:1000311', 'CL:0002616'] and suspension_type == 'cell'"
# cells_filter_columns: ["dataset_id", "tissue", "assay", "disease", "cell_type"]
# min_cells_filter_columns: 100

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

nextflow run openpipelines-bio/openpipeline \
  -r 0.12.0 -latest \
  -profile docker \
  -main-script target/nextflow/query/cellxgene_census/main.nf \
  -params-file params.yaml

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
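
If the pipeline is launched from a script, the profile swap described above could be wrapped in a small helper. This is only a sketch: `build_command` and the constant names are hypothetical, the values are copied from the commands on this page, and nothing here is part of the pipeline itself.

```python
import shlex

# Values copied from the example commands on this page.
PIPELINE = "openpipelines-bio/openpipeline"
RELEASE = "0.12.0"
MAIN_SCRIPT = "target/nextflow/query/cellxgene_census/main.nf"
# Container backends named in the text above.
PROFILES = ("docker", "podman", "singularity")


def build_command(profile="docker", params_file="params.yaml"):
    """Return the nextflow argument list for the chosen backend (hypothetical helper)."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile {profile!r}; expected one of {PROFILES}")
    return [
        "nextflow", "run", PIPELINE,
        "-r", RELEASE, "-latest",
        "-profile", profile,
        "-main-script", MAIN_SCRIPT,
        "-params-file", params_file,
    ]


if __name__ == "__main__":
    # Print the command rather than running it. To launch the pipeline,
    # pass the list to subprocess.run(..., check=True) instead.
    print(shlex.join(build_command("singularity")))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.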

Argument groups


Inputs

Arguments related to the input (aka query) dataset.

Name | Description | Attributes
--input_database | Full input database S3 prefix URL. Default: CellxGene Census. | string, default: "CellxGene", example: "s3://"
--modality | Which modality to store the output in. | string, default: "rna"
--cellxgene_release | CellxGene Census release date. More information: https://chanzuckerberg.github.io/cellxgene-census/cellxgene_census_docsite_data_release_info.html | string, default: "2023-05-15"


Query

Arguments related to the query.

Name | Description | Attributes
--species | Species of interest. If not specified, Homo sapiens is queried. | string, default: "homo_sapiens", example: "homo_sapiens"
--cell_query | The query for selecting the cells, as defined by the CellxGene Census schema. | string, example: "is_primary_data == True and cell_type_ontology_term_id in ['CL:0000136', 'CL:1000311', 'CL:0002616'] and suspension_type == 'cell'"
--cells_filter_columns | Cell metadata columns used for filtering cells by count (see --min_cells_filter_columns). | List of string, example: "dataset_id", "tissue", "assay", "disease", "cell_type", multiple_sep: ":"
--min_cells_filter_columns | Minimum number of cells, summed per cells_filter_columns group. | double, example: 100
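
The query arguments above can be illustrated with a short standard-library sketch. Both helper names, `build_cell_query` and `filter_by_group_size`, are hypothetical and not part of the component. The first assembles a string shaped like the documented --cell_query example. The second shows one plausible reading of --cells_filter_columns together with --min_cells_filter_columns: cells are grouped by their combination of values in the filter columns, and groups with fewer cells than the minimum are dropped. That reading is an assumption, and cells are represented as plain dictionaries purely for illustration.

```python
from collections import Counter


def build_cell_query(cell_type_ids, primary_only=True, suspension_type="cell"):
    """Assemble a --cell_query string shaped like the documented example."""
    parts = []
    if primary_only:
        parts.append("is_primary_data == True")
    # repr() of a list of plain strings yields the ['a', 'b'] form used in the example.
    parts.append(f"cell_type_ontology_term_id in {list(cell_type_ids)!r}")
    if suspension_type is not None:
        parts.append(f"suspension_type == {suspension_type!r}")
    return " and ".join(parts)


def filter_by_group_size(cells, filter_columns, min_cells):
    """Keep cells whose combination of filter_columns values occurs at least
    min_cells times (assumed behaviour, not confirmed by this page)."""
    def group_key(cell):
        return tuple(cell[column] for column in filter_columns)

    counts = Counter(group_key(cell) for cell in cells)
    return [cell for cell in cells if counts[group_key(cell)] >= min_cells]


if __name__ == "__main__":
    print(build_cell_query(["CL:0000136", "CL:1000311", "CL:0002616"]))

    # Toy metadata: three lung cells and one liver cell.
    toy_cells = [{"tissue": "lung", "assay": "10x"}] * 3 + [{"tissue": "liver", "assay": "10x"}]
    kept = filter_by_group_size(toy_cells, ["tissue", "assay"], min_cells=2)
    print(len(kept), "of", len(toy_cells), "cells kept")
```

The string returned by `build_cell_query` can be pasted into the cell_query entry of params.yaml.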


Outputs

Output arguments.

Name | Description | Attributes
--output | Output h5mu file. | file, required, example: "output.h5mu"
--output_compression | | string, example: "gzip"


Authors

  • Matthias Beyens

  • Dries De Maeyer (author)