CellxGene Census

Query cells from a CellxGene Census or a custom TileDB-SOMA object.

Info

ID: cellxgene_census
Namespace: query

Aside from fetching the cells’ RNA counts (.X), cell metadata (.obs) and gene metadata (.var), this component can also fetch the dataset metadata and join it into the cell metadata (see --add_dataset_metadata).
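The metadata join can be pictured as a left join of a per-dataset table onto the per-cell table. The sketch below uses plain Python dictionaries with toy values; the dataset_id join key, the helper name join_dataset_metadata, and the left-join behaviour are assumptions for illustration, not the component's actual code. The copied columns are the ones listed for --add_dataset_metadata.

```python
from typing import Dict, List, Optional

# Columns copied from the dataset table into each cell row (as listed for
# --add_dataset_metadata). The join key "dataset_id" is an assumption.
DATASET_COLUMNS = ["collection_id", "collection_name", "collection_doi", "dataset_title"]


def join_dataset_metadata(
    cells: List[Dict[str, str]],
    datasets: List[Dict[str, str]],
    key: str = "dataset_id",
) -> List[Dict[str, Optional[str]]]:
    """Hypothetical helper: left-join dataset metadata onto cell metadata."""
    by_id = {d[key]: d for d in datasets}
    joined: List[Dict[str, Optional[str]]] = []
    for cell in cells:
        meta = by_id.get(cell[key], {})
        row: Dict[str, Optional[str]] = dict(cell)  # keep all original obs columns
        for col in DATASET_COLUMNS:
            # Cells whose dataset is missing from the table get None (left join).
            row[col] = meta.get(col)
        joined.append(row)
    return joined


# Toy data, for illustration only.
cells = [
    {"cell_id": "c1", "dataset_id": "d1", "cell_type": "T cell"},
    {"cell_id": "c2", "dataset_id": "d2", "cell_type": "B cell"},
]
datasets = [
    {"dataset_id": "d1", "collection_id": "col1", "collection_name": "Atlas A",
     "collection_doi": "doi-a", "dataset_title": "Blood"},
]
for row in join_dataset_metadata(cells, datasets):
    print(row["cell_id"], row["collection_name"])  # c1 Atlas A / c2 None
```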

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 1.0.2 -latest \
  -main-script target/nextflow/query/cellxgene_census/main.nf \
  --help

Run command

Example of params.yaml
# Outputs
# output: "$id.$key.output.h5mu"
# output_compression: "gzip"
output_modality: "rna"

# Input database
# input_uri: "s3://bucket/path"
# census_version: "stable"
add_dataset_metadata: false

# Cell query
species: # please fill in - example: "homo_sapiens"
obs_value_filter: # please fill in - example: "is_primary_data == True and cell_type_ontology_term_id in ['CL:0000136', 'CL:1000311', 'CL:0002616'] and suspension_type == 'cell'"

# Filter cells by grouping
# cell_filter_grouping: ["dataset_id", "tissue", "assay", "disease", "cell_type"]
# cell_filter_minimum_count: 100

# Count filtering
cell_filter_min_genes: 50
cell_filter_min_counts: 0
gene_filter_min_cells: 5
gene_filter_min_counts: 0

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

nextflow run openpipelines-bio/openpipeline \
  -r 1.0.2 -latest \
  -profile docker \
  -main-script target/nextflow/query/cellxgene_census/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Input database

Open CellxGene Census by version or URI.

Name | Description | Attributes
--input_uri | A URI containing the Census SOMA objects. If specified, it takes precedence over the --census_version argument. | string, example: "s3://bucket/path"
--census_version | Which release of CellxGene Census to use. Possible values are “latest”, “stable”, or the date of one of the releases (e.g. “2023-07-25”). For more information, check the documentation on Census data releases. | string, example: "stable"
--add_dataset_metadata | If set, the dataset metadata is joined into the cell metadata. More specifically, the columns collection_id, collection_name, collection_doi and dataset_title are added. | boolean_true
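The precedence between --input_uri and --census_version can be sketched with the cellxgene_census Python package, whose open_soma function accepts a census_version keyword and a uri keyword. It is an assumption that the component opens the Census this way; open_kwargs and open_census are hypothetical helper names. Only open_kwargs runs without the package installed.

```python
from typing import Optional


def open_kwargs(input_uri: Optional[str], census_version: str = "stable") -> dict:
    """Hypothetical helper: decide how to open the Census.

    A non-empty input_uri wins over census_version, mirroring --input_uri
    taking precedence over --census_version.
    """
    if input_uri:
        return {"uri": input_uri}
    return {"census_version": census_version}


def open_census(input_uri: Optional[str], census_version: str = "stable"):
    """Open the Census; needs the cellxgene_census package (imported lazily)."""
    import cellxgene_census  # assumption: installed in the runtime environment

    return cellxgene_census.open_soma(**open_kwargs(input_uri, census_version))


print(open_kwargs(None, "2023-07-25"))  # {'census_version': '2023-07-25'}
print(open_kwargs("s3://bucket/path"))  # {'uri': 's3://bucket/path'}
```

Objects returned by open_soma are typically used as a context manager (with open_census(...) as census:) so that the underlying handles are closed once the query is done.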

Cell query

Arguments related to the query.

Name | Description | Attributes
--species | The organism to query, usually one of Homo sapiens or Mus musculus. | string, required, example: "homo_sapiens"
--obs_value_filter | Filter for selecting the obs metadata (i.e. cells). Value is a filter query written in the SOMA value_filter syntax. | string, required, example: "is_primary_data == True and cell_type_ontology_term_id in ['CL:0000136', 'CL:1000311', 'CL:0002616'] and suspension_type == 'cell'"
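When selecting many cell types, it can help to build the --obs_value_filter string programmatically rather than by hand. The sketch below is a hypothetical helper (build_obs_value_filter is not part of the component) that reproduces the example filter above from a list of Cell Ontology IDs. It only concatenates strings and does not validate the SOMA value_filter syntax.

```python
from typing import Iterable


def build_obs_value_filter(
    cell_type_ids: Iterable[str],
    primary_only: bool = True,
    suspension_type: str = "cell",
) -> str:
    """Hypothetical helper: compose a SOMA value_filter string for --obs_value_filter."""
    clauses = []
    if primary_only:
        # is_primary_data marks the canonical copy of cells that may appear in
        # more than one dataset.
        clauses.append("is_primary_data == True")
    # repr() of a list of str yields single-quoted items, e.g. ['CL:0000136']
    clauses.append(f"cell_type_ontology_term_id in {list(cell_type_ids)!r}")
    clauses.append(f"suspension_type == {suspension_type!r}")
    return " and ".join(clauses)


print(build_obs_value_filter(["CL:0000136", "CL:1000311", "CL:0002616"]))
```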

Filter cells by grouping

Remove cells belonging to groups with fewer than --cell_filter_minimum_count cells.

Name | Description | Attributes
--cell_filter_grouping | A subset of ‘obs’ columns by which to group the cells for filtering. Only groups with at least --cell_filter_minimum_count cells are retained. Take care not to introduce a selection bias against cells with more fine-grained ontology annotations. | List of string, example: "dataset_id", "tissue", "assay", "disease", "cell_type", multiple_sep: ";"
--cell_filter_minimum_count | The minimum number of cells a group must contain to be retained. If --cell_filter_grouping is defined, this parameter should also be provided, and vice versa. | integer, example: 100
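The grouping filter can be sketched as: count the cells in each combination of the --cell_filter_grouping columns, then keep only the cells whose group reaches --cell_filter_minimum_count. The helper below is hypothetical and works on toy rows; the ";"-separated command-line form follows from the multiple_sep attribute in the table above.

```python
from collections import Counter
from typing import Dict, List, Sequence


def filter_by_group_size(
    cells: List[Dict[str, str]],
    grouping: Sequence[str],
    minimum_count: int,
) -> List[Dict[str, str]]:
    """Hypothetical helper: drop cells whose group has fewer than minimum_count cells.

    A group is the combination of values in the `grouping` columns.
    """
    def group_key(cell: Dict[str, str]) -> tuple:
        return tuple(cell[col] for col in grouping)

    sizes = Counter(group_key(cell) for cell in cells)
    # Groups with exactly minimum_count cells are kept ("surpassing or equal").
    return [cell for cell in cells if sizes[group_key(cell)] >= minimum_count]


# On the command line the list is ";"-separated, e.g. "dataset_id;cell_type".
grouping = "dataset_id;cell_type".split(";")
cells = (
    [{"dataset_id": "d1", "cell_type": "T cell"} for _ in range(3)]
    + [{"dataset_id": "d1", "cell_type": "B cell"}]
    + [{"dataset_id": "d2", "cell_type": "T cell"} for _ in range(2)]
)
kept = filter_by_group_size(cells, grouping, minimum_count=2)
print(len(kept))  # 5: the single-cell (d1, B cell) group is dropped
```

Grouping on a fine-grained column such as cell_type splits cells into smaller groups than grouping on dataset_id alone, so finely annotated cells are more likely to fall below the threshold; this is the selection bias the description warns about.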

Count filtering

Arguments related to filtering cells and genes by counts.

Name | Description | Attributes
--cell_filter_min_genes | Remove cells with fewer than this number of expressed genes. | integer, default: 50
--cell_filter_min_counts | Remove cells with fewer than this number of total counts. | integer, default: 0
--gene_filter_min_cells | Remove genes expressed in fewer than this number of cells. | integer, default: 5
--gene_filter_min_counts | Remove genes with fewer than this number of total counts. | integer, default: 0
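The four thresholds above can be sketched on a small dense count matrix. This is a hypothetical pure-Python version that returns the indices of the retained cells and genes; the order of the steps (cells first, then genes on the remaining cells) is an assumption, not documented behaviour.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # rows = cells, columns = genes (raw counts)


def filter_counts(
    X: Matrix,
    cell_min_genes: int = 50,
    cell_min_counts: int = 0,
    gene_min_cells: int = 5,
    gene_min_counts: int = 0,
) -> Tuple[List[int], List[int]]:
    """Hypothetical sketch: return indices of the cells and genes that are kept.

    Assumption: cells are filtered first, then genes on the remaining cells.
    """
    kept_cells = [
        i for i, row in enumerate(X)
        if sum(v > 0 for v in row) >= cell_min_genes  # number of expressed genes
        and sum(row) >= cell_min_counts  # total counts in the cell
    ]
    n_genes = len(X[0]) if X else 0
    kept_genes = [
        j for j in range(n_genes)
        if sum(X[i][j] > 0 for i in kept_cells) >= gene_min_cells  # expressing cells
        and sum(X[i][j] for i in kept_cells) >= gene_min_counts  # total gene counts
    ]
    return kept_cells, kept_genes


X = [
    [3, 0, 1, 0],
    [0, 0, 2, 0],
    [1, 4, 1, 0],
]
print(filter_counts(X, cell_min_genes=2, gene_min_cells=1))  # ([0, 2], [0, 1, 2])
```

If the component relies on scanpy (an assumption), the corresponding calls are sc.pp.filter_cells (min_genes, min_counts) and sc.pp.filter_genes (min_cells, min_counts); scanpy accepts only one threshold per call, so each would be applied separately.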

Outputs

Output arguments.

Name | Description | Attributes
--output | Output h5mu file. | file, required, example: "output.h5mu"
--output_compression | Compression format for the output h5mu file. | string, example: "gzip"
--output_modality | Which modality to store the output in. | string, default: "rna"

Authors

  • Matthias Beyens (maintainer, author)

  • Dries De Maeyer (author)

  • Robrecht Cannoodt (author)

  • Kai Waldrant (contributor)