Cellxgene census

Query cells from a CellxGene Census or custom TileDBSoma object.

Info

ID: cellxgene_census
Namespace: query

Aside from fetching the cells’ RNA counts (.X), cell metadata (.obs) and gene metadata (.var), this component can also fetch the dataset metadata and join it into the cell metadata (see --add_dataset_metadata).

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -main-script target/nextflow/query/cellxgene_census/main.nf \
  --help

Run command

Example of params.yaml

# Input database
# input_uri: "s3://bucket/path"
# census_version: "stable"
add_dataset_metadata: false

# Cell query
species: # please fill in - example: "homo_sapiens"
obs_value_filter: # please fill in - example: "is_primary_data == True and cell_type_ontology_term_id in ['CL:0000136', 'CL:1000311', 'CL:0002616'] and suspension_type == 'cell'"

# Filter cells by grouping
# cell_filter_grouping: ["dataset_id", "tissue", "assay", "disease", "cell_type"]
# cell_filter_minimum_count: 100

# Count filtering
cell_filter_min_genes: 50
cell_filter_min_counts: 0
gene_filter_min_cells: 5
gene_filter_min_counts: 0

# Outputs
# output: "$id.$key.output.h5mu"
# output_compression: "gzip"
output_modality: "rna"
# output_layer_counts: "foo"

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

# Arguments

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -profile docker \
  -main-script target/nextflow/query/cellxgene_census/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Input database

Open CellxGene Census by version or URI.

Name	Description	Attributes
`--input_uri`	If specified, a URI containing the Census SOMA objects. If specified, will take precedence over the `--census_version` argument.	`string`, example: `"s3://bucket/path"`
`--census_version`	Which release of CellxGene Census to use. Possible values are “latest”, “stable”, or the date of one of the releases (e.g. “2023-07-25”). For more information, check the documentation on Census data releases.	`string`, example: `"stable"`
`--add_dataset_metadata`	If true, the dataset metadata will be added to the cell metadata. More specifically, the columns `collection_id`, `collection_name`, `collection_doi` and `dataset_title` are added.	`boolean_true`
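The precedence rule between `--input_uri` and `--census_version` can be sketched as follows. This is a minimal illustration, not the component's actual code: `resolve_census_source` is a hypothetical helper, the returned keys are chosen to resemble the `uri` and `census_version` keyword arguments of `cellxgene_census.open_soma`, and raising an error when neither value is given is an assumption.

```python
from typing import Optional


def resolve_census_source(input_uri: Optional[str] = None,
                          census_version: Optional[str] = None) -> dict:
    """Decide which Census source to open (hypothetical helper).

    Mirrors the documented rule: if input_uri is given, it takes
    precedence over census_version.
    """
    if input_uri:
        # A custom SOMA location was supplied, so the version is ignored.
        return {"uri": input_uri}
    if census_version:
        return {"census_version": census_version}
    # Assumption: the behaviour without either argument is not documented here.
    raise ValueError("Provide either input_uri or census_version")


print(resolve_census_source("s3://bucket/path", "stable"))
print(resolve_census_source(census_version="2023-07-25"))
```

With both values set, only the URI is kept; with only a version, the version is passed through unchanged.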

Cell query

Arguments related to the query.

Name	Description	Attributes
`--species`	The organism to query, usually one of `Homo sapiens` or `Mus musculus`, written in lowercase with an underscore as in the example.	`string`, required, example: `"homo_sapiens"`
`--obs_value_filter`	Filter for selecting the `obs` metadata (i.e. cells). Value is a filter query written in the SOMA `value_filter` syntax.	`string`, required, example: `"is_primary_data == True and cell_type_ontology_term_id in ['CL:0000136', 'CL:1000311', 'CL:0002616'] and suspension_type == 'cell'"`
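Longer filter strings are easy to get wrong by hand, so it may help to assemble them from parts. The sketch below uses a hypothetical helper, `build_obs_value_filter`, that only joins clauses with `and`; it reproduces the example value shown above and does not validate the column names or the SOMA `value_filter` syntax.

```python
def build_obs_value_filter(cell_type_ids, primary_only=True, suspension_type=None):
    """Assemble a value_filter string from a few common clauses (hypothetical helper)."""
    clauses = []
    if primary_only:
        clauses.append("is_primary_data == True")
    if cell_type_ids:
        # repr() of a list of strings yields the ['a', 'b'] form used in the example.
        clauses.append(f"cell_type_ontology_term_id in {list(cell_type_ids)!r}")
    if suspension_type is not None:
        clauses.append(f"suspension_type == {suspension_type!r}")
    return " and ".join(clauses)


obs_value_filter = build_obs_value_filter(
    ["CL:0000136", "CL:1000311", "CL:0002616"],
    suspension_type="cell",
)
print(obs_value_filter)
```

The resulting string can be placed in params.yaml as the value of `obs_value_filter`.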

Filter cells by grouping

Remove groups of cells that contain fewer than a minimum number of cells.

Name	Description	Attributes
`--cell_filter_grouping`	A subset of `obs` columns by which to group the cells for filtering. Only groups containing at least `--cell_filter_minimum_count` cells will be retained. Take care not to introduce a selection bias against cells with more fine-grained ontology annotations.	List of `string`, example: `"dataset_id", "tissue", "assay", "disease", "cell_type"`, multiple_sep: `";"`
`--cell_filter_minimum_count`	The minimum number of cells a group must contain to be retained. If `--cell_filter_grouping` is defined, this parameter should also be provided and vice versa.	`integer`, example: `100`
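The grouping filter described above can be illustrated with a small sketch. It assumes the cell metadata is available as a list of dictionaries (one per cell) rather than the data frame the component works with, and `filter_cells_by_group` is a hypothetical name, not part of the component.

```python
from collections import Counter


def filter_cells_by_group(obs_rows, grouping, minimum_count):
    """Keep only cells whose group has at least minimum_count members."""
    # One key per cell, built from the values of the grouping columns.
    keys = [tuple(row[col] for col in grouping) for row in obs_rows]
    group_sizes = Counter(keys)
    return [row for row, key in zip(obs_rows, keys)
            if group_sizes[key] >= minimum_count]


obs = [
    {"tissue": "lung", "cell_type": "T cell"},
    {"tissue": "lung", "cell_type": "T cell"},
    {"tissue": "lung", "cell_type": "T cell"},
    {"tissue": "liver", "cell_type": "B cell"},
]
kept = filter_cells_by_group(obs, ["tissue", "cell_type"], minimum_count=2)
print(len(kept))  # 3: the single liver B cell is dropped
```

Adding more columns to the grouping produces smaller groups, which is why finely annotated cells are more likely to fall below the threshold.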

Count filtering

Arguments related to filtering cells and genes by counts.

Name	Description	Attributes
`--cell_filter_min_genes`	Remove cells with fewer than this number of genes.	`integer`, default: `50`
`--cell_filter_min_counts`	Remove cells with fewer than this number of counts.	`integer`, default: `0`
`--gene_filter_min_cells`	Remove genes expressed in fewer than this number of cells.	`integer`, default: `5`
`--gene_filter_min_counts`	Remove genes with fewer than this number of counts.	`integer`, default: `0`
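The four thresholds above can be sketched on a toy count matrix. Several points are assumptions made for illustration: the matrix is a plain nested list (cells as rows, genes as columns), a gene counts towards a cell only if its count is above zero, and cells are filtered before genes. The component's actual order and implementation may differ.

```python
def filter_by_counts(counts, cell_min_genes, cell_min_counts,
                     gene_min_cells, gene_min_counts):
    """Return the indices of the cells and genes that pass the thresholds."""
    # Cells: enough detected genes and enough total counts.
    kept_cells = [
        i for i, row in enumerate(counts)
        if sum(1 for v in row if v > 0) >= cell_min_genes
        and sum(row) >= cell_min_counts
    ]
    n_genes = len(counts[0]) if counts else 0
    # Genes: evaluated on the remaining cells only (assumed order).
    kept_genes = [
        j for j in range(n_genes)
        if sum(1 for i in kept_cells if counts[i][j] > 0) >= gene_min_cells
        and sum(counts[i][j] for i in kept_cells) >= gene_min_counts
    ]
    return kept_cells, kept_genes


counts = [
    [5, 0, 1],
    [0, 0, 0],
    [2, 3, 0],
    [1, 0, 4],
]
kept_cells, kept_genes = filter_by_counts(
    counts, cell_min_genes=2, cell_min_counts=0,
    gene_min_cells=2, gene_min_counts=0,
)
print(kept_cells, kept_genes)  # [0, 2, 3] [0, 2]
```

The empty cell is removed first, and the middle gene is then dropped because only one of the remaining cells expresses it.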

Outputs

Output arguments.

Name	Description	Attributes
`--output`	Output h5mu file.	`file`, required, example: `"output.h5mu"`
`--output_compression`	Compression format to use for the output file.	`string`, example: `"gzip"`
`--output_modality`	Which modality to store the output in.	`string`, default: `"rna"`
`--output_layer_counts`	Which layer to store the raw counts in. If not provided, the counts are stored in `.X`.	`string`

Authors

Matthias Beyens (maintainer, author)
Dries De Maeyer (author)
Robrecht Cannoodt (author)
Kai Waldrant (contributor)