Cellxgene census
    Query cells from a CellxGene Census or custom TileDBSoma object.
  
Info
ID: cellxgene_census
Namespace: query
Links
Aside from fetching the cells’ RNA counts (.X), cell metadata (.obs) and gene metadata (.var), this component also fetches the dataset metadata and joins it into the cell metadata
Example commands
You can run the pipeline using nextflow run.
View help
You can use --help as a parameter to get an overview of the possible parameters.
nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -main-script target/nextflow/query/cellxgene_census/main.nf \
  --helpRun command
Example of params.yaml
# Input database
# input_uri: "s3://bucket/path"
# census_version: "stable"
add_dataset_metadata: false
# Cell query
species: # please fill in - example: "homo_sapiens"
obs_value_filter: # please fill in - example: "is_primary_data == True and cell_type_ontology_term_id in ['CL:0000136', 'CL:1000311', 'CL:0002616'] and suspension_type == 'cell'"
# Filter cells by grouping
# cell_filter_grouping: ["dataset_id", "tissue", "assay", "disease", "cell_type"]
# cell_filter_minimum_count: 100
# Count filtering
cell_filter_min_genes: 50
cell_filter_min_counts: 0
gene_filter_min_cells: 5
gene_filter_min_counts: 0
# Outputs
# output: "$id.$key.output.h5mu"
# output_compression: "gzip"
output_modality: "rna"
# output_layer_counts: "foo"
# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
# Argumentsnextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -profile docker \
  -main-script target/nextflow/query/cellxgene_census/main.nf \
  -params-file params.yaml
Note
Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
Argument groups
Input database
Open CellxGene Census by version or URI.
| Name | Description | Attributes | 
|---|---|---|
--input_uri | 
If specified, a URI containing the Census SOMA objects. If specified, will take precedence over the --census_version argument. | 
string, example: "s3://bucket/path" | 
--census_version | 
Which release of CellxGene census to use. Possible values are “latest”, “stable”, or the date of one of the releases (e.g. “2023-07-25”). For more information, check the documentation on Census data releases. | string, example: "stable" | 
--add_dataset_metadata | 
If true, the experiment metadata will be added to the cell metadata. More specifically: collection_id, collection_name, collection_doi, dataset_title. | 
boolean_true | 
Cell query
Arguments related to the query.
| Name | Description | Attributes | 
|---|---|---|
--species | 
The organism to query, usually one of Homo sapiens or Mus musculus. | 
string, required, example: "homo_sapiens" | 
--obs_value_filter | 
Filter for selecting the obs metadata (i.e. cells). Value is a filter query written in the SOMA value_filter syntax. | 
string, required, example: "is_primary_data == True and cell_type_ontology_term_id in ['CL:0000136', 'CL:1000311', 'CL:0002616'] and suspension_type == 'cell'" | 
Filter cells by grouping
Filter groups with fewer than X number of cells.
| Name | Description | Attributes | 
|---|---|---|
--cell_filter_grouping | 
A subset of ‘obs’ columns by which to group the cells for filtering. Only groups surpassing or equal to the --cell_filter_minimum_count threshold will be retained. Take care not to introduce a selection bias against cells with more fine-grained ontology annotations. | 
List of string, example: "dataset_id", "tissue", "assay", "disease", "cell_type", multiple_sep: ";" | 
--cell_filter_minimum_count | 
A minimum number of cells per group to retain. If --cell_filter_grouping is defined, this parameter should also be provided and vice versa. | 
integer, example: 100 | 
Count filtering
Arguments related to filtering cells and genes by counts.
| Name | Description | Attributes | 
|---|---|---|
--cell_filter_min_genes | 
Remove cells with less than this number of genes. | integer, default: 50 | 
--cell_filter_min_counts | 
Remove cells with less than this number of counts. | integer, default: 0 | 
--gene_filter_min_cells | 
Remove genes expressed in less than this number of cells. | integer, default: 5 | 
--gene_filter_min_counts | 
Remove genes with less than this number of counts. | integer, default: 0 | 
Outputs
Output arguments.
| Name | Description | Attributes | 
|---|---|---|
--output | 
Output h5mu file. | file, required, example: "output.h5mu" | 
--output_compression | 
string, example: "gzip" | 
|
--output_modality | 
Which modality to store the output in. | string, default: "rna" | 
--output_layer_counts | 
Which layer to store the raw counts in. If not provided, the .X layer will be used. | string |