Cellxgene census
Query cells from a CellxGene Census or a custom TileDB-SOMA object.
Info
ID: cellxgene_census
Namespace: query
Aside from fetching the cells’ RNA counts (.X), cell metadata (.obs) and gene metadata (.var), this component also fetches the dataset metadata and joins it into the cell metadata.
Example commands
You can run the pipeline using nextflow run.
View help
You can use --help as a parameter to get an overview of the possible parameters.
nextflow run openpipelines-bio/openpipeline \
-r 1.0.2 -latest \
-main-script target/nextflow/query/cellxgene_census/main.nf \
--help
Run command
Example of params.yaml
# Outputs
# output: "$id.$key.output.h5mu"
# output_compression: "gzip"
output_modality: "rna"
# Input database
# input_uri: "s3://bucket/path"
# census_version: "stable"
add_dataset_metadata: false
# Cell query
species: # please fill in - example: "homo_sapiens"
obs_value_filter: # please fill in - example: "is_primary_data == True and cell_type_ontology_term_id in ['CL:0000136', 'CL:1000311', 'CL:0002616'] and suspension_type == 'cell'"
# Filter cells by grouping
# cell_filter_grouping: ["dataset_id", "tissue", "assay", "disease", "cell_type"]
# cell_filter_minimum_count: 100
# Count filtering
cell_filter_min_genes: 50
cell_filter_min_counts: 0
gene_filter_min_cells: 5
gene_filter_min_counts: 0
# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
nextflow run openpipelines-bio/openpipeline \
-r 1.0.2 -latest \
-profile docker \
-main-script target/nextflow/query/cellxgene_census/main.nf \
-params-file params.yaml
Note
Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
Argument groups
Input database
Open CellxGene Census by version or URI.
Name | Description | Attributes |
---|---|---|
--input_uri | A URI containing the Census SOMA objects. If specified, it takes precedence over the --census_version argument. | string, example: "s3://bucket/path" |
--census_version | Which release of the CellxGene Census to use. Possible values are “latest”, “stable”, or the date of one of the releases (e.g. “2023-07-25”). For more information, check the documentation on Census data releases. | string, example: "stable" |
--add_dataset_metadata | If true, the experiment metadata will be added to the cell metadata. More specifically: collection_id, collection_name, collection_doi, dataset_title. | boolean_true |
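The effect of --add_dataset_metadata can be pictured as a lookup join. The sketch below is illustrative only and is not the component's implementation: it assumes that each cell carries a dataset_id and that this is the join key, and the helper name add_dataset_metadata, the toy rows, and all identifiers in them are hypothetical.

```python
# The four dataset-level columns named in the --add_dataset_metadata description.
DATASET_COLUMNS = ("collection_id", "collection_name", "collection_doi", "dataset_title")

# Hypothetical toy tables; real Census tables are far larger.
obs = [
    {"cell": "c1", "dataset_id": "d1"},
    {"cell": "c2", "dataset_id": "d2"},
    {"cell": "c3", "dataset_id": "d1"},
]
datasets = {
    "d1": {"collection_id": "col-A", "collection_name": "Atlas A",
           "collection_doi": "doi-A", "dataset_title": "Dataset one"},
    "d2": {"collection_id": "col-B", "collection_name": "Atlas B",
           "collection_doi": "doi-B", "dataset_title": "Dataset two"},
}


def add_dataset_metadata(obs_rows, dataset_table):
    """Return new obs rows with the dataset-level columns attached.

    Assumption: rows are matched on "dataset_id"; unmatched rows get None.
    """
    joined = []
    for row in obs_rows:
        info = dataset_table.get(row["dataset_id"], {})
        extra = {col: info.get(col) for col in DATASET_COLUMNS}
        joined.append({**row, **extra})
    return joined


obs_with_metadata = add_dataset_metadata(obs, datasets)
```

Every cell from the same dataset ends up with identical values in the four added columns, which is what makes them usable as grouping keys later on.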
Cell query
Arguments related to the query.
Name | Description | Attributes |
---|---|---|
--species | The organism to query, usually one of Homo sapiens or Mus musculus. | string, required, example: "homo_sapiens" |
--obs_value_filter | Filter for selecting the obs metadata (i.e. cells). Value is a filter query written in the SOMA value_filter syntax. | string, required, example: "is_primary_data == True and cell_type_ontology_term_id in ['CL:0000136', 'CL:1000311', 'CL:0002616'] and suspension_type == 'cell'" |
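Filter strings like the example above are easy to mistype by hand, so it can help to assemble them programmatically. The following is a minimal sketch: build_obs_value_filter is a hypothetical helper, not part of this component or of any Census library, and it only covers the three clauses that appear in the example.

```python
def build_obs_value_filter(cell_type_ids, suspension_type="cell", primary_only=True):
    """Assemble a filter string shaped like the documented example.

    Hypothetical helper. It relies on Python's repr() for quoting, so it is
    only suitable for values that contain no quote characters.
    """
    clauses = []
    if primary_only:
        clauses.append("is_primary_data == True")
    # repr() of a list of strings yields ['a', 'b', ...] with single quotes.
    clauses.append(f"cell_type_ontology_term_id in {list(cell_type_ids)!r}")
    clauses.append(f"suspension_type == {suspension_type!r}")
    return " and ".join(clauses)


obs_value_filter = build_obs_value_filter(["CL:0000136", "CL:1000311", "CL:0002616"])
```

The resulting string can then be written to the obs_value_filter entry of params.yaml.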
Filter cells by grouping
Remove groups of cells that contain fewer than a minimum number of cells.
Name | Description | Attributes |
---|---|---|
--cell_filter_grouping | A subset of ‘obs’ columns by which to group the cells for filtering. Only groups with at least --cell_filter_minimum_count cells are retained. Take care not to introduce a selection bias against cells with more fine-grained ontology annotations. | List of string, example: "dataset_id", "tissue", "assay", "disease", "cell_type", multiple_sep: ";" |
--cell_filter_minimum_count | The minimum number of cells a group must contain to be retained. If --cell_filter_grouping is defined, this parameter should also be provided, and vice versa. | integer, example: 100 |
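The grouping filter described above can be sketched in a few lines. This is an illustration of the documented behaviour (keep groups whose size is at least the minimum count), not the component's actual code; filter_by_group and the toy rows are hypothetical.

```python
from collections import Counter


def filter_by_group(obs_rows, grouping, minimum_count):
    """Keep only cells whose group has at least `minimum_count` members.

    A group is defined by the combination of values in the `grouping` columns.
    """
    keys = [tuple(row[col] for col in grouping) for row in obs_rows]
    sizes = Counter(keys)
    return [row for row, key in zip(obs_rows, keys) if sizes[key] >= minimum_count]


# Hypothetical toy metadata: three lung cells and one liver cell from one dataset.
toy_obs = [
    {"cell": "c1", "dataset_id": "d1", "tissue": "lung"},
    {"cell": "c2", "dataset_id": "d1", "tissue": "lung"},
    {"cell": "c3", "dataset_id": "d1", "tissue": "liver"},
    {"cell": "c4", "dataset_id": "d1", "tissue": "lung"},
]

kept = filter_by_group(toy_obs, grouping=["dataset_id", "tissue"], minimum_count=2)
```

In this toy case the single liver cell is dropped. The same mechanism explains the selection-bias warning: the more columns are included in the grouping, the smaller each group becomes, and finely annotated cells are more likely to fall below the threshold.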
Count filtering
Arguments related to filtering cells and genes by counts.
Name | Description | Attributes |
---|---|---|
--cell_filter_min_genes | Remove cells with fewer than this number of genes. | integer, default: 50 |
--cell_filter_min_counts | Remove cells with fewer than this number of counts. | integer, default: 0 |
--gene_filter_min_cells | Remove genes expressed in fewer than this number of cells. | integer, default: 5 |
--gene_filter_min_counts | Remove genes with fewer than this number of counts. | integer, default: 0 |
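To make the four thresholds concrete, here is a small sketch on a dense toy matrix. It rests on two assumptions that this page does not confirm: that a gene counts as present in a cell when its count is greater than zero, and that cells are filtered before genes. The function filter_by_counts and the matrix are hypothetical.

```python
def filter_by_counts(counts, cell_min_genes, cell_min_counts,
                     gene_min_cells, gene_min_counts):
    """Return (kept cell indices, kept gene indices) for a cells x genes matrix.

    Assumptions: a gene is "expressed" when its count is > 0, and the cell
    filter runs first, so gene statistics are computed on the kept cells only.
    """
    keep_cells = [
        i for i, row in enumerate(counts)
        if sum(1 for v in row if v > 0) >= cell_min_genes
        and sum(row) >= cell_min_counts
    ]
    n_genes = len(counts[0]) if counts else 0
    keep_genes = []
    for j in range(n_genes):
        column = [counts[i][j] for i in keep_cells]
        expressed_in = sum(1 for v in column if v > 0)
        if expressed_in >= gene_min_cells and sum(column) >= gene_min_counts:
            keep_genes.append(j)
    return keep_cells, keep_genes


# Toy matrix: rows are cells, columns are genes.
toy_counts = [
    [1, 0, 3],  # cell 0: 2 genes detected, 4 counts
    [0, 0, 1],  # cell 1: 1 gene detected, 1 count
    [2, 0, 5],  # cell 2: 2 genes detected, 7 counts
]

kept_cells, kept_genes = filter_by_counts(
    toy_counts, cell_min_genes=2, cell_min_counts=0,
    gene_min_cells=2, gene_min_counts=0,
)
```

With these toy thresholds, cell 1 is removed for having too few detected genes, and the middle gene is removed because no remaining cell expresses it.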
Outputs
Output arguments.
Name | Description | Attributes |
---|---|---|
--output | Output h5mu file. | file, required, example: "output.h5mu" |
--output_compression | | string, example: "gzip" |
--output_modality | Which modality to store the output in. | string, default: "rna" |