Filter with scrublet
Info
ID: filter_with_scrublet
Namespace: filter
The method detects potential doublets by combining the expression profiles of observed cells into synthetic doublets and comparing each observed cell against them. It returns a “doublet score” for each cell, and cells are called as potential doublets based on this score.
For the source code please visit https://github.com/AllonKleinLab/scrublet.
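
To make the idea concrete, here is a minimal sketch in plain Python. It is not the Scrublet implementation: it skips normalization, gene filtering and PCA, and it uses the fraction of synthetic doublets among a cell's nearest neighbours as a simplified stand-in for the doublet score. The function names `simulate_doublets` and `doublet_scores` and the toy count matrix are assumptions made for illustration.

```python
import math
import random


def simulate_doublets(counts, n_doublets, rng):
    """Create synthetic doublets by summing the counts of two randomly chosen cells."""
    doublets = []
    for _ in range(n_doublets):
        a, b = rng.sample(range(len(counts)), 2)
        doublets.append([x + y for x, y in zip(counts[a], counts[b])])
    return doublets


def doublet_scores(counts, n_doublets=None, k=3, seed=0):
    """Score each observed cell by the share of synthetic doublets among its k nearest neighbours."""
    rng = random.Random(seed)
    n_doublets = n_doublets or len(counts)
    synthetic = simulate_doublets(counts, n_doublets, rng)
    # Pool observed cells (label False) with synthetic doublets (label True).
    pool = [(c, False) for c in counts] + [(d, True) for d in synthetic]
    scores = []
    for i, cell in enumerate(counts):
        # Euclidean distance to every other profile in the pool, excluding the cell itself.
        neighbours = sorted(
            (math.dist(cell, other), is_synth)
            for j, (other, is_synth) in enumerate(pool)
            if j != i
        )
        nearest = neighbours[:k]
        scores.append(sum(is_synth for _, is_synth in nearest) / len(nearest))
    return scores


# Toy data (hypothetical): two cell types plus one profile that mixes both.
cells = [[10, 0], [9, 1], [11, 0], [0, 10], [1, 9], [0, 11], [10, 10]]
print(doublet_scores(cells))
```

A cell that sits close to many synthetic doublets receives a score near 1, while a cell surrounded by other observed cells receives a score near 0.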
For 10x we expect the doublet rates to be:
Multiplet Rate (%) | # of Cells Loaded | # of Cells Recovered |
---|---|---|
~0.4% | ~800 | ~500 |
~0.8% | ~1,600 | ~1,000 |
~1.6% | ~3,200 | ~2,000 |
~2.3% | ~4,800 | ~3,000 |
~3.1% | ~6,400 | ~4,000 |
~3.9% | ~8,000 | ~5,000 |
~4.6% | ~9,600 | ~6,000 |
~5.4% | ~11,200 | ~7,000 |
~6.1% | ~12,800 | ~8,000 |
~6.9% | ~14,400 | ~9,000 |
~7.6% | ~16,000 | ~10,000 |
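
For cell counts that fall between the listed values, a rough estimate can be obtained by linear interpolation. The sketch below assumes that interpolating between neighbouring rows is acceptable for a quick sanity check; the helper name `expected_multiplet_rate` is hypothetical and is not part of the component.

```python
from bisect import bisect_left

# (cells recovered, multiplet rate in %) pairs copied from the figures above.
MULTIPLET_RATES = [
    (500, 0.4), (1000, 0.8), (2000, 1.6), (3000, 2.3), (4000, 3.1),
    (5000, 3.9), (6000, 4.6), (7000, 5.4), (8000, 6.1), (9000, 6.9),
    (10000, 7.6),
]


def expected_multiplet_rate(cells_recovered):
    """Linearly interpolate the expected multiplet rate (%) for a recovered cell count."""
    xs = [x for x, _ in MULTIPLET_RATES]
    if not xs[0] <= cells_recovered <= xs[-1]:
        raise ValueError("cell count is outside the range covered by the listed values")
    i = bisect_left(xs, cells_recovered)
    if xs[i] == cells_recovered:
        # Exact match with a listed value: return it unchanged.
        return MULTIPLET_RATES[i][1]
    (x0, y0), (x1, y1) = MULTIPLET_RATES[i - 1], MULTIPLET_RATES[i]
    return y0 + (y1 - y0) * (cells_recovered - x0) / (x1 - x0)


print(expected_multiplet_rate(2500))
```

Comparing the fraction of cells flagged by the component with this estimate can help spot runs where the automatically detected threshold looks implausible.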
Example commands
You can run the pipeline using nextflow run.
View help
You can use --help as a parameter to get an overview of the possible parameters.
nextflow run openpipelines-bio/openpipeline \
-r 1.0.2 -latest \
-main-script target/nextflow/filter/filter_with_scrublet/main.nf \
--help
Run command
Example of params.yaml
# Arguments
input: # please fill in - example: "input.h5mu"
modality: "rna"
# layer: "foo"
# output: "$id.$key.output.h5mu"
# output_compression: "gzip"
obs_name_filter: "filter_with_scrublet"
do_subset: false
obs_name_doublet_score: "scrublet_doublet_score"
min_counts: 2
min_cells: 3
min_gene_variablity_percent: 85
num_pca_components: 30
distance_metric: "euclidean"
allow_automatic_threshold_detection_fail: false
# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
nextflow run openpipelines-bio/openpipeline \
-r 1.0.2 -latest \
-profile docker \
-main-script target/nextflow/filter/filter_with_scrublet/main.nf \
-params-file params.yaml
Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
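
When many samples need to be processed, the parameter file and the command line can be generated from a script. The following is a minimal sketch under stated assumptions: `render_params` and `build_command` are hypothetical helpers, the parameter values are placeholders, and values are serialized as JSON scalars, which are also valid YAML. The command-line flags are the ones shown in the run command above.

```python
import json
import shlex


def render_params(params):
    """Render a flat parameter mapping as YAML text, encoding each value as a JSON scalar."""
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in params.items())


def build_command(params_file, profile="docker"):
    """Assemble the nextflow invocation shown above as an argument list."""
    return [
        "nextflow", "run", "openpipelines-bio/openpipeline",
        "-r", "1.0.2", "-latest",
        "-profile", profile,
        "-main-script", "target/nextflow/filter/filter_with_scrublet/main.nf",
        "-params-file", params_file,
    ]


# Placeholder values; replace them with your own input and output locations.
params = {
    "input": "input.h5mu",
    "modality": "rna",
    "do_subset": False,
    "min_counts": 2,
    "publish_dir": "output/",
}

print(render_params(params))
print(shlex.join(build_command("params.yaml", profile="singularity")))
```

The rendered text can be written to params.yaml, and the argument list can be passed to `subprocess.run` to start the pipeline.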
Argument group
Arguments
Name | Description | Attributes |
---|---|---|
--input | Input h5mu file. | file, required, example: "input.h5mu" |
--modality | | string, default: "rna" |
--layer | Input layer to use as data for calculating doublets. .X is used if not specified. | string |
--output | Output h5mu file. | file, example: "output.h5mu" |
--output_compression | The compression format to be used on the output h5mu object. | string, example: "gzip" |
--obs_name_filter | In which .obs slot to store a boolean array corresponding to which observations should be filtered out. | string, default: "filter_with_scrublet" |
--do_subset | Whether to subset before storing the output. | boolean_true |
--obs_name_doublet_score | Name of the doublet scores column in the obs slot of the returned object. | string, default: "scrublet_doublet_score" |
--min_counts | The minimum number of UMI counts per cell that must be present for initial cell detection. | integer, default: 2 |
--min_cells | The minimum number of cells in which UMIs for a gene must be detected. | integer, default: 3 |
--min_gene_variablity_percent | Used for gene filtering prior to PCA. Keep the most highly variable genes (in the top min_gene_variability_pctl percentile), as measured by the v-statistic [Klein et al., Cell 2015]. | double, default: 85 |
--num_pca_components | Number of principal components to use during PCA dimensionality reduction. | integer, default: 30 |
--distance_metric | The distance metric used for computing similarities. | string, default: "euclidean" |
--allow_automatic_threshold_detection_fail | When scrublet fails to automatically determine the doublet score threshold, allow the component to continue and set the output columns to NA. | boolean_true |
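
To illustrate what a percentile cutoff such as --min_gene_variablity_percent does, the sketch below ranks genes by a variability score and keeps those at or above the chosen percentile. It is only an illustration: the Fano factor (variance divided by mean) is used as a simple stand-in for the v-statistic, the cutoff uses a nearest-rank percentile, and the function names and toy counts are assumptions rather than the component's code.

```python
import math
import statistics


def fano_factor(values):
    """Variance-to-mean ratio, used here as a simple stand-in for the v-statistic."""
    mean = statistics.fmean(values)
    return statistics.pvariance(values) / mean if mean > 0 else 0.0


def highly_variable_genes(counts_by_gene, min_percentile=85):
    """Keep genes whose variability score is at or above the given percentile."""
    scores = {gene: fano_factor(values) for gene, values in counts_by_gene.items()}
    ranked = sorted(scores.values())
    # Nearest-rank percentile: the smallest score with at least
    # min_percentile percent of all scores at or below it.
    rank = max(1, math.ceil(min_percentile / 100 * len(ranked)))
    cutoff = ranked[rank - 1]
    return [gene for gene, score in scores.items() if score >= cutoff]


# Toy counts per gene across four cells (hypothetical).
counts_by_gene = {
    "gene_a": [1, 1, 1, 1],
    "gene_b": [0, 2, 0, 2],
    "gene_c": [0, 0, 0, 8],
    "gene_d": [2, 2, 2, 2],
}
print(highly_variable_genes(counts_by_gene, min_percentile=85))
```

Raising the percentile keeps fewer, more variable genes for the PCA step; lowering it keeps more genes.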