Reference

An overview of the workflows and modules in OpenPipelines

Workflows

Name	Namespace	Description
BD Rhapsody	Workflows/ingestion	BD Rhapsody Sequence Analysis CWL pipeline v2.2.1
Bbknn leiden	Workflows/integration	Run bbknn followed by leiden clustering and run umap on the result.
Cell Ranger mapping	Workflows/ingestion	A pipeline for running Cell Ranger mapping.
Cell Ranger multi	Workflows/ingestion	A pipeline for running Cell Ranger multi.
Cell Ranger post-processing	Workflows/ingestion	Post-processing Cell Ranger datasets.
Convert to MuData	Workflows/ingestion	A pipeline to convert different file formats to .h5mu.
Demux	Workflows/ingestion	A generic pipeline for running bcl2fastq, bcl-convert or Cell Ranger mkfastq.
Dimensionality reduction	Workflows/multiomics	Run calculations that output information required for most integration methods: PCA, nearest neighbour and UMAP.
GDO Singlesample	Workflows/gdo	Processing unimodal single-sample guide-derived oligonucleotide (GDO) data.
Harmony integration followed by KNN label transfer	Workflows/annotation	Cell type annotation workflow by performing harmony integration of reference and query dataset followed by KNN label transfer.
Harmony leiden	Workflows/integration	Run harmony integration followed by neighbour calculations, leiden clustering and run umap on the result.
Make reference	Workflows/ingestion	Build a transcriptomics reference into one of many formats
Neighbors leiden umap	Workflows/multiomics	Performs neighborhood search and leiden clustering, then runs umap on an integrated embedding.
Process batches	Workflows/multiomics	This workflow serves as an entrypoint into the ‘full_pipeline’ in order to re-run the multisample processing and the integration setup.
Process samples	Workflows/multiomics	A pipeline to analyse multiple multiomics samples.
Prot multisample	Workflows/prot	Processing unimodal multi-sample ADT data.
Prot singlesample	Workflows/prot	Processing unimodal single-sample CITE-seq data.
Qc	Workflows/qc	A pipeline to add basic qc statistics to a MuData object.
Rna multisample	Workflows/rna	Processing unimodal multi-sample RNA transcriptomics data.
Rna singlesample	Workflows/rna	Processing unimodal single-sample RNA transcriptomics data.
Scanorama leiden	Workflows/integration	Run scanorama integration followed by neighbour calculations, leiden clustering and run umap on the result.
Scgpt leiden	Workflows/integration	Run scGPT integration (cell embedding generation) followed by neighbour calculations, leiden clustering and run umap on the result.
Scvi leiden	Workflows/integration	Run scvi integration followed by neighbour calculations, leiden clustering and run umap on the result.
Split h5mu	Workflows/multiomics	Split the samples of a single modality from a .h5mu (multimodal) sample into separate .h5mu files based on the values of an .obs column of this modality.
Split modalities	Workflows/multiomics	A pipeline to split a multimodal mudata file into several unimodal mudata files.
Totalvi leiden	Workflows/integration	Run totalVI integration followed by neighbour calculations, leiden clustering and run umap on the result.
scANVI - scArches workflow	Workflows/annotation	Cell type annotation workflow using scANVI with scArches for reference mapping.
scGPT Annotation	Workflows/annotation	Cell type annotation workflow using scGPT.
scVI Annotation	Workflows/annotation	Cell type annotation workflow that performs scVI integration of reference and query dataset followed by KNN label transfer.

Modules

Name	Namespace	Description
Add id	Metadata	Add an id to .obs.
Align query reference	Feature annotation	Alignment of a query and reference dataset by: alignment of layers; harmonization of .obs field names for batch and cell type labels; harmonization of .var field name for gene names; sanitation of gene names; cross-checking of genes; assignment of an id to the query and reference datasets.
Bbknn	Neighbors	BBKNN network generation
Bcftools	Genetic demux	Filter the variants called by freebayes or cellSNP
Bcl convert	Demux	Convert bcl files to fastq files using bcl-convert.
Bcl2fastq	Demux	Convert bcl files to fastq files using bcl2fastq
Bd rhapsody	Mapping	BD Rhapsody Sequence Analysis CWL pipeline v2.2.1. This pipeline performs analysis of single-cell multiomic sequence read (FASTQ) data.
Binning	Scgpt	Conversion of (pre-processed) expression count data into relative values (bins) to address scale differences across sequencing batches
Bpcells regress out	Transform	Regress out the effects of confounding variables using a linear least squares regression model with BPCells
Build bdrhap reference	Reference	The Reference Files Generator creates an archive containing Genome Index and Transcriptome annotation files needed for the BD Rhapsody Sequencing Analysis Pipeline.
Build cellranger arc reference	Reference	Build a Cell Ranger-arc and -atac compatible reference folder from user-supplied genome FASTA and gene GTF files.
Build cellranger reference	Reference	Build a Cell Ranger-compatible reference folder from user-supplied genome FASTA and gene GTF files.
Build star reference	Reference	Create a reference for STAR from a set of fasta files.
Calculate atac qc metrics	Qc	Add basic ATAC quality control metrics to an .h5mu file.
Calculate qc metrics	Qc	Add basic quality control metrics to an .h5mu file.
Cell type annotation	Scgpt	Annotate gene expression data with cell type classes through the scGPT model
Cellbender remove background	Correction	Eliminating technical artifacts from high-throughput single-cell RNA sequencing data.
Cellbender remove background v0 2	Correction	Eliminating technical artifacts from high-throughput single-cell RNA sequencing data.
Cellranger atac count	Mapping	Align fastq files using Cell Ranger ATAC count.
Cellranger atac mkfastq	Demux	Demultiplex raw sequencing data for ATAC experiments
Cellranger count	Mapping	Align fastq files using Cell Ranger count.
Cellranger count split	Mapping	Split 10x Cell Ranger output directory into separate output fields.
Cellranger mkfastq	Demux	Demultiplex raw sequencing data
Cellranger mkgtf	Reference	Make a GTF file - filter by a specific attribute.
Cellranger multi	Mapping	Align fastq files using Cell Ranger multi.
Cellsnp	Genetic demux	cellSNP aims to pileup the expressed alleles in single-cell or bulk RNA-seq data.
Celltypist	Annotate	Automated cell type annotation tool for scRNA-seq datasets on the basis of logistic regression classifiers optimised by the stochastic gradient descent algorithm.
Cellxgene census	Query	Query cells from a CellxGene Census or custom TileDBSoma object.
Clr	Transform	Perform CLR normalization on CITE-seq data (Stoeckius et al., 2017)
Compress h5mu	Compression	Compress a MuData file.
Concatenate h5mu	Dataflow	Concatenate observations from samples in several (uni- and/or multi-modal) MuData files into a single file
Cross check genes	Scgpt	Cross-check genes with pre-trained scGPT model
Delete layer	Transform	Delete an anndata layer from one or more modalities
Delimit fraction	Filter	Turns a column containing values between 0 and 1 into a boolean column based on thresholds
Demuxlet	Genetic demux	Demuxlet is a software tool to deconvolute sample identity and identify multiplets when multiple samples are pooled by barcoded single cell sequencing.
Densmap	Dimred	A modification of UMAP that adds an extra cost term in order to preserve information about the relative local density of the data.
Do filter	Filter	Remove observations and variables based on specified .obs and .var columns
Download file	Download	Download a file
Dsc pileup	Genetic demux	Dsc-pileup is a software tool to pileup reads and corresponding base quality for each overlapping SNP and each barcode.
Embedding	Scgpt	Generation of cell embeddings for the integration of single cell transcriptomic count data using scGPT
Fastqc	Qc	Fastqc component, please see https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
Filter 10xh5	Process 10xh5	Filter a 10x h5 dataset
Filter with counts	Filter	Filter scRNA-seq data based on the primary QC metrics.
Filter with scrublet	Filter	Doublet detection using the Scrublet method (Wolock, Lopez and Klein, 2019).
Find neighbors	Neighbors	Compute a neighborhood graph of observations [McInnes18].
Freebayes	Genetic demux	Freebayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs
Freemuxlet	Genetic demux	Freemuxlet is a software tool to deconvolute sample identity and identify multiplets when multiple samples are pooled by barcoded single cell sequencing.
From 10xh5 to h5mu	Convert	Converts a 10x h5 into an h5mu file
From 10xmtx to h5mu	Convert	Converts a 10x mtx into an h5mu file
From bd to 10x molecular barcode tags	Convert	Convert the molecular barcode sequence SAM tag from BD format (MA) to 10X format (UB)
From bdrhap to h5mu	Convert	Convert the output of a BD Rhapsody pipeline v2.x to a MuData h5 file
From cellranger multi to h5mu	Convert	Converts the output from cellranger multi to a single .h5mu file.
From h5ad to h5mu	Convert	Converts a single layer h5ad file into a single MuData object
From h5ad to seurat	Convert	Converts an h5ad file into a Seurat file
From h5mu to h5ad	Convert	Converts a h5mu file into a h5ad file
From h5mu to seurat	Convert	Converts an h5mu file into a Seurat file.
Grep annotation column	Metadata	Perform a regex lookup on a column from the annotation matrices .obs or .var.
Harmonypy	Integrate	Performs Harmony integration as described in https://github.com/immunogenomics/harmony.
Highly variable features scanpy	Feature annotation	Annotate highly variable features [Satija15] [Zheng17] [Stuart19].
Htseq count	Mapping	Quantify gene expression for subsequent testing for differential expression.
Htseq count to h5mu	Mapping	Convert the htseq table to a h5mu
Intersect obs	Filter	Create an intersection between two or more modalities.
Join csv	Metadata	Join a csv containing metadata to the .obs or .var field of a mudata file.
Join uns to obs	Metadata	Join a data frame of length 1 (1 row index value) in .uns containing metadata to the .obs of a mudata file.
Knn	Labels transfer	This component performs label transfer from reference to query using a K-Nearest Neighbors classifier.
Leiden	Cluster	Cluster cells using the Leiden algorithm [Traag18] implemented in the Scanpy framework [Wolf18].
Lianapy	Interpret	Performs LIANA integration as described in https://github.com/saezlab/liana-py
Log1p	Transform	Logarithmize the data matrix.
Lsi	Dimred	Runs Latent Semantic Indexing.
Make params	Files	Looks for files in a directory and turns them into a params file.
Make reference	Reference	Preprocess and build a transcriptome reference.
Merge	Dataflow	Combine one or more single-modality .h5mu files together into one .h5mu file
Mermaid	Report	Generates a network from mermaid code
Move layer	Transform	Move a data matrix stored at the .layers or .X attributes in a MuData object to another layer.
Move obsm to obs	Metadata	Move a matrix from .obsm to .obs.
Multi star	Mapping	Align fastq files using STAR.
Multi star to h5mu	Mapping	Convert the output of `multi_star` to a h5mu
Multiqc	Qc	MultiQC aggregates results from bioinformatics analyses across many samples into a single report.
Normalize total	Transform	Normalize counts per cell.
Onclass	Annotate	OnClass is a python package for single-cell cell type annotation.
Pad tokenize	Scgpt	Tokenize and pad a batch of data for scGPT integration zero-shot inference or fine-tuning
Pca	Dimred	Computes PCA coordinates, loadings and variance decomposition.
Popv	Annotate	Performs popular major vote cell typing on single cell sequence data using multiple algorithms.
Publish	Transfer	Publish an artifact and optionally rename with parameters
Random forest annotation	Annotate	Automated cell type annotation tool for scRNA-seq datasets on the basis of random forest.
Regress out	Transform	Regress out (mostly) unwanted sources of variation.
Remove modality	Filter	Remove a modality from a .h5mu file
Samtools	Genetic demux	Filter the BAM according to the instruction of scSplit via Samtools.
Samtools sort	Mapping	Sort and (optionally) index alignments.
Scale	Transform	Scale data to unit variance and zero mean
Scanorama	Integrate	Use Scanorama to integrate different experiments
Scanvi	Annotate	scANVI is a semi-supervised model for single-cell transcriptomics data.
Scarches	Integrate	Performs reference mapping with scArches
Score genes cell cycle scanpy	Feature annotation	Calculates the score associated to S phase and G2M phase and annotates the cell cycle phase for each cell, as implemented by scanpy.
Scsplit	Genetic demux	scsplit is a genotype-free demultiplexing method for pooled single-cell RNA-seq, using a hidden state model for identifying genetically distinct samples within a mixed population.
Scvelo	Velocity	RNA velocity analysis with scVelo.
Scvi	Integrate	Performs scvi integration as done in the human lung cell atlas https://github.com/LungCellAtlas/HLCA
Souporcell	Genetic demux	souporcell is a method for clustering mixed-genotype scRNAseq experiments by individual.
Split h5mu	Dataflow	Split the samples of a single modality from a .h5mu (multimodal) sample into separate .h5mu files based on the values of an .obs column of this modality.
Split h5mu train test	Dataflow	Split mudata object into training and testing (and validation) datasets based on observations into separate mudata objects.
Split modalities	Dataflow	Split the modalities from a single .h5mu multimodal sample into separate .h5mu files.
Star align	Mapping	Align fastq files using STAR.
Star align v273a	Mapping	Align fastq files using STAR.
Subset h5mu	Filter	Create a subset of a mudata file by selecting the first n observations.
Subset obsp	Filter	Create a subset of an .obsp field in a mudata file, by filtering the columns based on the values of an .obs column.
Svm annotation	Annotate	Automated cell type annotation tool for scRNA-seq datasets on the basis of SVMs.
Sync test resources	Download	Sync test resources to the local filesystem
Tar extract	Compression	Extract files from a tar archive
Tfidf	Transform	Perform TF-IDF normalization of the data (typically, ATAC).
Totalvi	Integrate	Performs mapping to the reference using the totalVI model: https://docs.scvi-tools.org/en/stable/tutorials/notebooks/scarches_scvi_tools.html#Reference-mapping-with-TOTALVI
Tsne	Dimred	t-SNE (t-Distributed Stochastic Neighbor Embedding) is a dimensionality reduction technique used to visualize high-dimensional data in a low-dimensional space, revealing patterns and clusters by preserving local data similarities
Umap	Dimred	UMAP (Uniform Manifold Approximation and Projection) is a manifold learning technique suitable for visualizing high-dimensional data.
Velocyto	Velocity	Runs the velocity analysis on a BAM file, outputting a loom file.
Velocyto to h5mu	Convert	Convert a velocyto loom file to a h5mu file.
Vireo	Genetic demux	Vireo is primarily designed for demultiplexing cells into donors by modelling of expressed alleles.
Xgboost	Labels transfer	Performs label transfer from reference to query using XGBoost classifier
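
As a rough illustration of what the Normalize total, Log1p and Scale modules listed above describe, the sketch below applies the three transforms in sequence to a small cells-by-genes count matrix using plain NumPy. It is not the OpenPipelines implementation: the function names, the `target_sum` parameter and the toy matrix are assumptions made for this example, and the actual modules operate on .h5mu files and may differ in defaults and details.

```python
import numpy as np


def normalize_total(counts, target_sum=1e4):
    """Rescale each cell (row) so that its counts sum to target_sum."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave empty cells untouched instead of dividing by zero
    return counts / totals * target_sum


def log1p(matrix):
    """Logarithmize the data matrix: log(1 + x), element-wise."""
    return np.log1p(matrix)


def scale(matrix):
    """Scale each feature (column) to zero mean and unit variance."""
    matrix = np.asarray(matrix, dtype=float)
    mean = matrix.mean(axis=0)
    std = matrix.std(axis=0)
    std[std == 0] = 1.0  # constant features are only centered
    return (matrix - mean) / std


# Hypothetical toy data: 3 cells (rows) x 3 genes (columns).
counts = np.array([[1, 3, 0], [10, 0, 10], [2, 2, 4]])

normalized = normalize_total(counts, target_sum=100)
logged = log1p(normalized)
scaled = scale(logged)
```

After these steps every row of `normalized` sums to the chosen target, and every column of `scaled` has mean 0 and standard deviation 1, which matches the "counts per cell" and "unit variance and zero mean" wording in the table.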