Svm annotation

Automated cell type annotation tool for scRNA-seq datasets on the basis of SVMs.

Info

ID: svm_annotation
Namespace: annotate

Links

Source

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -main-script target/nextflow/annotate/svm_annotation/main.nf \
  --help

Run command

Example of params.yaml

# Inputs
input: # please fill in - example: "input.h5mu"
modality: "rna"
# input_layer: "foo"
# input_var_gene_names: "foo"
input_reference_gene_overlap: 100

# Reference
# reference: "reference.h5mu"
# reference_layer: "foo"
reference_obs_target: # please fill in - example: "foo"
# reference_var_gene_names: "foo"
# reference_var_input: "foo"

# Outputs
# output: "$id.$key.output.h5mu"
# output_compression: "gzip"
output_obs_prediction: "svm_pred"
output_obs_probability: "svm_probability"

# Model arguments
# model: "pretrained_model.pkl"
feature_selection: true
max_iter: 5000
c_reg: 1.0
class_weight: "balanced"

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

# Arguments

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -profile docker \
  -main-script target/nextflow/annotate/svm_annotation/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Inputs

Input dataset (query) arguments

Name	Description	Attributes
`--input`	The input (query) data to be labeled. Should be a .h5mu file.	`file`, required, example: `"input.h5mu"`
`--modality`	Which modality to process.	`string`, default: `"rna"`
`--input_layer`	The layer in the input data to be used for cell type annotation if .X is not to be used.	`string`
`--input_var_gene_names`	The name of the adata var column in the input data containing gene names; when no gene_name_layer is provided, the var index will be used.	`string`
`--input_reference_gene_overlap`	The minimum number of genes present in both the reference and query datasets.	`integer`, default: `100`

Reference

Arguments related to the reference dataset.

Name	Description	Attributes
`--reference`	The reference data to train the CellTypist classifiers on. Only required if a pre-trained –model is not provided.	`file`, example: `"reference.h5mu"`
`--reference_layer`	The layer in the reference data to be used for cell type annotation if .X is not to be used. Data are expected to be processed in the same way as the –input query dataset.	`string`
`--reference_obs_target`		`string`, required
`--reference_var_gene_names`	The name of the adata var column in the reference data containing gene names; when no gene_name_layer is provided, the var index will be used.	`string`
`--reference_var_input`	.var column containing highly variable genes. By default, do not subset genes.	`string`

Outputs

Output arguments.

Name	Description	Attributes
`--output`	Output h5mu file.	`file`, example: `"output.h5mu"`
`--output_compression`		`string`, example: `"gzip"`
`--output_obs_prediction`	In which `.obs` slots to store the predicted information.	`string`, default: `"svm_pred"`
`--output_obs_probability`	In which `.obs` slots to store the probability of the predictions.	`string`, default: `"svm_probability"`

Model arguments

Model arguments.

Name	Description	Attributes
`--model`	Pretrained model in pkl format. If not provided, the model will be trained on the reference data and –reference should be provided.	`file`, example: `"pretrained_model.pkl"`
`--feature_selection`	Whether to perform feature selection.	`boolean`, default: `TRUE`
`--max_iter`	Maximum number of iterations for the SVM.	`integer`, default: `5000`
`--c_reg`	Regularization parameter for the SVM.	`double`, default: `1`
`--class_weight`	“Class weights for the SVM. The `uniform` mode gives all classes a weight of one. The `balanced` mode (default) uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))”	`string`, default: `"balanced"`

Authors

Jakub Majercik (author)