Xgboost

Performs label transfer from a reference dataset to a query dataset using an XGBoost classifier.

Info

ID: xgboost
Namespace: labels_transfer

Links

Source

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -main-script target/nextflow/labels_transfer/xgboost/main.nf \
  --help

Run command

Example of params.yaml

# Input dataset (query) arguments
input: # please fill in - example: "path/to/file"
modality: "rna"
# input_obsm_features: "X_scvi"

# Reference dataset arguments
# reference: "reference.h5mu"
# reference_obsm_features: "X_scvi"
reference_obs_targets: ["ann_level_1", "ann_level_2", "ann_level_3", "ann_level_4", "ann_level_5", "ann_finest_level"]

# Outputs
# output: "$id.$key.output"
# output_obs_predictions: ["foo"]
# output_obs_probability: ["foo"]
# output_compression: "gzip"

# Execution arguments
force_retrain: false
use_gpu: false
verbosity: 1
# model_output: "model"
output_uns_parameters: "xgboost_parameters"

# Learning parameters
learning_rate: 0.3
min_split_loss: 0.0
max_depth: 6
min_child_weight: 1
max_delta_step: 0.0
subsample: 1.0
sampling_method: "uniform"
colsample_bytree: 1.0
colsample_bylevel: 1.0
colsample_bynode: 1.0
reg_lambda: 1.0
reg_alpha: 0.0
scale_pos_weight: 1.0

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

# Arguments

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -profile docker \
  -main-script target/nextflow/labels_transfer/xgboost/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Input dataset (query) arguments

Name	Description	Attributes
`--input`	The query data to transfer the labels to. Should be a .h5mu file.	`file`, required
`--modality`	Which modality to use.	`string`, default: `"rna"`
`--input_obsm_features`	The `.obsm` key of the embedding to use for the classifier’s inference. If not provided, the `.X` slot will be used instead. Make sure that the embedding was obtained in the same way as the reference embedding (e.g. by the same model or preprocessing).	`string`, example: `"X_scvi"`

Reference dataset arguments

Name	Description	Attributes
`--reference`	The reference data to train classifiers on.	`file`, example: `"reference.h5mu"`
`--reference_obsm_features`	The `.obsm` key of the embedding to use for the classifier’s training. If not provided, the `.X` slot will be used instead. Make sure that the embedding was obtained in the same way as the query embedding (e.g. by the same model or preprocessing).	`string`, example: `"X_scvi"`
`--reference_obs_targets`	The `.obs` key(s) of the target labels to transfer.	List of `string`, default: `"ann_level_1", "ann_level_2", "ann_level_3", "ann_level_4", "ann_level_5", "ann_finest_level"`, multiple_sep: `";"`
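
Arguments marked with multiple_sep: `";"` take several values in a single command-line argument, separated by semicolons. The following is a minimal Python sketch of that convention; the helper names `join_multiple` and `split_multiple` are hypothetical and not part of the pipeline.

```python
def join_multiple(values, sep=";"):
    """Join list values into a single command-line argument string."""
    for value in values:
        if sep in value:
            # A value containing the separator would be split incorrectly.
            raise ValueError(f"value {value!r} contains the separator {sep!r}")
    return sep.join(values)


def split_multiple(arg, sep=";"):
    """Split a separator-joined argument back into a list, dropping empty parts."""
    return [part for part in arg.split(sep) if part]


targets = ["ann_level_1", "ann_level_2", "ann_finest_level"]
cli_value = join_multiple(targets)
print(f'--reference_obs_targets "{cli_value}"')
```

Quoting the value on the command line keeps the shell from interpreting the semicolons as command separators.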

Outputs

Name	Description	Attributes
`--output`	The query data in .h5mu format with predicted labels transferred from the reference.	`file`, required
`--output_obs_predictions`	In which `.obs` slots to store the predicted information. If provided, must have the same length as `--reference_obs_targets`. If empty, will default to the `reference_obs_targets` combined with the `"_pred"` suffix.	List of `string`, multiple_sep: `";"`
`--output_obs_probability`	In which `.obs` slots to store the probability of the predictions. If provided, must have the same length as `--reference_obs_targets`. If empty, will default to the `reference_obs_targets` combined with the `"_probability"` suffix.	List of `string`, multiple_sep: `";"`
`--output_compression`	The compression format to be used on the output h5mu object.	`string`, example: `"gzip"`
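
The defaulting rule for `--output_obs_predictions` and `--output_obs_probability` described above can be sketched in Python as follows. This is an illustration of the documented behaviour, not the component's actual code; the function name `resolve_output_keys` is hypothetical.

```python
def resolve_output_keys(targets, provided, suffix):
    """Return the .obs keys to write to, one per reference target.

    If no keys are provided, each target name gets the given suffix.
    If keys are provided, there must be exactly one per target.
    """
    if not provided:
        return [f"{target}{suffix}" for target in targets]
    if len(provided) != len(targets):
        raise ValueError(
            f"expected {len(targets)} output keys, got {len(provided)}"
        )
    return list(provided)


targets = ["ann_level_1", "ann_level_2"]
predictions = resolve_output_keys(targets, None, "_pred")
probabilities = resolve_output_keys(targets, [], "_probability")
print(predictions)
print(probabilities)
```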

Execution arguments

Name	Description	Attributes
`--force_retrain`	Retrain models on the reference even if the `model_output` directory already contains trained classifiers. WARNING! This will overwrite existing classifiers for the targets in the `model_output` directory!	`boolean_true`
`--use_gpu`	Use GPU during model training and inference (recommended).	`boolean`, default: `FALSE`
`--verbosity`	The verbosity level for the evaluation of the classifier, in the range [0, 2].	`integer`, default: `1`
`--model_output`	Output directory for the model.	`file`, default: `"model"`
`--output_uns_parameters`	The key in the `.uns` slot of the output AnnData object in which to store the parameters of the XGBoost classifier.	`string`, default: `"xgboost_parameters"`
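
The interaction between `--force_retrain` and `--model_output` can be pictured with the Python sketch below. It only illustrates the documented behaviour: the file name pattern `classifier_{target}.xgb` and the function `targets_needing_training` are assumptions made for this example, and the component may store its models differently.

```python
import tempfile
from pathlib import Path


def targets_needing_training(targets, model_dir, force_retrain=False,
                             pattern="classifier_{target}.xgb"):
    """List the targets for which a classifier has to be (re)trained.

    The file name pattern is an assumption made for this example only.
    """
    if force_retrain:
        # Every target is retrained, so existing classifiers get overwritten.
        return list(targets)
    model_dir = Path(model_dir)
    return [
        target for target in targets
        if not (model_dir / pattern.format(target=target)).exists()
    ]


with tempfile.TemporaryDirectory() as tmp:
    # Pretend that a classifier for ann_level_1 was trained in an earlier run.
    (Path(tmp) / "classifier_ann_level_1.xgb").touch()
    targets = ["ann_level_1", "ann_level_2"]
    reuse = targets_needing_training(targets, tmp)
    retrain = targets_needing_training(targets, tmp, force_retrain=True)

print(reuse)
print(retrain)
```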

Learning parameters

Name	Description	Attributes
`--learning_rate`	Step size shrinkage used in updates to prevent overfitting. Range: [0,1]. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for the reference	`double`, default: `0.3`
`--min_split_loss`	Minimum loss reduction required to make a further partition on a leaf node of the tree. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for the reference	`double`, default: `0`
`--max_depth`	Maximum depth of a tree. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for the reference	`integer`, default: `6`
`--min_child_weight`	Minimum sum of instance weight (hessian) needed in a child. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for the reference	`integer`, default: `1`
`--max_delta_step`	Maximum delta step we allow each leaf output to be. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for the reference	`double`, default: `0`
`--subsample`	Subsample ratio of the training instances. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for the reference	`double`, default: `1`
`--sampling_method`	The method to use to sample the training instances. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for the reference	`string`, default: `"uniform"`
`--colsample_bytree`	Fraction of columns to be subsampled. Range (0, 1]. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for the reference	`double`, default: `1`
`--colsample_bylevel`	Subsample ratio of columns for each level. Range (0, 1]. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for the reference	`double`, default: `1`
`--colsample_bynode`	Subsample ratio of columns for each node (split). Range (0, 1]. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for the reference	`double`, default: `1`
`--reg_lambda`	L2 regularization term on weights. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for the reference	`double`, default: `1`
`--reg_alpha`	L1 regularization term on weights. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for the reference	`double`, default: `0`
`--scale_pos_weight`	Control the balance of positive and negative weights, useful for unbalanced classes. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for the reference	`double`, default: `1`
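
Before launching a run, the ranges stated in the table above can be checked with a few lines of Python. This is a minimal sketch that covers only the ranges listed on this page (`learning_rate` and the three `colsample_*` parameters); the function name `check_learning_params` is hypothetical, and the pipeline itself may validate its inputs differently.

```python
# Defaults as listed in the "Learning parameters" table.
DEFAULTS = {
    "learning_rate": 0.3,
    "min_split_loss": 0.0,
    "max_depth": 6,
    "min_child_weight": 1,
    "max_delta_step": 0.0,
    "subsample": 1.0,
    "sampling_method": "uniform",
    "colsample_bytree": 1.0,
    "colsample_bylevel": 1.0,
    "colsample_bynode": 1.0,
    "reg_lambda": 1.0,
    "reg_alpha": 0.0,
    "scale_pos_weight": 1.0,
}


def check_learning_params(params):
    """Return a list of messages for values outside the documented ranges."""
    errors = []
    if not 0 <= params["learning_rate"] <= 1:
        errors.append("learning_rate must be in [0, 1]")
    for name in ("colsample_bytree", "colsample_bylevel", "colsample_bynode"):
        if not 0 < params[name] <= 1:
            errors.append(f"{name} must be in (0, 1]")
    return errors


# Override two defaults; colsample_bytree = 0.0 is outside (0, 1].
params = {**DEFAULTS, "learning_rate": 0.1, "colsample_bytree": 0.0}
problems = check_learning_params(params)
print(problems)
```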

Authors

Vladimir Shitov (author)