Xgboost

Performs label transfer from a reference dataset to a query dataset using an XGBoost classifier.

Info

ID: xgboost
Namespace: labels_transfer

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 1.0.1 -latest \
  -main-script target/nextflow/labels_transfer/xgboost/main.nf \
  --help

Run command

Example of params.yaml:

# Execution arguments
force_retrain: false
use_gpu: false
verbosity: 1
# model_output: "$id.$key.model_output.model_output"

# Learning parameters
learning_rate: 0.3
min_split_loss: 0
max_depth: 6
min_child_weight: 1
max_delta_step: 0
subsample: 1
sampling_method: "uniform"
colsample_bytree: 1
colsample_bylevel: 1
colsample_bynode: 1
reg_lambda: 1
reg_alpha: 0
scale_pos_weight: 1

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

nextflow run openpipelines-bio/openpipeline \
  -r 1.0.1 -latest \
  -profile docker \
  -main-script target/nextflow/labels_transfer/xgboost/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
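
When the pipeline is launched from a script rather than typed by hand, the run command above can be assembled programmatically. The following is a minimal sketch and not part of openpipeline: the helper name build_run_command is hypothetical, and it only reproduces the argument list shown above with the profile swapped as the note describes.

```python
import shlex

# Backends named in the note above.
SUPPORTED_PROFILES = ("docker", "podman", "singularity")


def build_run_command(profile="docker", params_file="params.yaml"):
    """Return the `nextflow run` argument list from the example above.

    Hypothetical helper for illustration; it does not execute anything.
    """
    if profile not in SUPPORTED_PROFILES:
        raise ValueError(f"unsupported profile: {profile!r}")
    return [
        "nextflow", "run", "openpipelines-bio/openpipeline",
        "-r", "1.0.1", "-latest",
        "-profile", profile,
        "-main-script", "target/nextflow/labels_transfer/xgboost/main.nf",
        "-params-file", params_file,
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command line for the singularity backend.
    print(shlex.join(build_run_command("singularity")))
```

The returned list can be handed to subprocess.run(cmd, check=True) on a machine where nextflow and the chosen container backend are installed.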

Argument groups

Input dataset (query) arguments

Name Description Attributes
--input The query data to transfer the labels to. Should be a .h5mu file. file, required
--modality Which modality to use. string, default: "rna"
--input_obsm_features The .obsm key of the embedding to use for the classifier’s inference. If not provided, the .X slot will be used instead. Make sure that the embedding was obtained in the same way as the reference embedding (e.g. by the same model or preprocessing). string, example: "X_integrated_scanvi"

Reference dataset arguments

Name Description Attributes
--reference The reference data to train classifiers on. file, example: "https://zenodo.org/record/6337966/files/HLCA_emb_and_metadata.h5ad"
--reference_obsm_features The .obsm key of the embedding to use for the classifier’s training. Make sure that the embedding was obtained in the same way as the query embedding (e.g. by the same model or preprocessing). string, required, default: "X_integrated_scanvi"
--reference_obs_targets The .obs key of the target labels to transfer. List of string, default: "ann_level_1", "ann_level_2", "ann_level_3", "ann_level_4", "ann_level_5", "ann_finest_level", multiple_sep: ";"
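
Arguments with a multiple_sep attribute carry several values in a single string, for example "ann_level_1;ann_level_2". A minimal sketch of that splitting is given below. It assumes the values themselves never contain the separator; split_multiple is a hypothetical helper for illustration, not an openpipeline function.

```python
def split_multiple(value, sep=";"):
    """Split a multi-value argument string on `sep`, dropping empty items."""
    return [item for item in value.split(sep) if item]


if __name__ == "__main__":
    # Two annotation levels passed as one ";"-separated string.
    print(split_multiple("ann_level_1;ann_level_2"))
```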

Outputs

Name Description Attributes
--output The query data in .h5mu format with predicted labels transferred from the reference. file, required
--output_obs_predictions In which .obs slots to store the predicted information. If provided, must have the same length as --reference_obs_targets. If empty, defaults to each reference_obs_targets value with the "_pred" suffix appended. List of string, multiple_sep: ";"
--output_obs_uncertainty In which .obs slots to store the uncertainty of the predictions. If provided, must have the same length as --reference_obs_targets. If empty, defaults to each reference_obs_targets value with the "_uncertainty" suffix appended. List of string, multiple_sep: ";"
--output_uns_parameters The .uns key to store additional information about the parameters used for the label transfer. string, default: "labels_transfer"
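
The naming rule for --output_obs_predictions and --output_obs_uncertainty can be sketched as follows. This is an illustration of the rule as described in the table above, not the component's actual code; the function name resolve_output_keys is hypothetical.

```python
def resolve_output_keys(targets, provided=None, suffix="_pred"):
    """Return one .obs key per target, following the rules described above."""
    if not provided:
        # Empty or missing: fall back to "<target><suffix>".
        return [f"{target}{suffix}" for target in targets]
    if len(provided) != len(targets):
        raise ValueError(
            f"expected {len(targets)} output keys, got {len(provided)}"
        )
    return list(provided)


if __name__ == "__main__":
    targets = ["ann_level_1", "ann_level_2"]
    print(resolve_output_keys(targets))
    print(resolve_output_keys(targets, suffix="_uncertainty"))
```

Passing suffix="_uncertainty" gives the default keys for the uncertainty slots; a provided list of the wrong length raises an error instead of being silently truncated.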

Execution arguments

Name Description Attributes
--force_retrain Retrain models on the reference even if the model_output directory already contains trained classifiers. WARNING! This overwrites existing classifiers for targets in the model_output directory! boolean_true
--use_gpu Use the GPU during model training and inference (recommended). boolean, default: FALSE
--verbosity The verbosity level for evaluation of the classifier, in the range [0, 2]. integer, default: 1
--model_output Output directory for the model. file, default: "model"
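
The description of --force_retrain implies that classifiers already present in the model_output directory are otherwise reused. A rough sketch of that decision is shown below. Both the helper name targets_to_train and the "<target>.model" file name pattern are assumptions made for illustration; the component's real on-disk layout is not documented on this page.

```python
from pathlib import Path


def targets_to_train(model_dir, targets, force_retrain=False):
    """Return the targets that still need a classifier trained.

    The "<target>.model" file name pattern is hypothetical.
    """
    model_dir = Path(model_dir)
    if force_retrain:
        # Existing classifiers for these targets would be overwritten.
        return list(targets)
    return [
        target for target in targets
        if not (model_dir / f"{target}.model").exists()
    ]


if __name__ == "__main__":
    print(targets_to_train("model", ["ann_level_1", "ann_level_2"]))
```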

Learning parameters

Name Description Attributes
--learning_rate Step size shrinkage used in updates to prevent overfitting. Range: [0, 1]. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. double, default: 0.3
--min_split_loss Minimum loss reduction required to make a further partition on a leaf node of the tree. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. double, default: 0
--max_depth Maximum depth of a tree. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. integer, default: 6
--min_child_weight Minimum sum of instance weight (hessian) needed in a child. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. integer, default: 1
--max_delta_step Maximum delta step allowed for each leaf output. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. double, default: 0
--subsample Subsample ratio of the training instances. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. double, default: 1
--sampling_method The method used to sample the training instances. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. string, default: "uniform"
--colsample_bytree Fraction of columns to be subsampled. Range: (0, 1]. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. double, default: 1
--colsample_bylevel Subsample ratio of columns for each level. Range: (0, 1]. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. double, default: 1
--colsample_bynode Subsample ratio of columns for each node (split). Range: (0, 1]. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. double, default: 1
--reg_lambda L2 regularization term on weights. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. double, default: 1
--reg_alpha L1 regularization term on weights. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. double, default: 0
--scale_pos_weight Controls the balance of positive and negative weights, useful for unbalanced classes. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. double, default: 1
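
Before writing learning parameters into params.yaml, it can help to check overrides against the defaults and ranges listed above. The sketch below does only that. The helper name check_learning_params is hypothetical, and the checks cover just the ranges stated in this table, not everything XGBoost itself enforces.

```python
# Defaults copied from the table above.
DEFAULTS = {
    "learning_rate": 0.3,
    "min_split_loss": 0,
    "max_depth": 6,
    "min_child_weight": 1,
    "max_delta_step": 0,
    "subsample": 1,
    "sampling_method": "uniform",
    "colsample_bytree": 1,
    "colsample_bylevel": 1,
    "colsample_bynode": 1,
    "reg_lambda": 1,
    "reg_alpha": 0,
    "scale_pos_weight": 1,
}


def check_learning_params(overrides=None):
    """Merge overrides into the defaults and check the documented ranges."""
    params = {**DEFAULTS, **(overrides or {})}
    unknown = set(params) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    # learning_rate is documented with the closed range [0, 1].
    if not 0 <= params["learning_rate"] <= 1:
        raise ValueError("learning_rate must be in [0, 1]")
    # The colsample_* ratios are documented with the range (0, 1].
    for name in ("colsample_bytree", "colsample_bylevel", "colsample_bynode"):
        if not 0 < params[name] <= 1:
            raise ValueError(f"{name} must be in (0, 1]")
    return params


if __name__ == "__main__":
    # A shallower, more slowly learning configuration.
    print(check_learning_params({"learning_rate": 0.1, "max_depth": 4}))
```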

Authors

  • Vladimir Shitov (author)