Xgboost
Performs label transfer from reference to query using an XGBoost classifier.
Info
ID: xgboost
Namespace: labels_transfer
Example commands
You can run the pipeline using nextflow run.
View help
You can use --help as a parameter to get an overview of the possible parameters.
nextflow run openpipelines-bio/openpipeline \
-r 1.0.2 -latest \
-main-script target/nextflow/labels_transfer/xgboost/main.nf \
--help
Run command
Example of params.yaml
# Execution arguments
force_retrain: false
use_gpu: false
verbosity: 1
# model_output: "$id.$key.model_output.model_output"
# Learning parameters
learning_rate: 0.3
min_split_loss: 0
max_depth: 6
min_child_weight: 1
max_delta_step: 0
subsample: 1
sampling_method: "uniform"
colsample_bytree: 1
colsample_bylevel: 1
colsample_bynode: 1
reg_lambda: 1
reg_alpha: 0
scale_pos_weight: 1
# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
nextflow run openpipelines-bio/openpipeline \
-r 1.0.2 -latest \
-profile docker \
-main-script target/nextflow/labels_transfer/xgboost/main.nf \
-params-file params.yaml
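As a minimal sketch, the parameter file and the run command above could also be assembled from Python. The helper names `render_params` and `build_command` and the input file names are hypothetical; only the flags and parameter names are taken from this page. Values are rendered with `json.dumps`, which should be sufficient here because JSON scalars are also valid YAML scalars.

```python
import json
from typing import Dict, List


def render_params(params: Dict[str, object]) -> str:
    """Render a flat dict as YAML, one `key: value` line per entry."""
    # JSON scalars (numbers, true/false, quoted strings) are valid YAML.
    lines = [f"{key}: {json.dumps(value)}" for key, value in params.items()]
    return "\n".join(lines) + "\n"


def build_command(params_file: str, profile: str = "docker") -> List[str]:
    """Build the `nextflow run` argument list shown on this page."""
    return [
        "nextflow", "run", "openpipelines-bio/openpipeline",
        "-r", "1.0.2", "-latest",
        "-profile", profile,
        "-main-script", "target/nextflow/labels_transfer/xgboost/main.nf",
        "-params-file", params_file,
    ]


# Hypothetical file names, for illustration only.
params = {
    "input": "query.h5mu",
    "reference": "reference.h5ad",
    "force_retrain": False,
    "learning_rate": 0.3,
    "publish_dir": "output/",
}
params_text = render_params(params)
command = build_command("params.yaml", profile="singularity")
```

The resulting `params_text` can be written to `params.yaml`, and `command` can be passed to `subprocess.run` if launching from a script is preferred.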
Note
Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
Argument groups
Input dataset (query) arguments
Name | Description | Attributes |
---|---|---|
--input | The query data to transfer the labels to. Should be a .h5mu file. | file, required |
--modality | Which modality to use. | string, default: "rna" |
--input_obsm_features | The .obsm key of the embedding to use for the classifier’s inference. If not provided, the .X slot is used instead. Make sure the embedding was obtained in the same way as the reference embedding (e.g. by the same model or preprocessing). | string, example: "X_integrated_scanvi" |
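The fallback described for --input_obsm_features can be sketched as follows. This is an illustration only, not the component's actual code: `Modality` is a hypothetical stand-in for one modality of a .h5mu file, and `select_features` is a made-up helper name.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Modality:
    """Hypothetical stand-in for one modality of a .h5mu file."""
    X: Any
    obsm: Dict[str, Any] = field(default_factory=dict)


def select_features(modality: Modality, obsm_key: Optional[str] = None) -> Any:
    """Return the embedding under `obsm_key`, or `.X` when no key is given."""
    if obsm_key is None:
        return modality.X
    if obsm_key not in modality.obsm:
        raise KeyError(
            f"'{obsm_key}' not found in .obsm; available: {sorted(modality.obsm)}"
        )
    return modality.obsm[obsm_key]


rna = Modality(
    X=[[1.0, 0.0], [0.0, 2.0]],
    obsm={"X_integrated_scanvi": [[0.1], [0.9]]},
)
from_x = select_features(rna)
from_embedding = select_features(rna, "X_integrated_scanvi")
```

Failing early on a missing key is a design choice of this sketch; it avoids silently falling back to .X when the key is misspelled.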
Reference dataset arguments
Name | Description | Attributes |
---|---|---|
--reference | The reference data to train classifiers on. | file, example: "https://zenodo.org/record/6337966/files/HLCA_emb_and_metadata.h5ad" |
--reference_obsm_features | The .obsm key of the embedding to use for the classifier’s training. Make sure the embedding was obtained in the same way as the query embedding (e.g. by the same model or preprocessing). | string, required, default: "X_integrated_scanvi" |
--reference_obs_targets | The .obs keys of the target labels to transfer. | List of string, default: "ann_level_1", "ann_level_2", "ann_level_3", "ann_level_4", "ann_level_5", "ann_finest_level", multiple_sep: ";" |
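Arguments marked with multiple_sep: ";" take several values in a single string. The sketch below shows one plausible way such a value could be split; `split_multiple` is a hypothetical helper, and the pipeline's own parsing may differ in details such as whitespace handling.

```python
from typing import List


def split_multiple(value: str, sep: str = ";") -> List[str]:
    """Split a separator-delimited argument into stripped, non-empty items."""
    return [item.strip() for item in value.split(sep) if item.strip()]


targets = split_multiple("ann_level_1;ann_level_2;ann_finest_level")
```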
Outputs
Name | Description | Attributes |
---|---|---|
--output | The query data in .h5mu format with predicted labels transferred from the reference. | file, required |
--output_obs_predictions | The .obs slots in which to store the predicted labels. If provided, must have the same length as --reference_obs_targets. If empty, defaults to the reference_obs_targets combined with the "_pred" suffix. | List of string, multiple_sep: ";" |
--output_obs_uncertainty | The .obs slots in which to store the uncertainty of the predictions. If provided, must have the same length as --reference_obs_targets. If empty, defaults to the reference_obs_targets combined with the "_uncertainty" suffix. | List of string, multiple_sep: ";" |
--output_uns_parameters | The .uns key to store additional information about the parameters used for the label transfer. | string, default: "labels_transfer" |
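The defaulting and length rules for --output_obs_predictions and --output_obs_uncertainty can be sketched as below. `resolve_output_keys` is a hypothetical name and this is not the component's own implementation; it only mirrors the behavior described in the table.

```python
from typing import List, Optional


def resolve_output_keys(
    targets: List[str], provided: Optional[List[str]], suffix: str
) -> List[str]:
    """Return one .obs key per target, deriving defaults from the suffix."""
    if not provided:
        # Empty or missing: fall back to "<target><suffix>".
        return [f"{target}{suffix}" for target in targets]
    if len(provided) != len(targets):
        raise ValueError(
            f"expected {len(targets)} output keys, got {len(provided)}"
        )
    return list(provided)


targets = ["ann_level_1", "ann_level_2"]
prediction_keys = resolve_output_keys(targets, None, "_pred")
uncertainty_keys = resolve_output_keys(targets, ["unc_1", "unc_2"], "_uncertainty")
```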
Execution arguments
Name | Description | Attributes |
---|---|---|
--force_retrain | Retrain models on the reference even if the model_output directory already contains trained classifiers. WARNING! This overwrites existing classifiers for the targets in the model_output directory! | boolean_true |
--use_gpu | Use GPU during model training and inference (recommended). | boolean, default: FALSE |
--verbosity | Verbosity level for the evaluation of the classifier, in the range [0, 2]. | integer, default: 1 |
--model_output | Output directory for the trained models. | file, default: "model" |
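How --force_retrain interacts with --model_output can be sketched as follows. The file naming scheme (`classifier_<target>.json`) and both helper names are assumptions made for this example; the component may store its models differently.

```python
import tempfile
from pathlib import Path
from typing import List


def model_path(model_dir: Path, target: str) -> Path:
    """Hypothetical location of the trained classifier for one target."""
    return model_dir / f"classifier_{target}.json"


def targets_to_train(
    targets: List[str], model_dir: Path, force_retrain: bool
) -> List[str]:
    """Return the targets that need (re)training."""
    if force_retrain:
        # Existing classifiers are overwritten.
        return list(targets)
    return [t for t in targets if not model_path(model_dir, t).exists()]


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    # Pretend a classifier for ann_level_1 was trained in an earlier run.
    model_path(model_dir, "ann_level_1").write_text("{}")
    all_targets = ["ann_level_1", "ann_level_2"]
    missing_only = targets_to_train(all_targets, model_dir, force_retrain=False)
    retrain_all = targets_to_train(all_targets, model_dir, force_retrain=True)
```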
Learning parameters
Name | Description | Attributes |
---|---|---|
--learning_rate | Step size shrinkage used in updates to prevent overfitting. Range: [0, 1]. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. | double, default: 0.3 |
--min_split_loss | Minimum loss reduction required to make a further partition on a leaf node of the tree. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. | double, default: 0 |
--max_depth | Maximum depth of a tree. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. | integer, default: 6 |
--min_child_weight | Minimum sum of instance weight (hessian) needed in a child. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. | integer, default: 1 |
--max_delta_step | Maximum delta step allowed for each leaf output. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. | double, default: 0 |
--subsample | Subsample ratio of the training instances. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. | double, default: 1 |
--sampling_method | The method used to sample the training instances. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. | string, default: "uniform" |
--colsample_bytree | Fraction of columns to be subsampled. Range: (0, 1]. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. | double, default: 1 |
--colsample_bylevel | Subsample ratio of columns for each level. Range: (0, 1]. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. | double, default: 1 |
--colsample_bynode | Subsample ratio of columns for each node (split). Range: (0, 1]. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. | double, default: 1 |
--reg_lambda | L2 regularization term on weights. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. | double, default: 1 |
--reg_alpha | L1 regularization term on weights. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. | double, default: 0 |
--scale_pos_weight | Controls the balance of positive and negative weights; useful for unbalanced classes. See https://xgboost.readthedocs.io/en/stable/parameter.html#parameters-for-tree-booster for reference. | double, default: 1 |
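Before launching a run, the ranges stated in the table above can be checked with a small script. The sketch below covers only the ranges given on this page (learning_rate and the three colsample_* parameters); `validate_learning_params` is a hypothetical helper and not part of the pipeline.

```python
from typing import Dict, List


def validate_learning_params(params: Dict[str, float]) -> List[str]:
    """Return a list of human-readable range violations (empty if none)."""
    problems = []
    # learning_rate must lie in the closed interval [0, 1].
    rate = params.get("learning_rate", 0.3)
    if not 0 <= rate <= 1:
        problems.append(f"learning_rate={rate} is outside [0, 1]")
    # The column subsampling ratios must lie in (0, 1].
    for name in ("colsample_bytree", "colsample_bylevel", "colsample_bynode"):
        value = params.get(name, 1)
        if not 0 < value <= 1:
            problems.append(f"{name}={value} is outside (0, 1]")
    return problems


ok = validate_learning_params({"learning_rate": 0.3, "colsample_bytree": 0.8})
bad = validate_learning_params({"learning_rate": 1.5, "colsample_bynode": 0})
```

Missing keys fall back to the defaults listed in the table, so a partial parameter dict is accepted.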