Split h5mu train test
Splits a MuData object into training and testing (and, optionally, validation) datasets by observation, writing each subset to a separate MuData object.
Info
ID: split_h5mu_train_test
Namespace: dataflow
Example commands
You can run the pipeline using nextflow run.
View help
You can use --help as a parameter to get an overview of the possible parameters.
nextflow run openpipelines-bio/openpipeline \
-r 2.1.0 -latest \
-main-script target/nextflow/dataflow/split_h5mu_train_test/main.nf \
--help
Run command
Example of params.yaml
# Inputs
input: # please fill in - example: "input.h5mu"
modality: "rna"
# Outputs
# output_train: "$id.$key.output_train.h5mu"
# output_test: "$id.$key.output_test.h5mu"
# output_val: "$id.$key.output_val.h5mu"
# compression: "gzip"
# Split arguments
test_size: 0.2
# val_size: 0.1
shuffle: false
# random_state: 123
# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
# Arguments
nextflow run openpipelines-bio/openpipeline \
-r 2.1.0 -latest \
-profile docker \
-main-script target/nextflow/dataflow/split_h5mu_train_test/main.nf \
-params-file params.yaml
Note
Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
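For scripted runs, the run command above can be assembled programmatically. The following is a minimal sketch, not part of the pipeline: the helper name `build_command` and its keyword arguments are hypothetical, and it only reuses the flags and profile names shown in this section.

```python
import shlex

# Container backends named in the note above.
SUPPORTED_PROFILES = ("docker", "podman", "singularity")


def build_command(profile="docker", params_file="params.yaml", revision="2.1.0"):
    """Assemble the `nextflow run` invocation shown above as an argument list.

    Hypothetical helper for illustration; it does not execute anything itself.
    """
    if profile not in SUPPORTED_PROFILES:
        raise ValueError(f"unsupported profile: {profile!r}")
    return [
        "nextflow", "run", "openpipelines-bio/openpipeline",
        "-r", revision, "-latest",
        "-profile", profile,
        "-main-script", "target/nextflow/dataflow/split_h5mu_train_test/main.nf",
        "-params-file", params_file,
    ]


if __name__ == "__main__":
    # Print a shell-ready command line for the singularity backend.
    print(shlex.join(build_command(profile="singularity")))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting issues.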
Argument groups
Inputs
Input dataset in mudata format.
Name | Description | Attributes |
---|---|---|
--input | The input data to be split. Should be a .h5mu file. | file, required, example: "input.h5mu" |
--modality | Which modality to process. | string, default: "rna" |
Outputs
Output arguments.
Name | Description | Attributes |
---|---|---|
--output_train | The output training data in mudata format. | file, required, example: "output_train.h5mu" |
--output_test | The output testing data in mudata format. | file, required, example: "output_test.h5mu" |
--output_val | The output validation data in mudata format. | file, example: "output_val.h5mu" |
--compression | | string, example: "gzip" |
Split arguments
Model arguments.
Name | Description | Attributes |
---|---|---|
--test_size | The proportion of the dataset to include in the test split. | double, default: 0.2 |
--val_size | The proportion of the dataset to include in the validation split. | double |
--shuffle | Whether or not to shuffle the data before splitting. | boolean_true |
--random_state | The seed used by the random number generator. | integer |
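To make the interplay of the split arguments concrete, here is a minimal sketch of how the four parameters could partition observation indices. It is an illustration under stated assumptions, not the component's actual implementation: the function name `split_indices` is hypothetical, both sizes are assumed to be fractions of the full dataset, and subset sizes are assumed to be rounded down.

```python
import random


def split_indices(n_obs, test_size=0.2, val_size=None, shuffle=False, random_state=None):
    """Partition observation indices 0..n_obs-1 into train, test and optional validation sets.

    Assumption: sizes are fractions of the full dataset and are rounded down;
    the real component may round or order the subsets differently.
    """
    val_size = val_size or 0.0
    if not (0 < test_size < 1) or not (0 <= val_size < 1) or test_size + val_size >= 1:
        raise ValueError("test_size and val_size must be fractions that sum to less than 1")

    indices = list(range(n_obs))
    if shuffle:
        # A fixed seed makes the shuffled split reproducible.
        random.Random(random_state).shuffle(indices)

    n_test = int(n_obs * test_size)
    n_val = int(n_obs * val_size)
    n_train = n_obs - n_test - n_val

    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    # No validation subset is returned when none was requested (or it rounds to zero).
    val = indices[n_train + n_test:] if n_val else None
    return train, test, val


if __name__ == "__main__":
    train, test, val = split_indices(10, test_size=0.2, val_size=0.1)
    print(len(train), len(test), len(val))
```

Without shuffling, the split follows the existing observation order, so any ordering in the input (for example by sample or batch) carries over into the subsets. Shuffling with a fixed seed avoids that while keeping the split reproducible.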