Make reference
Build a transcriptomics reference into one of many formats
Info
ID: make_reference
Namespace: ingestion
Example commands
You can run the pipeline using nextflow run.
View help
You can use --help as a parameter to get an overview of the possible parameters.
nextflow run openpipelines-bio/openpipeline \
-r 0.10.0 -latest \
-main-script ./workflows/ingestion/make_reference/main.nf \
--help
Run command
Example of params.yaml
# Inputs
id: # please fill in - example: "foo"
genome_fasta: # please fill in - example: "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/GRCh38.primary_assembly.genome.fa.gz"
transcriptome_gtf: # please fill in - example: "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/gencode.v41.annotation.gtf.gz"
# ercc: "https://assets.thermofisher.com/TFS-Assets/LSG/manuals/ERCC92.zip"
# Outputs
target: ["star"]
# output_fasta: "$id.$key.output_fasta.gz"
# output_gtf: "$id.$key.output_gtf.gz"
# output_cellranger: "$id.$key.output_cellranger.gz"
# output_bd_rhapsody: "$id.$key.output_bd_rhapsody.gz"
# output_star: "$id.$key.output_star.gz"
# Arguments
# subset_regex: "(ERCC-00002|chr1)"
# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
nextflow run openpipelines-bio/openpipeline \
-r 0.10.0 -latest \
-profile docker \
-main-script ./workflows/ingestion/make_reference/main.nf \
-params-file params.yaml
Note
Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
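As a minimal sketch, the params.yaml template above could be filled in as follows. Every value is copied from the "example:" hints in the template itself; they are placeholders for illustration, not recommended settings.

```shell
# Write a filled-in params.yaml using the example values from the template.
cat > params.yaml <<'EOF'
# Inputs
id: "foo"
genome_fasta: "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/GRCh38.primary_assembly.genome.fa.gz"
transcriptome_gtf: "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/gencode.v41.annotation.gtf.gz"

# Outputs
target: ["star"]

# Nextflow input-output arguments
publish_dir: "output/"
EOF

# Show the file that -params-file params.yaml will read.
cat params.yaml
```

The resulting file can then be passed to the run command shown above through -params-file params.yaml.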
Argument groups
Inputs
| Name | Description | Attributes |
|---|---|---|
| --id | ID of the reference. | string, required, example: "foo" |
| --genome_fasta | Reference genome fasta. | file, required, example: "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/GRCh38.primary_assembly.genome.fa.gz" |
| --transcriptome_gtf | Reference transcriptome annotation. | file, required, example: "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/gencode.v41.annotation.gtf.gz" |
| --ercc | ERCC sequence and annotation file. | file, example: "https://assets.thermofisher.com/TFS-Assets/LSG/manuals/ERCC92.zip" |
Outputs
| Name | Description | Attributes |
|---|---|---|
| --target | Which reference indices to generate. | list of strings, default: "star", multiple_sep: ":" |
| --output_fasta | Output genome sequence fasta. | file, example: "genome_sequence.fa.gz" |
| --output_gtf | Output transcriptome annotation gtf. | file, example: "transcriptome_annotation.gtf.gz" |
| --output_cellranger | Output Cell Ranger index. | file, example: "cellranger_index.tar.gz" |
| --output_bd_rhapsody | Output BD Rhapsody index. | file, example: "bdrhap_index.tar.gz" |
| --output_star | Output STAR index. | file, example: "star_index.tar.gz" |
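The multiple_sep: ":" attribute of --target suggests that several indices can be requested in one run by joining their names with a colon on the command line. The sketch below assembles such a value. Only "star" is confirmed as a valid name on this page; "cellranger" and "bd_rhapsody" are assumptions inferred from the --output_cellranger and --output_bd_rhapsody parameters.

```shell
# Index names to request. Only "star" is confirmed by the default above;
# "cellranger" and "bd_rhapsody" are assumed from the output parameter names.
set -- star cellranger bd_rhapsody

# Join the names with ":" (the documented multiple_sep).
target_arg=$(IFS=:; printf '%s' "$*")

# Print the flag to append to the nextflow run command.
echo "--target $target_arg"
```

In params.yaml the same selection would presumably be written as a list, following the target: ["star"] form shown in the example file.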
Arguments
| Name | Description | Attributes |
|---|---|---|
| --subset_regex | Will subset the reference chromosomes using the given regex. | string, example: "(ERCC-00002\|chr1)" |
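To preview which sequence names a pattern would keep, it can be tried against a list of names before running the pipeline. The sketch below uses grep -E and assumes the pattern must match the whole sequence name, hence the added ^ and $ anchors. This page does not state whether the pipeline anchors the pattern, and the names in the list are made up for illustration.

```shell
# Pattern from the --subset_regex example.
subset_regex='(ERCC-00002|chr1)'

# Hypothetical sequence names, one per line.
names='chr1
chr10
chr2
ERCC-00002
ERCC-00003'

# Assumption: the pattern is matched against the entire name.
selected=$(printf '%s\n' "$names" | grep -E "^${subset_regex}\$")
echo "$selected"
```

Under this assumption only chr1 and ERCC-00002 are kept. Without the anchors, chr10 would also match, since it contains chr1.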
Visualisation
flowchart LR
p0(Input)
p2(toSortedList)
p4(flatMap)
p11(make_reference)
p13(join)
p17(filter)
p22(build_cellranger_reference)
p24(join)
p54(join)
p29(filter)
p34(build_bdrhap_reference)
p36(join)
p55(join)
p41(filter)
p46(star_build_reference)
p48(join)
p56(join)
p57(join)
p62(Output)
p54-->p55
p55-->p56
p56-->p57
p0-->p2
p2-->p4
p4-->p13
p4-->p11
p11-->p13
p13-->p17
p17-->p24
p17-->p22
p22-->p24
p24-->p54
p13-->p29
p29-->p36
p29-->p34
p34-->p36
p36-->p55
p13-->p41
p41-->p48
p41-->p46
p46-->p48
p48-->p56
p0-->p57
p13-->p54
p57-->p62