Make reference

Build a transcriptomics reference into one or more of several formats.

Info

ID: make_reference
Namespace: ingestion

Links

Source

Example commands

You can run the pipeline using `nextflow run`.

View help

You can use `--help` as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 0.10.0 -latest \
  -main-script ./workflows/ingestion/make_reference/main.nf \
  --help

Run command

Example of params.yaml

# Inputs
id: # please fill in - example: "foo"
genome_fasta: # please fill in - example: "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/GRCh38.primary_assembly.genome.fa.gz"
transcriptome_gtf: # please fill in - example: "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/gencode.v41.annotation.gtf.gz"
# ercc: "https://assets.thermofisher.com/TFS-Assets/LSG/manuals/ERCC92.zip"

# Outputs
target: ["star"]
# output_fasta: "$id.$key.output_fasta.gz"
# output_gtf: "$id.$key.output_gtf.gz"
# output_cellranger: "$id.$key.output_cellranger.gz"
# output_bd_rhapsody: "$id.$key.output_bd_rhapsody.gz"
# output_star: "$id.$key.output_star.gz"

# Arguments
# subset_regex: "(ERCC-00002|chr1)"

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

nextflow run openpipelines-bio/openpipeline \
  -r 0.10.0 -latest \
  -profile docker \
  -main-script ./workflows/ingestion/make_reference/main.nf \
  -params-file params.yaml

Note

Replace `-profile docker` with `-profile podman` or `-profile singularity`, depending on the desired backend.
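
As an illustration, the params.yaml template above could be filled in with the example values it lists. This is only a sketch: `"foo"` and `"output/"` are the placeholder examples from the template, not recommended settings, and the file is written to the current directory.

```shell
# Write a minimal params.yaml using the example values from the template.
# "foo" and "output/" are placeholders; replace them with your own values.
cat > params.yaml <<'EOF'
id: "foo"
genome_fasta: "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/GRCh38.primary_assembly.genome.fa.gz"
transcriptome_gtf: "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/gencode.v41.annotation.gtf.gz"
target: ["star"]
publish_dir: "output/"
EOF

# Count the top-level keys that were written.
param_count=$(grep -c '^[a-z_]*:' params.yaml)
echo "wrote params.yaml with $param_count parameters"
```

The resulting file can then be passed to the run command with `-params-file params.yaml`.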

Argument groups

Inputs

Name	Description	Attributes
`--id`	ID of the reference.	`string`, required, example: `"foo"`
`--genome_fasta`	Reference genome fasta.	`file`, required, example: `"https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/GRCh38.primary_assembly.genome.fa.gz"`
`--transcriptome_gtf`	Reference transcriptome annotation.	`file`, required, example: `"https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/gencode.v41.annotation.gtf.gz"`
`--ercc`	ERCC sequence and annotation file.	`file`, example: `"https://assets.thermofisher.com/TFS-Assets/LSG/manuals/ERCC92.zip"`

Outputs

Name	Description	Attributes
`--target`	Which reference indices to generate.	List of `string`, default: `"star"`, multiple_sep: `":"`
`--output_fasta`	Output genome sequence fasta.	`file`, example: `"genome_sequence.fa.gz"`
`--output_gtf`	Output transcriptome annotation gtf.	`file`, example: `"transcriptome_annotation.gtf.gz"`
`--output_cellranger`	Output Cell Ranger index.	`file`, example: `"cellranger_index.tar.gz"`
`--output_bd_rhapsody`	Output BD Rhapsody index.	`file`, example: `"bdrhap_index.tar.gz"`
`--output_star`	Output STAR index.	`file`, example: `"star_index.tar.gz"`
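
Because `--target` is a list with `":"` as its separator, several indices can presumably be requested in a single command-line value. The sketch below joins a list of target names and prints the resulting command without running it. The value `cellranger` is an assumption inferred from the `--output_cellranger` parameter name; check `--help` for the accepted values.

```shell
# Target names to build. "cellranger" is assumed from the parameter
# names above, not confirmed by this page; verify with --help.
targets="star cellranger"

# Join the names with ":" (the multiple_sep listed for --target).
target_arg=$(printf '%s' "$targets" | tr ' ' ':')

# Print the command instead of executing it (dry run).
echo "nextflow run openpipelines-bio/openpipeline" \
  "-r 0.10.0 -latest -profile docker" \
  "-main-script ./workflows/ingestion/make_reference/main.nf" \
  "-params-file params.yaml --target $target_arg"
```

In params.yaml the same selection is written as a list, in the style of the `target: ["star"]` line in the template.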

Arguments

Name	Description	Attributes
`--subset_regex`	Will subset the reference chromosomes using the given regex.	`string`, example: `"(ERCC-00002|chr1)"`
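
Before running the pipeline, it can help to preview which sequence names a pattern would keep. This page does not say whether the pipeline matches the regex against the whole name or any part of it, so the sketch below shows both readings with `grep -E`. The sequence names are hypothetical stand-ins for the FASTA header IDs.

```shell
# Hypothetical sequence names, standing in for FASTA header IDs.
names="chr1 chr10 chr2 ERCC-00002 ERCC-00003"
pattern='(ERCC-00002|chr1)'

# Whole-name match (-x): the pattern must cover the entire name.
full_match=$(printf '%s\n' $names | grep -E -x "$pattern")

# Substring match: any name containing the pattern is kept.
partial_match=$(printf '%s\n' $names | grep -E "$pattern")

echo "whole-name match:"
echo "$full_match"
echo "substring match:"
echo "$partial_match"
```

Under the substring reading, `chr10` is kept alongside `chr1`, so it is worth checking the subsetted output when a pattern is a prefix of other sequence names.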

Authors

Angela Oliveira Pisco (author)
Robrecht Cannoodt (author, maintainer)

Visualisation

flowchart LR
    p0(Input)
    p2(toSortedList)
    p4(flatMap)
    p11(make_reference)
    p13(join)
    p17(filter)
    p22(build_cellranger_reference)
    p24(join)
    p54(join)
    p29(filter)
    p34(build_bdrhap_reference)
    p36(join)
    p55(join)
    p41(filter)
    p46(star_build_reference)
    p48(join)
    p56(join)
    p57(join)
    p62(Output)
    p54-->p55
    p55-->p56
    p56-->p57
    p0-->p2
    p2-->p4
    p4-->p13
    p4-->p11
    p11-->p13
    p13-->p17
    p17-->p24
    p17-->p22
    p22-->p24
    p24-->p54
    p13-->p29
    p29-->p36
    p29-->p34
    p34-->p36
    p36-->p55
    p13-->p41
    p41-->p48
    p41-->p46
    p46-->p48
    p48-->p56
    p0-->p57
    p13-->p54
    p57-->p62