Make reference

Build a transcriptomics reference in one or more formats.

Info

ID: make_reference
Namespace: ingestion

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 0.12.6 -latest \
  -main-script ./workflows/ingestion/make_reference/main.nf \
  --help

Run command

Example of params.yaml
# Inputs
id: # please fill in - example: "foo"
genome_fasta: # please fill in - example: "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/GRCh38.primary_assembly.genome.fa.gz"
transcriptome_gtf: # please fill in - example: "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/gencode.v41.annotation.gtf.gz"
# ercc: "https://assets.thermofisher.com/TFS-Assets/LSG/manuals/ERCC92.zip"

# Outputs
target: ["star"]
# output_fasta: "$id.$key.output_fasta.gz"
# output_gtf: "$id.$key.output_gtf.gz"
# output_cellranger: "$id.$key.output_cellranger.gz"
# output_bd_rhapsody: "$id.$key.output_bd_rhapsody.gz"
# output_star: "$id.$key.output_star.gz"

# Arguments
# subset_regex: "(ERCC-00002|chr1)"

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

nextflow run openpipelines-bio/openpipeline \
  -r 0.12.6 -latest \
  -profile docker \
  -main-script ./workflows/ingestion/make_reference/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.
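
As a rough sketch, the params.yaml template above could be filled in from the shell like this. The values are simply the example values listed on this page (the ID "foo", the GENCODE release 41 URLs, and "output/" as the publish directory); substitute your own before running the pipeline.

```shell
# Write a minimal params.yaml using the example values from this page.
# The quoted 'EOF' keeps the shell from expanding anything inside the file.
cat > params.yaml <<'EOF'
# Inputs
id: "foo"
genome_fasta: "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/GRCh38.primary_assembly.genome.fa.gz"
transcriptome_gtf: "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/gencode.v41.annotation.gtf.gz"

# Outputs
target: ["star"]

# Nextflow input-output arguments
publish_dir: "output/"
EOF

# Show the result for review before passing it to -params-file.
cat params.yaml
```

Optional arguments such as ercc or subset_regex are left out here, so they are not set.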

Argument groups

Inputs

| Name | Description | Attributes |
| --- | --- | --- |
| --id | ID of the reference. | string, required, example: "foo" |
| --genome_fasta | Reference genome fasta. | file, required, example: "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/GRCh38.primary_assembly.genome.fa.gz" |
| --transcriptome_gtf | Reference transcriptome annotation. | file, required, example: "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/gencode.v41.annotation.gtf.gz" |
| --ercc | ERCC sequence and annotation file. | file, example: "https://assets.thermofisher.com/TFS-Assets/LSG/manuals/ERCC92.zip" |

Outputs

| Name | Description | Attributes |
| --- | --- | --- |
| --target | Which reference indices to generate. | List of string, default: "star", multiple_sep: ":" |
| --output_fasta | Output genome sequence fasta. | file, example: "genome_sequence.fa.gz" |
| --output_gtf | Output transcriptome annotation gtf. | file, example: "transcriptome_annotation.gtf.gz" |
| --output_cellranger | Output index | file, example: "cellranger_index.tar.gz" |
| --output_bd_rhapsody | Output index | file, example: "bdrhap_index.tar.gz" |
| --output_star | Output index | file, example: "star_index.tar.gz" |
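
Because --target accepts a list with ":" as its separator, several indices can be requested in one run. The sketch below joins target names with ":" and prints the resulting command instead of running it. The target names "cellranger" and "star" are an assumption, chosen to mirror the --output_cellranger and --output_star arguments; check --help for the accepted values.

```shell
# Assumed target names, mirroring --output_cellranger and --output_star.
targets="cellranger star"

# Join the names with ":" (the multiple_sep listed for --target).
target_arg=$(printf '%s' "$targets" | tr ' ' ':')

# Print the command for review; remove the leading "echo" to run it.
echo nextflow run openpipelines-bio/openpipeline \
  -r 0.12.6 -latest \
  -profile docker \
  -main-script ./workflows/ingestion/make_reference/main.nf \
  -params-file params.yaml \
  --target "$target_arg"
```

A parameter given on the command line takes precedence over the same key in params.yaml, so the target entry in the file can stay as it is.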

Arguments

| Name | Description | Attributes |
| --- | --- | --- |
| --subset_regex | Will subset the reference chromosomes using the given regex. | string, example: "(ERCC-00002\|chr1)" |
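
To get a feel for what the example pattern would select, the sketch below applies it to a handful of made-up sequence names with grep. This page does not say whether the pipeline anchors the pattern, so both behaviours are shown; the names and the use of grep are illustrative assumptions, not the pipeline's implementation.

```shell
# Hypothetical sequence names, for illustration only.
names='chr1
chr10
chr2
ERCC-00002
ERCC-00003'

# Unanchored search: keeps every name that contains a match.
unanchored=$(printf '%s\n' "$names" | grep -E '(ERCC-00002|chr1)')

# Whole-name match (-x): keeps only names equal to one of the alternatives.
anchored=$(printf '%s\n' "$names" | grep -Ex '(ERCC-00002|chr1)')

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
```

Under an unanchored search, chr10 is kept alongside chr1. If only chromosome 1 is wanted and the pattern turns out to be unanchored, a pattern such as "^(ERCC-00002|chr1)$" would be the safer choice.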

Authors

  • Angela Oliveira Pisco (author)

  • Robrecht Cannoodt (author, maintainer)

Visualisation

flowchart LR
    v0(Input)
    v2(toSortedList)
    v4(flatMap)
    v11(make_reference)
    v13(join)
    v17(filter)
    v22(build_cellranger_reference)
    v24(join)
    v54(join)
    v29(filter)
    v34(build_bdrhap_reference)
    v36(join)
    v55(join)
    v41(filter)
    v46(star_build_reference)
    v48(join)
    v56(join)
    v57(join)
    v62(Output)
    v54-->v55
    v55-->v56
    v56-->v57
    v0-->v2
    v2-->v4
    v4-->v13
    v4-->v11
    v11-->v13
    v13-->v17
    v17-->v24
    v17-->v22
    v22-->v24
    v24-->v54
    v13-->v29
    v29-->v36
    v29-->v34
    v34-->v36
    v36-->v55
    v13-->v41
    v41-->v48
    v41-->v46
    v46-->v48
    v48-->v56
    v0-->v57
    v13-->v54
    v57-->v62