Make reference

Build a transcriptomics reference in one or more supported index formats.

Info

ID: make_reference
Namespace: workflows/ingestion

Example commands

You can run the pipeline using nextflow run.

View help

Pass --help to get an overview of the available parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -main-script target/nextflow/workflows/ingestion/make_reference/main.nf \
  --help

Run command

Example of params.yaml

# Inputs
id: # please fill in - example: "foo"
genome_fasta: # please fill in - example: "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/GRCh38.primary_assembly.genome.fa.gz"
transcriptome_gtf: # please fill in - example: "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/gencode.v41.annotation.gtf.gz"
# ercc: "https://assets.thermofisher.com/TFS-Assets/LSG/manuals/ERCC92.zip"

# STAR Settings
star_genome_sa_index_nbases: 14

# BD Rhapsody Settings
bdrhap_mitochondrial_contigs: ["chrM", "chrMT", "M", "MT"]
bdrhap_filtering_off: false
bdrhap_wta_only_index: false
# bdrhap_extra_star_params: "--limitGenomeGenerateRAM 48000 --genomeSAindexNbases 11"

# Cellranger ARC options
# motifs_file: "path/to/file"
# non_nuclear_contigs: ["foo"]

# Outputs
target: ["star"]
# output_fasta: "$id.$key.output_fasta.gz"
# output_gtf: "$id.$key.output_gtf.gz"
# output_cellranger: "$id.$key.output_cellranger.gz"
# output_cellranger_arc: "$id.$key.output_cellranger_arc.gz"
# output_bd_rhapsody: "$id.$key.output_bd_rhapsody.gz"
# output_star: "$id.$key.output_star.gz"

# Arguments
# subset_regex: "(ERCC-00002|chr1)"

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -profile docker \
  -main-script target/nextflow/workflows/ingestion/make_reference/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Inputs

Name	Description	Attributes
`--id`	ID of the reference.	`string`, required, example: `"foo"`
`--genome_fasta`	Reference genome fasta.	`file`, required, example: `"https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/GRCh38.primary_assembly.genome.fa.gz"`
`--transcriptome_gtf`	Reference transcriptome annotation.	`file`, required, example: `"https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_41/gencode.v41.annotation.gtf.gz"`
`--ercc`	ERCC sequence and annotation file.	`file`, example: `"https://assets.thermofisher.com/TFS-Assets/LSG/manuals/ERCC92.zip"`

STAR Settings

Name	Description	Attributes
`--star_genome_sa_index_nbases`	Length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches. For small genomes, the STAR parameter `--genomeSAindexNbases` must be scaled down to min(14, log2(GenomeLength)/2 - 1).	`integer`, default: `14`
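
The scaling rule for small genomes can be worked out ahead of time. The following is a minimal Python sketch, not part of the workflow: the function name is hypothetical, and rounding the result down to an integer is an assumption made here because the parameter is an integer and smaller values use less memory.

```python
import math


def suggested_sa_index_nbases(genome_length, cap=14):
    """Evaluate min(cap, log2(genome_length)/2 - 1) as an integer.

    Assumption: the fractional result is rounded down, and the value is
    never allowed to drop below 1.
    """
    if genome_length < 1:
        raise ValueError("genome_length must be a positive number of bases")
    scaled = math.floor(math.log2(genome_length) / 2 - 1)
    return max(1, min(cap, scaled))


if __name__ == "__main__":
    # A 100 kb genome needs a much shorter pre-indexing string than the default.
    print(suggested_sa_index_nbases(100_000))
    # A human-sized genome (about 3.1 Gb) stays at the cap of 14.
    print(suggested_sa_index_nbases(3_100_000_000))
```

The result can then be set as star_genome_sa_index_nbases in params.yaml.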

BD Rhapsody Settings

Name	Description	Attributes
`--bdrhap_mitochondrial_contigs`	Names of the Mitochondrial contigs in the provided Reference Genome. Fragments originating from contigs other than these are identified as ‘nuclear fragments’ in the ATACseq analysis pipeline.	List of `string`, default: `"chrM", "chrMT", "M", "MT"`, multiple_sep: `";"`
`--bdrhap_filtering_off`	By default, the input transcript annotation files are filtered on the gene_type/gene_biotype attribute, and only features with one of the following values are kept: protein_coding, lncRNA, IG_LV_gene, IG_V_gene, IG_V_pseudogene, IG_D_gene, IG_J_gene, IG_J_pseudogene, IG_C_gene, IG_C_pseudogene, TR_V_gene, TR_V_pseudogene, TR_D_gene, TR_J_gene, TR_J_pseudogene, TR_C_gene. Set this option to true if the annotation files are already pre-filtered or if you want to turn the filtering off.	`boolean_true`
`--bdrhap_wta_only_index`	Build a WTA only index, otherwise builds a WTA + ATAC index.	`boolean_true`
`--bdrhap_extra_star_params`	Additional parameters to pass to STAR when building the genome index. Specify exactly like how you would on the command line.	`string`, example: `"--limitGenomeGenerateRAM 48000 --genomeSAindexNbases 11"`

Cellranger ARC options

Name	Description	Attributes
`--motifs_file`	Path to file containing transcription factor motifs in JASPAR format.	`file`
`--non_nuclear_contigs`	Name(s) of contig(s) that do not have any chromatin structure, for example, mitochondria or plastids. These contigs are excluded from peak calling since the entire contig will be “open” due to a lack of chromatin structure. Leave empty if there are no such contigs.	List of `string`, multiple_sep: `";"`

Outputs

Name	Description	Attributes
`--target`	Which reference indices to generate.	List of `string`, default: `"star"`, multiple_sep: `";"`
`--output_fasta`	Output genome sequence fasta.	`file`, example: `"genome_sequence.fa.gz"`
`--output_gtf`	Output transcriptome annotation gtf.	`file`, example: `"transcriptome_annotation.gtf.gz"`
`--output_cellranger`	Output Cell Ranger index.	`file`, example: `"cellranger_index.tar.gz"`
`--output_cellranger_arc`	Output Cell Ranger ARC index.	`file`, example: `"cellranger_index_arc.tar.gz"`
`--output_bd_rhapsody`	Output BD Rhapsody index.	`file`, example: `"bdrhap_index.tar.gz"`
`--output_star`	Output STAR index.	`file`, example: `"star_index.tar.gz"`
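
List-valued parameters such as `--target` use `";"` as their separator (multiple_sep) when given on the command line rather than in params.yaml. The sketch below shows one way to turn a dictionary of parameters into command-line arguments. The helper name is hypothetical, the target value "cellranger" is an assumption inferred from the output parameter names, and passing `boolean_true` flags without a value is likewise an assumption.

```python
def to_cli_args(params):
    """Convert a parameter dictionary into a flat list of CLI arguments."""
    args = []
    for name, value in params.items():
        if isinstance(value, bool):
            # Assumption: boolean_true flags are passed bare, and omitted when false.
            if value:
                args.append(f"--{name}")
        elif isinstance(value, (list, tuple)):
            # List parameters are joined with the documented multiple_sep ";".
            args.extend([f"--{name}", ";".join(str(item) for item in value)])
        else:
            args.extend([f"--{name}", str(value)])
    return args


if __name__ == "__main__":
    example = {
        "id": "foo",
        "target": ["star", "cellranger"],  # "cellranger" is an assumed target name
        "bdrhap_wta_only_index": True,
    }
    print(" ".join(to_cli_args(example)))
```

When typing such a command by hand in a shell, quote the joined value (for example "star;cellranger") so that the semicolon is not interpreted as a command separator.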

Arguments

Name	Description	Attributes
`--subset_regex`	Subset the reference chromosomes to those matching the given regex.	`string`, example: `"(ERCC-00002|chr1)"`
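
To preview which contigs a pattern would keep before building a reference, the regex can be tried on a list of contig names. This is a minimal Python sketch with made-up contig names. Whether the workflow anchors the pattern to the whole contig name is not stated here, so the full-match behaviour below is an assumption. With an unanchored search, the example pattern would also keep chr10.

```python
import re


def subset_contigs(contig_names, subset_regex):
    """Keep the contig names that match the pattern in full (assumed behaviour)."""
    pattern = re.compile(subset_regex)
    return [name for name in contig_names if pattern.fullmatch(name)]


if __name__ == "__main__":
    # Hypothetical contig names, for illustration only.
    contigs = ["chr1", "chr10", "chr2", "chrM", "ERCC-00002", "ERCC-00003"]
    print(subset_contigs(contigs, "(ERCC-00002|chr1)"))
    # prints ['chr1', 'ERCC-00002']
```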

Authors

Angela Oliveira Pisco (author)
Robrecht Cannoodt (author, maintainer)
Weiwei Schultz (contributor)

Visualisation

flowchart TB
    v0(Channel.fromList)
    v2(filter)
    v10(filter)
    v18(make_reference_component)
    v25(cross)
    v35(cross)
    v44(branch)
    v71(concat)
    v49(build_cellranger_arc_reference)
    v56(cross)
    v66(cross)
    v75(branch)
    v102(concat)
    v80(build_cellranger_reference)
    v87(cross)
    v97(cross)
    v106(branch)
    v133(concat)
    v111(build_star_reference)
    v118(cross)
    v128(cross)
    v137(branch)
    v164(concat)
    v142(build_bdrhap_reference)
    v149(cross)
    v159(cross)
    v171(cross)
    v178(cross)
    v190(cross)
    v197(cross)
    v201(Output)
    v44-->v71
    v75-->v102
    v106-->v133
    v137-->v164
    v0-->v2
    v2-->v10
    v10-->v18
    v18-->v25
    v10-->v25
    v10-->v35
    v44-->v49
    v49-->v56
    v44-->v56
    v44-->v66
    v66-->v71
    v75-->v80
    v80-->v87
    v75-->v87
    v75-->v97
    v97-->v102
    v106-->v111
    v111-->v118
    v106-->v118
    v106-->v128
    v128-->v133
    v137-->v142
    v142-->v149
    v137-->v149
    v137-->v159
    v159-->v164
    v164-->v171
    v2-->v171
    v171-->v178
    v2-->v178
    v2-->v190
    v190-->v197
    v2-->v197
    v197-->v201
    v18-->v35
    v35-->v44
    v49-->v66
    v71-->v75
    v80-->v97
    v102-->v106
    v111-->v128
    v133-->v137
    v142-->v159
    v164-->v190
    style v0 fill:#e3dcea,stroke:#7a4baa;
    style v2 fill:#e3dcea,stroke:#7a4baa;
    style v10 fill:#e3dcea,stroke:#7a4baa;
    style v18 fill:#e3dcea,stroke:#7a4baa;
    style v25 fill:#e3dcea,stroke:#7a4baa;
    style v35 fill:#e3dcea,stroke:#7a4baa;
    style v44 fill:#e3dcea,stroke:#7a4baa;
    style v71 fill:#e3dcea,stroke:#7a4baa;
    style v49 fill:#e3dcea,stroke:#7a4baa;
    style v56 fill:#e3dcea,stroke:#7a4baa;
    style v66 fill:#e3dcea,stroke:#7a4baa;
    style v75 fill:#e3dcea,stroke:#7a4baa;
    style v102 fill:#e3dcea,stroke:#7a4baa;
    style v80 fill:#e3dcea,stroke:#7a4baa;
    style v87 fill:#e3dcea,stroke:#7a4baa;
    style v97 fill:#e3dcea,stroke:#7a4baa;
    style v106 fill:#e3dcea,stroke:#7a4baa;
    style v133 fill:#e3dcea,stroke:#7a4baa;
    style v111 fill:#e3dcea,stroke:#7a4baa;
    style v118 fill:#e3dcea,stroke:#7a4baa;
    style v128 fill:#e3dcea,stroke:#7a4baa;
    style v137 fill:#e3dcea,stroke:#7a4baa;
    style v164 fill:#e3dcea,stroke:#7a4baa;
    style v142 fill:#e3dcea,stroke:#7a4baa;
    style v149 fill:#e3dcea,stroke:#7a4baa;
    style v159 fill:#e3dcea,stroke:#7a4baa;
    style v171 fill:#e3dcea,stroke:#7a4baa;
    style v178 fill:#e3dcea,stroke:#7a4baa;
    style v190 fill:#e3dcea,stroke:#7a4baa;
    style v197 fill:#e3dcea,stroke:#7a4baa;
    style v201 fill:#e3dcea,stroke:#7a4baa;