Htseq count

Quantify gene expression for subsequent testing for differential expression.

Info

ID: htseq_count
Namespace: mapping

This script takes one or more alignment files in SAM/BAM format and a feature file in GFF format and calculates for each feature the number of reads mapping to it.

See http://htseq.readthedocs.io/en/master/count.html for details.

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 1.0.1 -latest \
  -main-script target/nextflow/mapping/htseq_count/main.nf \
  --help

Run command

Example of params.yaml
# Arguments
order: "name"
stranded: "yes"
minimum_alignment_quality: 10
# type: "exon"
# id_attribute: ["gene_id"]
# additional_attributes: ["gene_name"]
add_chromosome_info: false
mode: "union"
non_unique: "none"
# secondary_alignments: "foo"
# supplementary_alignments: "foo"
counts_output_sparse: false

# Input
input: # please fill in - example: ["mysample1.BAM", "mysample2.BAM"]
reference: # please fill in - example: "reference.gtf"

# Output
# output: "$id.$key.output.tsv"
# output_delimiter: "   "
# output_sam: ["$id.$key.output_sam_*.BAM"]
# output_sam_format: "foo"

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
nextflow run openpipelines-bio/openpipeline \
  -r 1.0.1 -latest \
  -profile docker \
  -main-script target/nextflow/mapping/htseq_count/main.nf \
  -params-file params.yaml
Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Input

Name Description Attributes
--input Path to the SAM/BAM files containing the mapped reads. List of file, required, example: "mysample1.BAM", "mysample2.BAM", multiple_sep: ";"
--reference Path to the GTF file containing the features. file, required, example: "reference.gtf"

Output

Name Description Attributes
--output Filename to output the counts to. file, required, example: "htseq-count.tsv"
--output_delimiter Column delimiter in output. string, example: " "
--output_sam Write out all SAM alignment records into SAM/BAM files (one per input file needed), annotating each line with its feature assignment (as an optional field with tag ‘XF’). See the -p option to use BAM instead of SAM. List of file, example: "mysample1_out.BAM", "mysample2_out.BAM", multiple_sep: ";"
--output_sam_format Format to use with the –output_sam argument. string

Arguments

Name Description Attributes
--order Sorting order of . Paired-end sequencing data must be sorted either by position or by read name, and the sorting order must be specified. Ignored for single-end data. string, default: "name"
--stranded Whether the data is from a strand-specific assay. ‘reverse’ means ‘yes’ with reversed strand interpretation. string, default: "yes"
--minimum_alignment_quality Skip all reads with MAPQ alignment quality lower than the given minimum value. MAPQ is the 5th column of a SAM/BAM file and its usage depends on the software used to map the reads. integer, default: 10
--type Feature type (3rd column in GTF file) to be used, all features of other type are ignored (default, suitable for Ensembl GTF files: exon) string, example: "exon"
--id_attribute GTF attribute to be used as feature ID (default, suitable for Ensembl GTF files: gene_id). All feature of the right type (see -t option) within the same GTF attribute will be added together. The typical way of using this option is to count all exonic reads from each gene and add the exons but other uses are possible as well. You can call this option multiple times: in that case, the combination of all attributes separated by colons (:) will be used as a unique identifier, e.g. for exons you might use -i gene_id -i exon_number. List of string, example: "gene_id", multiple_sep: ";"
--additional_attributes Additional feature attributes (suitable for Ensembl GTF files: gene_name). Use multiple times for more than one additional attribute. These attributes are only used as annotations in the output, while the determination of how the counts are added together is done based on option -i. List of string, example: "gene_name", multiple_sep: ";"
--add_chromosome_info Store information about the chromosome of each feature as an additional attribute (e.g. colunm in the TSV output file). boolean_true
--mode Mode to handle reads overlapping more than one feature. string, default: "union"
--non_unique Whether and how to score reads that are not uniquely aligned or ambiguously assigned to features. string, default: "none"
--secondary_alignments Whether to score secondary alignments (0x100 flag). string
--supplementary_alignments Whether to score supplementary alignments (0x800 flag). string
--counts_output_sparse Store the counts as a sparse matrix (mtx, h5ad, loom). boolean_true

Authors

  • Robrecht Cannoodt (author, maintainer)

  • Angela Oliveira Pisco (author)