Htseq count

Quantify gene expression for subsequent testing for differential expression.

Info

ID: htseq_count
Namespace: mapping

Links

This script takes one or more alignment files in SAM/BAM format and a feature file in GFF format and calculates for each feature the number of reads mapping to it.

See http://htseq.readthedocs.io/en/master/count.html for details.

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -main-script target/nextflow/mapping/htseq_count/main.nf \
  --help

Run command

Example of params.yaml

# Input
input: # please fill in - example: ["mysample1.BAM", "mysample2.BAM"]
reference: # please fill in - example: "reference.gtf"

# Output
# output: "$id.$key.output.tsv"
# output_delimiter: "   "
# output_sam: ["$id.$key.output_sam_*.BAM"]
# output_sam_format: "foo"

# Arguments
order: "name"
stranded: "yes"
minimum_alignment_quality: 10
# type: "exon"
# id_attribute: ["gene_id"]
# additional_attributes: ["gene_name"]
add_chromosome_info: false
mode: "union"
non_unique: "none"
# secondary_alignments: "foo"
# supplementary_alignments: "foo"
counts_output_sparse: false

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -profile docker \
  -main-script target/nextflow/mapping/htseq_count/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Input

Name	Description	Attributes
`--input`	Path to the SAM/BAM files containing the mapped reads.	List of `file`, required, example: `"mysample1.BAM", "mysample2.BAM"`, multiple_sep: `";"`
`--reference`	Path to the GTF file containing the features.	`file`, required, example: `"reference.gtf"`

Output

Name	Description	Attributes
`--output`	Filename to output the counts to.	`file`, required, example: `"htseq-count.tsv"`
`--output_delimiter`	Column delimiter in output.	`string`, example: `" "`
`--output_sam`	Write out all SAM alignment records into SAM/BAM files (one per input file needed), annotating each line with its feature assignment (as an optional field with tag ‘XF’). See the -p option to use BAM instead of SAM.	List of `file`, example: `"mysample1_out.BAM", "mysample2_out.BAM"`, multiple_sep: `";"`
`--output_sam_format`	Format to use with the –output_sam argument.	`string`

Arguments

Name	Description	Attributes
`--order`	Sorting order of . Paired-end sequencing data must be sorted either by position or by read name, and the sorting order must be specified. Ignored for single-end data.	`string`, default: `"name"`
`--stranded`	Whether the data is from a strand-specific assay. ‘reverse’ means ‘yes’ with reversed strand interpretation.	`string`, default: `"yes"`
`--minimum_alignment_quality`	Skip all reads with MAPQ alignment quality lower than the given minimum value. MAPQ is the 5th column of a SAM/BAM file and its usage depends on the software used to map the reads.	`integer`, default: `10`
`--type`	Feature type (3rd column in GTF file) to be used, all features of other type are ignored (default, suitable for Ensembl GTF files: exon)	`string`, example: `"exon"`
`--id_attribute`	GTF attribute to be used as feature ID (default, suitable for Ensembl GTF files: gene_id). All feature of the right type (see -t option) within the same GTF attribute will be added together. The typical way of using this option is to count all exonic reads from each gene and add the exons but other uses are possible as well. You can call this option multiple times: in that case, the combination of all attributes separated by colons (:) will be used as a unique identifier, e.g. for exons you might use -i gene_id -i exon_number.	List of `string`, example: `"gene_id"`, multiple_sep: `";"`
`--additional_attributes`	Additional feature attributes (suitable for Ensembl GTF files: gene_name). Use multiple times for more than one additional attribute. These attributes are only used as annotations in the output, while the determination of how the counts are added together is done based on option -i.	List of `string`, example: `"gene_name"`, multiple_sep: `";"`
`--add_chromosome_info`	Store information about the chromosome of each feature as an additional attribute (e.g. colunm in the TSV output file).	`boolean_true`
`--mode`	Mode to handle reads overlapping more than one feature.	`string`, default: `"union"`
`--non_unique`	Whether and how to score reads that are not uniquely aligned or ambiguously assigned to features.	`string`, default: `"none"`
`--secondary_alignments`	Whether to score secondary alignments (0x100 flag).	`string`
`--supplementary_alignments`	Whether to score supplementary alignments (0x800 flag).	`string`
`--counts_output_sparse`	Store the counts as a sparse matrix (mtx, h5ad, loom).	`boolean_true`

Authors

Robrecht Cannoodt (author, maintainer)
Angela Oliveira Pisco (author)