Htseq count
Quantify gene expression for subsequent testing for differential expression.
Info
ID: htseq_count
Namespace: mapping
Links
This script takes one or more alignment files in SAM/BAM format and a feature file in GFF format and calculates for each feature the number of reads mapping to it.
See http://htseq.readthedocs.io/en/master/count.html for details.
Example commands
You can run the pipeline using nextflow run
.
View help
You can use --help
as a parameter to get an overview of the possible parameters.
nextflow run openpipelines-bio/openpipeline \
-r 1.0.2 -latest \
-main-script target/nextflow/mapping/htseq_count/main.nf \
--help
Run command
Example of params.yaml
# Arguments
order: "name"
stranded: "yes"
minimum_alignment_quality: 10
# type: "exon"
# id_attribute: ["gene_id"]
# additional_attributes: ["gene_name"]
add_chromosome_info: false
mode: "union"
non_unique: "none"
# secondary_alignments: "foo"
# supplementary_alignments: "foo"
counts_output_sparse: false
# Input
input: # please fill in - example: ["mysample1.BAM", "mysample2.BAM"]
reference: # please fill in - example: "reference.gtf"
# Output
# output: "$id.$key.output.tsv"
# output_delimiter: " "
# output_sam: ["$id.$key.output_sam_*.BAM"]
# output_sam_format: "foo"
# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
nextflow run openpipelines-bio/openpipeline \
-r 1.0.2 -latest \
-profile docker \
-main-script target/nextflow/mapping/htseq_count/main.nf \
-params-file params.yaml
Note
Replace -profile docker
with -profile podman
or -profile singularity
depending on the desired backend.
Argument groups
Input
Name | Description | Attributes |
---|---|---|
--input |
Path to the SAM/BAM files containing the mapped reads. | List of file , required, example: "mysample1.BAM", "mysample2.BAM" , multiple_sep: ";" |
--reference |
Path to the GTF file containing the features. | file , required, example: "reference.gtf" |
Output
Name | Description | Attributes |
---|---|---|
--output |
Filename to output the counts to. | file , required, example: "htseq-count.tsv" |
--output_delimiter |
Column delimiter in output. | string , example: " " |
--output_sam |
Write out all SAM alignment records into SAM/BAM files (one per input file needed), annotating each line with its feature assignment (as an optional field with tag ‘XF’). See the -p option to use BAM instead of SAM. | List of file , example: "mysample1_out.BAM", "mysample2_out.BAM" , multiple_sep: ";" |
--output_sam_format |
Format to use with the –output_sam argument. | string |
Arguments
Name | Description | Attributes |
---|---|---|
--order |
Sorting order of |
string , default: "name" |
--stranded |
Whether the data is from a strand-specific assay. ‘reverse’ means ‘yes’ with reversed strand interpretation. | string , default: "yes" |
--minimum_alignment_quality |
Skip all reads with MAPQ alignment quality lower than the given minimum value. MAPQ is the 5th column of a SAM/BAM file and its usage depends on the software used to map the reads. | integer , default: 10 |
--type |
Feature type (3rd column in GTF file) to be used, all features of other type are ignored (default, suitable for Ensembl GTF files: exon) | string , example: "exon" |
--id_attribute |
GTF attribute to be used as feature ID (default, suitable for Ensembl GTF files: gene_id). All feature of the right type (see -t option) within the same GTF attribute will be added together. The typical way of using this option is to count all exonic reads from each gene and add the exons but other uses are possible as well. You can call this option multiple times: in that case, the combination of all attributes separated by colons (:) will be used as a unique identifier, e.g. for exons you might use -i gene_id -i exon_number. | List of string , example: "gene_id" , multiple_sep: ";" |
--additional_attributes |
Additional feature attributes (suitable for Ensembl GTF files: gene_name). Use multiple times for more than one additional attribute. These attributes are only used as annotations in the output, while the determination of how the counts are added together is done based on option -i. | List of string , example: "gene_name" , multiple_sep: ";" |
--add_chromosome_info |
Store information about the chromosome of each feature as an additional attribute (e.g. colunm in the TSV output file). | boolean_true |
--mode |
Mode to handle reads overlapping more than one feature. | string , default: "union" |
--non_unique |
Whether and how to score reads that are not uniquely aligned or ambiguously assigned to features. | string , default: "none" |
--secondary_alignments |
Whether to score secondary alignments (0x100 flag). | string |
--supplementary_alignments |
Whether to score supplementary alignments (0x800 flag). | string |
--counts_output_sparse |
Store the counts as a sparse matrix (mtx, h5ad, loom). | boolean_true |