Samtools sort

Sort and (optionally) index alignments.

Info

ID: samtools_sort
Namespace: mapping

Reads are sorted by leftmost coordinates, or by read name when --sort_by_read_names is used.

An appropriate @HD-SO sort order header tag will be added or an existing one updated if necessary.

Note that to generate an index file (by specifying --output_bai), the default coordinate sort must be used. Thus the --sort_by_read_names and --sort_by <TAG> options are incompatible with --output_bai.

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 0.12.6 -latest \
  -main-script target/nextflow/mapping/samtools_sort/main.nf \
  --help

Run command

Example of params.yaml
# Arguments
minimizer_cluster: false
# minimizer_kmer: 20
sort_by_read_names: false
# sort_by: "foo"
no_pg: false

# Input
input: # please fill in - example: "input.bam"

# Output
# output_bam: "$id.$key.output_bam.bam"
# output_bai: "$id.$key.output_bai.bai"
# output_format: "bam"
# compression: 5

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
nextflow run openpipelines-bio/openpipeline \
  -r 0.12.6 -latest \
  -profile docker \
  -main-script target/nextflow/mapping/samtools_sort/main.nf \
  -params-file params.yaml
Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Input

Name Description Attributes
--input Path to the SAM/BAM/CRAM files containing the mapped reads. file, required, example: "input.bam"

Output

Name Description Attributes
--output_bam Filename to output the counts to. file, required, example: "output.bam"
--output_bai BAI-format index for BAM file. file, example: "output.bam.bai"
--output_format The output format. By default, samtools tries to select a format based on the -o filename extension; if output is to standard output or no format can be deduced, bam is selected. string, example: "bam"
--compression Compression level, from 0 (uncompressed) to 9 (best integer, example: 5

Arguments

Name Description Attributes
--minimizer_cluster Sort unmapped reads (those in chromosome “*“) by their sequence minimiser (Schleimer et al., 2003; Roberts et al., 2004), also reverse complementing as appropriate. This has the effect of collating some similar data together, improving the compressibility of the unmapped sequence. The minimiser kmer size is adjusted using the -K option. Note data compressed in this manner may need to be name collated prior to conversion back to fastq. Mapped sequences are sorted by chromosome and position. boolean_true
--minimizer_kmer Sets the kmer size to be used in the -M option. integer, example: 20
--sort_by_read_names Sort by read names (i.e., the QNAME field) rather than by chromosomal coordinates. boolean_true
--sort_by Sort first by this value in the alignment tag, then by position or name (if also using -n). string
--no_pg Do not add a @PG line to the header of the output file. boolean_true

Authors

  • Robrecht Cannoodt (author, maintainer)

  • Angela Oliveira Pisco (author)