Samtools sort

Sort and (optionally) index alignments.

Info

ID: samtools_sort
Namespace: mapping


Reads are sorted by leftmost coordinates, or by read name when --sort_by_read_names is used.
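To make the coordinate order concrete, here is a toy shell sketch for a small uncompressed SAM file. It orders records first by reference, following the order of the @SQ header lines, and then by leftmost position (POS). Unmapped records (RNAME `*`) go last. The sketch only illustrates the ordering rule and is not how samtools sort works internally. Records that share a position keep their input order here, whereas samtools applies its own tie-breaking. The function name `coord_sort_sam` is hypothetical.

```shell
# Toy illustration of coordinate ordering for an uncompressed SAM file.
# NOT the samtools sort implementation: it only mimics the ordering rule.
coord_sort_sam() {
  local sam="$1"
  # Emit the header lines first, unchanged (ignore "no match" if there are none).
  grep '^@' "$sam" || true
  # Map each @SQ name to its position in the header, prefix every record with
  # (reference index, POS), sort numerically and stably, then drop the prefix.
  # Records whose RNAME is "*" (or not in the header) get a large index so they sort last.
  awk -F'\t' -v OFS='\t' '
    $1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) ref[substr($i, 4)] = n++; next }
    /^@/ { next }
    { key = ($3 in ref) ? ref[$3] : 1000000000; print key, $4, $0 }
  ' "$sam" | LC_ALL=C sort -s -t $'\t' -k1,1n -k2,2n | cut -f3-
}
```

Usage: `coord_sort_sam toy.sam > toy.sorted.sam`.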

An appropriate sort order (SO) tag will be added to the @HD header line, or an existing one updated, if necessary.
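You can check which sort order a file declares by reading the SO field of its @HD header line. The hypothetical helper below extracts that value from SAM header text on standard input. In practice, the header could come from `samtools view -H output.bam`. The value would typically be `coordinate` for the default sort and `queryname` when --sort_by_read_names is used.

```shell
# Print the SO (sort order) value of the first @HD line in SAM header text read
# from stdin. Prints nothing if there is no @HD line or it has no SO field.
hd_sort_order() {
  awk -F'\t' '
    $1 == "@HD" {
      for (i = 2; i <= NF; i++) if ($i ~ /^SO:/) print substr($i, 4)
      exit
    }
  '
}
```

Usage: `samtools view -H output.bam | hd_sort_order`.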

Note that to generate an index file (by specifying --output_bai), the default coordinate sort must be used. Thus the --sort_by_read_names and --sort_by <TAG> options are incompatible with --output_bai.
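This incompatibility can be caught before a run is launched. The function below is a hypothetical pre-flight check and is not part of the component. It takes the values of --output_bai, --sort_by_read_names and --sort_by, and it fails when an index is requested together with a non-default sort.

```shell
# Hypothetical pre-flight check: an index (--output_bai) is only allowed with the
# default coordinate sort.
# Arguments: output_bai (path or empty), sort_by_read_names ("true"/"false"), sort_by (tag or empty).
check_index_compatible() {
  local output_bai="$1" sort_by_read_names="$2" sort_by="$3"
  if [ -n "$output_bai" ] && { [ "$sort_by_read_names" = "true" ] || [ -n "$sort_by" ]; }; then
    echo "error: --output_bai requires the default coordinate sort" >&2
    return 1
  fi
  return 0
}
```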

Example commands

You can run the pipeline using nextflow run.

View help

You can use --help as a parameter to get an overview of the possible parameters.

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -main-script target/nextflow/mapping/samtools_sort/main.nf \
  --help

Run command

Example of params.yaml

# Input
input: # please fill in - example: "input.bam"

# Output
# output_bam: "$id.$key.output_bam.bam"
# output_bai: "$id.$key.output_bai.bai"
# output_format: "bam"
# compression: 5

# Arguments
minimizer_cluster: false
# minimizer_kmer: 20
sort_by_read_names: false
# sort_by: "foo"
no_pg: false

# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"

nextflow run openpipelines-bio/openpipeline \
  -r 2.1.1 -latest \
  -profile docker \
  -main-script target/nextflow/mapping/samtools_sort/main.nf \
  -params-file params.yaml

Note

Replace -profile docker with -profile podman or -profile singularity depending on the desired backend.

Argument groups

Input

Name	Description	Attributes
`--input`	Path to the SAM/BAM/CRAM file containing the mapped reads.	`file`, required, example: `"input.bam"`

Output

Name	Description	Attributes
`--output_bam`	Filename to write the sorted alignments to.	`file`, required, example: `"output.bam"`
`--output_bai`	BAI-format index for BAM file.	`file`, example: `"output.bam.bai"`
`--output_format`	The output format. By default, samtools tries to select a format based on the output filename extension; if output is to standard output or no format can be deduced, bam is selected.	`string`, example: `"bam"`
`--compression`	Compression level, from 0 (uncompressed) to 9 (best compression).	`integer`, example: `5`
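Two small shell helpers can sketch the format fallback described for --output_format and the 0 to 9 range of --compression. The sketch assumes that only the `.sam`, `.bam` and `.cram` extensions are recognised, and samtools' own format detection may handle more cases. Both function names are hypothetical.

```shell
# Simplified format fallback: pick a format from the output filename extension,
# defaulting to bam for standard output ("-") or an unrecognised extension.
guess_output_format() {
  case "$1" in
    *.sam)  echo sam ;;
    *.cram) echo cram ;;
    *)      echo bam ;;
  esac
}

# Accept only a single digit, i.e. compression levels 0 to 9.
valid_compression() {
  case "$1" in
    [0-9]) return 0 ;;
    *)     return 1 ;;
  esac
}
```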

Arguments

Name	Description	Attributes
`--minimizer_cluster`	Sort unmapped reads (those in chromosome “*”) by their sequence minimiser (Schleimer et al., 2003; Roberts et al., 2004), also reverse complementing as appropriate. This has the effect of collating some similar data together, improving the compressibility of the unmapped sequence. The minimiser k-mer size is adjusted using `--minimizer_kmer` (samtools `-K`). Note that data compressed in this manner may need to be name-collated before conversion back to FASTQ. Mapped sequences are sorted by chromosome and position.	`boolean_true`
`--minimizer_kmer`	Sets the k-mer size used by `--minimizer_cluster` (samtools `-M`).	`integer`, example: `20`
`--sort_by_read_names`	Sort by read names (i.e., the QNAME field) rather than by chromosomal coordinates.	`boolean_true`
`--sort_by`	Sort first by the value of this alignment tag, then by position, or by read name if `--sort_by_read_names` is also set.	`string`
`--no_pg`	Do not add a @PG line to the header of the output file.	`boolean_true`
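If you are familiar with samtools, you can relate the arguments above to `samtools sort` options. The sketch below assumes a one-to-one mapping. The descriptions above already mention `-n`, `-M` and `-K`, and `-t`, `-O`, `-l`, `--no-PG` and `-o` are the corresponding samtools sort options. Whether the component's script builds its command this way is an assumption. --output_bai is left out because this page does not describe how the index is produced. The function prints the command rather than running it, and it reads its settings from hypothetical upper-case environment variables.

```shell
# Hypothetical mapping from this component's arguments to samtools sort options.
# Prints the resulting command; it does not execute samtools.
build_sort_args() {
  local input="$1" output="$2"
  local args=()
  [ "${SORT_BY_READ_NAMES:-false}" = "true" ] && args+=(-n)
  [ -n "${SORT_BY:-}" ] && args+=(-t "$SORT_BY")
  [ "${MINIMIZER_CLUSTER:-false}" = "true" ] && args+=(-M)
  [ -n "${MINIMIZER_KMER:-}" ] && args+=(-K "$MINIMIZER_KMER")
  [ -n "${OUTPUT_FORMAT:-}" ] && args+=(-O "$OUTPUT_FORMAT")
  [ -n "${COMPRESSION:-}" ] && args+=(-l "$COMPRESSION")
  [ "${NO_PG:-false}" = "true" ] && args+=(--no-PG)
  echo samtools sort "${args[@]}" -o "$output" "$input"
}
```

For example, `SORT_BY_READ_NAMES=true build_sort_args input.bam output.bam` prints a name-sorting command, which by the rule above must not be combined with --output_bai.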

Authors

Robrecht Cannoodt (author, maintainer)
Angela Oliveira Pisco (author)