A. Tarasov, A. J. Vilella, E. Cuppen, I. J. Nijman, and P. Prins. Sambamba: fast processing of NGS alignment formats. Bioinformatics, 2015.
January 2023: announcing the v1.0 release of the great sambamba tool!
A minor fix and a major release. After 10 years and over one thousand citations we can announce sambamba 1.0 stable!
Sambamba is a high performance, highly parallel, robust and fast tool (and library), written in the D programming language, for working with SAM and BAM files. Because of its efficiency, Sambamba is an important workhorse running in many sequencing centers around the world today. As of January 2023, Sambamba has been cited over 1000 times and has been installed from Conda over 250,000 times. Sambamba is also distributed by Debian. To cite Sambamba, see Credit.
Current functionality is an important subset of samtools functionality, including view, index, sort, markdup, and depth. Most tools support piping: just specify /dev/stdin or /dev/stdout as filenames. When we started writing Sambamba in 2012, the main advantage over samtools was parallelized BAM reading and writing.
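The piping convention can be sketched as two small shell functions. This is a minimal sketch rather than an excerpt from the manual: the function names sam_stream_to_bam and bam_stream_to_sam are hypothetical, and it assumes the view options -S (SAM input), -f bam (output format), -o (output file), and -h (print header) behave as in recent releases. Check sambamba view --help for your version.

```shell
# Hypothetical helper: read SAM from standard input and write BAM to
# standard output by naming /dev/stdin and /dev/stdout as the files.
sam_stream_to_bam() {
    sambamba view -S -f bam -o /dev/stdout /dev/stdin
}

# Hypothetical helper: read BAM from standard input and print SAM
# (with header) to standard output, which is the default for view.
bam_stream_to_sam() {
    sambamba view -h /dev/stdin
}

# Example (assumes an aligner that writes SAM to stdout):
#   bwa mem ref.fa reads.fq | sam_stream_to_bam > aligned.bam
#   cat aligned.bam | bam_stream_to_sam | head
```

Wrapping the calls in functions keeps the /dev/stdin and /dev/stdout arguments in one place, so the pipeline itself reads like any other Unix filter chain.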
In March 2017, samtools 1.4 was released, reaching some parity. A recent performance comparison shows that Sambamba still holds its ground and can even do better. Here are some comparison metrics. For example, for flagstat, Sambamba is 1.4x faster than samtools. For index, they are similar. For markdup, Sambamba is almost 6x faster, and for view, 4x faster. For sort, Sambamba has been beaten, though Sambamba is notably up to 2x faster than samtools on large RAM machines (120GB+).
In addition, Sambamba has a few interesting features to offer. In particular:
- Fast large machine sort, see performance
- Automatic index creation when writing any coordinate-sorted file
- view -L <bed file> utilizes the BAM index to skip unrelated chunks
- depth measures base, sliding window, or region coverages
  - Chanjo builds upon this and gets you to exon/gene levels of abstraction
- markdup, a fast implementation of the Picard algorithm
- slice quickly extracts a region into a new file, tweaking only first/last chunks
- ...and more (you'll have to try)
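A few of the features listed above can be sketched as shell wrappers. This is an illustrative sketch, not the authors' recommended workflow: the function names and argument order are hypothetical, the region chr20:1000000-2000000 is only an example, and the options used (-L, -f, -o, -t) are assumed to match recent releases. Region-based commands need a coordinate-sorted, indexed BAM file.

```shell
# Hypothetical helper: keep only reads overlapping regions in a BED file.
# Usage: extract_bed_regions regions.bed in.bam out.bam
extract_bed_regions() {
    sambamba view -L "$1" -f bam -o "$3" "$2"
}

# Hypothetical helper: per-region coverage for the regions in a BED file.
# Usage: region_coverage regions.bed in.bam > coverage.tsv
region_coverage() {
    sambamba depth region -L "$1" "$2"
}

# Hypothetical helper: copy one region into a new BAM file.
# Usage: slice_region in.bam chr20:1000000-2000000 out.bam
slice_region() {
    sambamba slice -o "$3" "$1" "$2"
}

# Hypothetical helper: mark duplicates, using 1 thread unless told otherwise.
# Usage: mark_duplicates in.bam out.bam 8
mark_duplicates() {
    sambamba markdup -t "${3:-1}" "$1" "$2"
}
```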
The D language is extremely suitable for high performance computing (HPC). At this point, we think that the BAM format is here to stay for processing reference guided sequencing data, and we aim to make it easy to parse and process BAM files.
Sambamba is free and open source software, licensed under GPLv2+. See the manual pages online to learn more about what is available and how to use it.
For more information on Sambamba, contact the mailing list (see Getting help).
Important Notice: With version 0.8 support for CRAM was removed from Sambamba (see the RELEASE NOTES).
To use CRAM, you can still use one of the older (binary) releases of Sambamba.
For those not in the mood to learn/install new package managers, there are GitHub source and binary releases. Simply download the tarball, unpack it, and run it according to the accompanying release notes.
The package managers below (Conda, GNU Guix, Debian, and Homebrew) also provide recent binary installs for Linux. For MacOS you may use Conda or Homebrew.
There should be binary downloads for Linux and MacOS.
With Conda use the bioconda channel.
A reproducible GNU Guix package for Sambamba is available. The development version is packaged here.
See also Debian package status.
Users of Homebrew can also use the formula from homebrew-bio.
brew install brewsci/bio/sambamba
It should work for Linux and MacOS.
Sambamba has a mailing list for installation help and general discussion.
Before posting an issue, search the issue tracker and mailing list first. It is likely that someone has encountered something similar. Also try running the latest version of Sambamba to make sure it has not been fixed already. Support/installation questions should be aimed at the mailing list. The issue tracker is for development issues around the software itself. When reporting an issue, include the output of the program and the contents of the output directory.
Please use the following checklist. It exists for multiple reasons :)
- I have found an issue with Sambamba
- I have searched for it on the issue tracker (also check closed issues)
- I have searched for it on the mailing list
- I have tried the latest release of Sambamba
- I have read and agreed to the below code of conduct
- If it is a support/install question, I have posted it to the mailing list
- If it is software development related, I have posted a new issue on the issue tracker or added to an existing one
- In the message, I have included the output of my Sambamba run
- In the message, I have included relevant files in the output directory
- (Optional) I have made the data available to reproduce the problem
To find bugs, the Sambamba software developers may ask you to install a development version of the software. They may also ask you for your data and will treat it confidentially. Please always remember that Sambamba is written and maintained by volunteers with good intentions. Our time is valuable, too. By helping us as much as possible, we can provide this tool for everyone to use.
By using Sambamba and communicating with its community, you implicitly agree to abide by the code of conduct as published by the Software Carpentry initiative.
Note: In general, there is no need to compile Sambamba. You can use a recent binary install as listed above.
The preferred method for compiling Sambamba is with the LDC compiler which targets LLVM. LLVM version 6 is faster than earlier editions.
See INSTALL.md.
The LDC compiler's GitHub repository provides binary images. The current preferred release for Sambamba is LDC - the LLVM D compiler (>= 1.6.1). After installing LDC from https://github.com/ldc-developers/ldc/releases/ with, for example
cd
ver=1.7.0
wget https://github.com/ldc-developers/ldc/releases/download/v$ver/ldc2-$ver-linux-x86_64.tar.xz
tar xvJf ldc2-$ver-linux-x86_64.tar.xz
export PATH=$HOME/ldc2-$ver-linux-x86_64/bin:$PATH
export LIBRARY_PATH=$HOME/ldc2-$ver-linux-x86_64/lib
build Sambamba with
git clone --recursive https://github.com/biod/sambamba.git
cd sambamba
make
To build a development/debug version run
make clean && make debug
To run the tests, fetch shunit2 from https://github.com/kward/shunit2 and put it in the path so you can run
make check
See also INSTALL.md.
Our development and release environment is GNU Guix. To build sambamba the LDC compiler is also available in GNU Guix:
guix package -A ldc
For more instructions see INSTALL.md.
Debian uses a Meson+Ninja build. It may work with something like
meson build
cd build
ninja
ninja test
time ./build/sambamba sort HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
sambamba 1.0.1
by Artem Tarasov and Pjotr Prins (C) 2012-2023
LDC 1.32.0 / DMD v2.102.2 / LLVM14.0.6 / bootstrap LDC - the LLVM D compiler (1.32.0)
real 0m13.343s
user 2m11.663s
sys 0m4.232s
Or, with some additional tuning, runtimes may get close to the optimized static build (see benchmarks):
rm -rf build/ ; env D_LD=gold CC=gcc meson build --buildtype release
cd build/
env CC=gcc ninja
env CC=gcc ninja test
time ./build/sambamba sort HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
sambamba 1.0.1
by Artem Tarasov and Pjotr Prins (C) 2012-2023
LDC 1.32.0 / DMD v2.102.2 / LLVM14.0.6 / bootstrap LDC - the LLVM D compiler (1.32.0)
real 0m10.227s
user 2m7.203s
sys 0m4.039s
Sambamba builds on MacOS. We have a Travis integration test as an example. It can be something like
brew install ldc
git clone --recursive https://github.com/biod/sambamba.git
cd sambamba
make
Sambamba development and the issue tracker are on GitHub. Developer documentation can be found in the source code and the development documentation.
Important Note: Some older Xeon processors segfault under heavy hyperthreading, which Sambamba utilizes. Please read this when encountering seemingly random crashes. There is no real fix other than disabling hyperthreading. Also discussed here. Thank Intel for producing this bug.
On a crash, Sambamba can dump a core file. To make this happen set
ulimit -c unlimited
and run your command. Send us the core file so we can reproduce the state at the time of the segfault.
Another option is to use catchsegv
catchsegv ./build/sambamba command
This will show state on stdout which can be sent to us.
In case of crashes, it's helpful to have GDB stacktraces (bt command). A full stacktrace for all threads:
thread apply all backtrace full
Note that GDB should be made aware of the D garbage collector, which emits SIGUSR1 and SIGUSR2 signals. GDB needs to ignore them with
handle SIGUSR1 SIGUSR2 nostop noprint
A binary, relocatable install of Sambamba with debug information and all dependencies can be fetched from the binary link above. Unpack the tarball and run the contained install.sh script with a TARGET directory
./install.sh ~/sambamba-test
Run Sambamba in GDB with
gdb -ex 'handle SIGUSR1 SIGUSR2 nostop noprint' \
--args ~/sambamba-test/sambamba-*/bin/sambamba view --throw-error
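The GDB steps above can also be combined into a single non-interactive run, which is convenient for attaching a log to a bug report. The following is a sketch under stated assumptions: the function name gdb_backtrace is hypothetical, and it relies on GDB's standard -batch, -ex, and --args options rather than on anything specific to Sambamba.

```shell
# Hypothetical helper: run a program under GDB without a prompt, ignore
# the SIGUSR1/SIGUSR2 signals used by the D garbage collector, and print
# a full backtrace of every thread once the program stops.
# Usage: gdb_backtrace /path/to/sambamba view --throw-error > gdb.log 2>&1
gdb_backtrace() {
    gdb -batch \
        -ex 'handle SIGUSR1 SIGUSR2 nostop noprint' \
        -ex 'run' \
        -ex 'thread apply all backtrace full' \
        --args "$@"
}
```

If the program exits normally, there are no threads left to inspect and the backtrace command prints nothing useful, so the log is only informative after a crash.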
Sambamba is generously distributed under the GNU General Public License v2+.
Citations are the bread and butter of science. If you are using Sambamba in your research and want to support our future work on Sambamba, please cite the following publication: A. Tarasov, A. J. Vilella, E. Cuppen, I. J. Nijman, and P. Prins. Sambamba: fast processing of NGS alignment formats. Bioinformatics, 2015.
@article{doi:10.1093/bioinformatics/btv098,
author={Tarasov, Artem and Vilella, Albert J. and Cuppen, Edwin and Nijman, Isaac J. and Prins, Pjotr},
title={Sambamba: fast processing of NGS alignment formats},
journal={Bioinformatics},
volume={31},
number={12},
pages={2032-2034},
year={2015},
doi={10.1093/bioinformatics/btv098},
URL={http://dx.doi.org/10.1093/bioinformatics/btv098}