New sparse graph approach to de novo genome assembly (Xishuangbanna Tropical Botanical Garden, CAS)

Typical memory requirements for modern assemblers run to hundreds of gigabytes (GB) for human genome assembly. Recently, several methods have aimed to reduce the memory requirement of de novo genome assembly.

Mr. Ye Chengxi, formerly a staff member of Xishuangbanna Tropical Botanical Garden (XTBG), proposed an alternative approach to reducing memory usage that exploits sparseness in genome assembly. Specifically, instead of storing every k-mer (in a de Bruijn graph) or every read (in an overlap graph) as a node, the researchers stored only a sparse subset of those nodes while still ensuring that the assembly could be performed. They demonstrated that this approach greatly reduced memory demands without sacrificing assembly accuracy.
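The idea of keeping only a subset of k-mers can be illustrated with a toy Python sketch. It is not the Sparse Assembler algorithm. The keep-rule, the skip parameter `g`, and the names `sparse_kmer_graph` and `spell_path` are illustrative assumptions. Under this rule, a k-mer is stored if it is already a node, if it lies at least `g` positions after the previously stored k-mer in the same read, or if it is the read's last k-mer. Each edge records the bases skipped between two stored k-mers, so the sequence can still be spelled out from roughly 1/g of the nodes.

```python
from collections import defaultdict


def dense_kmers(reads, k):
    """Every distinct k-mer: the node set of a standard de Bruijn graph."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def sparse_kmer_graph(reads, k, g):
    """Toy sparse k-mer graph (hypothetical keep-rule, not the published one).

    A k-mer is kept if it is already a node (so overlapping reads reuse
    nodes), if it lies >= g positions after the last kept k-mer of the same
    read, or if it is the read's final k-mer (so no tail bases are lost).
    Each edge stores the bases skipped between two consecutive kept k-mers.
    """
    nodes = set()
    edges = defaultdict(set)  # kept k-mer -> {(next kept k-mer, skipped bases)}
    for read in reads:
        last = None  # start position of the previously kept k-mer in this read
        final = len(read) - k
        for i in range(final + 1):
            kmer = read[i:i + k]
            if kmer in nodes or last is None or i - last >= g or i == final:
                nodes.add(kmer)
                if last is not None:
                    prev = read[last:last + k]
                    # prev + skipped == read[last:i + k], whose suffix is kmer
                    edges[prev].add((kmer, read[last + k:i + k]))
                last = i
    return nodes, edges


def spell_path(start, edges):
    """Rebuild a sequence by following each node's unique outgoing edge."""
    seq, node, seen = start, start, {start}
    while len(edges.get(node, ())) == 1:
        nxt, skipped = next(iter(edges[node]))
        if nxt in seen:  # stop on cycles
            break
        seq += skipped
        node = nxt
        seen.add(nxt)
    return seq


if __name__ == "__main__":
    read = "ATGCGTACCTTAGGCATTCGA"
    k, g = 5, 4
    nodes, edges = sparse_kmer_graph([read], k, g)
    print(len(dense_kmers([read], k)), len(nodes))  # 17 5
    print(spell_path(read[:k], edges) == read)       # True
```

For this 21-base read, the dense graph needs 17 k-mer nodes, while the sparse version keeps 5. Following the stored edges still reproduces the read exactly. The sketch ignores sequencing errors, repeats and reverse complements, all of which a real assembler must handle.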

The researchers produced a proof-of-principle software package, Sparse Assembler, which uses a new sparse k-mer graph structure derived from the de Bruijn graph. Tested on both simulated and real data, Sparse Assembler achieved ~90% memory savings while retaining high assembly accuracy, without sacrificing speed compared with existing de novo assemblers.

Their results strongly support the idea that a sparse assembly graph retains sufficient information for accurate and fast de novo assembly of moderate-size genomes on an inexpensive desktop PC, which is usually equipped with only several gigabytes of memory.

The study, entitled “Exploiting sparseness in de novo genome assembly”, has been published in BMC Bioinformatics 2012, 13(Suppl 6):S1, doi:10.1186/1471-2105-13-S6-S1.