For most comparative questions in ecology and evolution, the portion of the genome relevant to the answer is typically small. The challenge therefore lies in discovering these informative regions efficiently, before investing heavily in de novo assembly. Direct analysis of next-generation genomic sequence data could greatly simplify large comparative studies.
Prof. Chuck Cannon of Xishuangbanna Tropical Botanical Garden (XTBG) and his colleagues presented a reference-free comparative genomic approach that performs the comparative analysis prior to assembly, characterizing basic properties and segregating nucleotide sequence variation into smaller data partitions according to its distribution across genomes. Subsequent de novo assembly is therefore confined to only the portion of the genomic data relevant to a specific comparative question.
They analyzed genomic sequence data from 174 plant chloroplasts, across a wide range of taxonomic relationships and divergence times, providing a broad perspective on chloroplast evolution in Viridiplantae and a rich framework for further exploration.
They found that the reference-free approach was an efficient and powerful way to compare unassembled short-read genomic sequence data. Their results agreed with and extended previous studies of several taxa, including the Geraniaceae and the clade of legumes that have lost their inverted repeat. More importantly, the localized de novo assemblies for each tip genome and set of target groups examined in detail produced a rich framework of information about genomic differences and similarities.
The study, entitled “Reference-free comparative genomics of 174 chloroplasts”, has been published in PLoS ONE 7(11): e48995. doi:10.1371/journal.pone.0048995
The phylokmer package for generating the k-mer frequency tables, and the Python scripts that perform both the tip and group analyses on the resulting k-mer tables, are available for download at: http://sourceforge.net/projects/referencefree
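
To make the idea of k-mer tables and tip and group analyses more concrete, the short Python sketch below counts k-mers directly from unassembled reads and then partitions them by their distribution across genomes. It is an illustration only and is not the phylokmer code. It assumes that a "tip" analysis selects k-mers found in a single genome and in no other, and that a "group" analysis selects k-mers shared by every member of a chosen group and absent from all genomes outside it. The function names, the toy reads, the taxon labels and the value k = 4 are all hypothetical.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Build a k-mer frequency table from a list of read strings."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts

def kmers_outside(tables, members):
    """Union of all k-mers seen in genomes that are not in `members`."""
    outside = set()
    for name, table in tables.items():
        if name not in members:
            outside |= set(table)
    return outside

def tip_kmers(tables, tip):
    """Assumed 'tip' partition: k-mers present only in one genome."""
    return set(tables[tip]) - kmers_outside(tables, {tip})

def group_kmers(tables, group):
    """Assumed 'group' partition: k-mers shared by all group members
    and absent from every genome outside the group."""
    shared = set.intersection(*(set(tables[g]) for g in group))
    return shared - kmers_outside(tables, set(group))

# Toy, made-up reads standing in for unassembled short-read data.
reads_by_genome = {
    "taxonA": ["ACGTAC"],
    "taxonB": ["ACGTAG"],
    "taxonC": ["TTGTAC"],
}
tables = {name: kmer_counts(reads, 4) for name, reads in reads_by_genome.items()}

print(sorted(tip_kmers(tables, "taxonB")))
print(sorted(group_kmers(tables, ["taxonA", "taxonB"])))
```

In a workflow of this kind, only the reads containing k-mers from the selected partition would be passed on to de novo assembly, which is what keeps the assembly step small. A real analysis would also need to handle reverse complements, sequencing errors and coverage thresholds, none of which this sketch attempts.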