Nucleic Acids Research, (2008)
Streptococci are the causative agents of many human infectious diseases, including bacterial pneumonia and meningitis. Here, we present Strepto-DB, a database for the comparative genome analysis of group A (GAS) and group B (GBS) streptococci. The known genomes of various GAS and GBS strains contain a large fraction of distributed genes, that is, genes found to be absent from other strains or serotypes of the same species. Strepto-DB identifies the homologous proteins deduced from the genomes of interest. It allows the GAS and GBS core- and pan-genomes to be elucidated via genome-wide comparisons. Moreover, an intergenic region analysis tool provides alignments and predictions of transcription factor binding sites in the non-coding sequences. An interactive genome browser visualizes functional annotations. Strepto-DB (http://oger.tu-bs.de/strepto_db) was created using OGeR, the Open Genome Resource for comparative analysis of prokaryotic genomes. OGeR is a newly developed open-source database and tool platform for the web-based storage, distribution, visualization and comparison of prokaryotic genome data. The system automatically creates the dedicated relational database and web interface and imports an arbitrary number of genomes from standardized genome files. OGeR can be downloaded at http://oger.tu-bs.de.
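The core-/pan-genome distinction in this abstract comes down to set operations over homologous gene families. The sketch below is illustrative only: the abstract does not describe how Strepto-DB computes these sets, and the input format, the family IDs and the strain names are hypothetical. Here the core genome is taken as the families present in every genome, the pan-genome as those present in at least one, and the distributed genes as the difference between the two.

```python
# Minimal sketch: core, pan and distributed gene sets computed from
# hypothetical per-genome sets of homologous gene-family IDs.


def genome_partitions(families_by_genome: dict[str, set[str]]) -> dict[str, set[str]]:
    """Split gene families into core, pan and distributed sets.

    families_by_genome maps a genome name to the set of homologous
    gene-family identifiers found in it (hypothetical input; how the
    families themselves are defined is not modelled here).
    """
    if not families_by_genome:
        raise ValueError("need at least one genome")
    family_sets = list(families_by_genome.values())
    core = set.intersection(*family_sets)  # present in every genome
    pan = set.union(*family_sets)          # present in at least one genome
    distributed = pan - core               # absent from at least one genome
    return {"core": core, "pan": pan, "distributed": distributed}


# Hypothetical example with three strains of one species.
genomes = {
    "strain_1": {"famA", "famB", "famC"},
    "strain_2": {"famA", "famB", "famD"},
    "strain_3": {"famA", "famC", "famE"},
}
parts = genome_partitions(genomes)
print(sorted(parts["core"]))         # ['famA']
print(sorted(parts["distributed"]))  # ['famB', 'famC', 'famD', 'famE']
```

Adding a genome can only shrink the core and grow the pan-genome, which is why genome-wide comparisons across many strains matter for these estimates.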
Nature Reviews Genetics 9 (8), 573 (2008)
Research Highlight
Nature Reviews Genetics 9, 573 (August 2008) | doi:10.1038/nrg2420
Comparative genomics: Lining up is hard to do
Tanita Casci
Comparisons between proteins or between DNA stretches tell a story about the evolutionary past. Unfortunately, which story they tell often depends on the method used to line up the sequences. A new method claims to improve the accuracy of current alignments by making use not only of the similarity between sequences but also of their phylogenetic history.
Commonly used methods for aligning multiple sequences rely on sequential pairwise alignments: the most closely related sequences are aligned first, followed by progressively more distant ones, all the way down the evolutionary tree. The problem with these so-called 'progressive algorithms' is that it is impossible to distinguish insertions from deletions when only two sequences are compared, and this causes biases that distort subsequent alignments. Traditional implementations of this algorithm effectively penalize insertions more heavily than deletions, treating the former as less likely to have occurred. These biases are particularly problematic when there are multiple neighbouring insertions or deletions: for example, neighbouring insertions end up being aligned one under the other even though, by definition, they are not evolutionarily related. Misleading alignments caused by these false homologies are propagated to later stages and lead to incorrect inferences about sequence evolution.
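The order in which a progressive aligner joins sequences, closest first, can be sketched as repeated merging of the two nearest clusters. This is a minimal sketch assuming UPGMA-style average linkage, which is one common choice for building guide trees; individual aligners differ. The distances and sequence names are hypothetical, and the profile alignment performed at each merge is not modelled.

```python
# Minimal sketch of progressive-alignment merge order: repeatedly join the
# two closest clusters (UPGMA-style average linkage). Distances are
# hypothetical; a real aligner estimates them from the sequences and
# aligns the two profiles at each merge, which is not modelled here.
from itertools import combinations


def progressive_merge_order(names: list[str], dist: dict[frozenset, float]) -> list[tuple[str, str]]:
    """Return the (left, right) cluster labels joined at each step, closest first."""
    clusters = {name: (name,) for name in names}  # label -> tuple of leaf names
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Average-linkage distance: mean over all leaf pairs across the two clusters.
            pair_dists = [dist[frozenset((x, y))] for x in clusters[a] for y in clusters[b]]
            d = sum(pair_dists) / len(pair_dists)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b))
    return merges


# Hypothetical pairwise distances between four sequences.
d = {
    frozenset(("human", "chimp")): 0.1,
    frozenset(("mouse", "rat")): 0.2,
    frozenset(("human", "mouse")): 0.8,
    frozenset(("human", "rat")): 0.85,
    frozenset(("chimp", "mouse")): 0.8,
    frozenset(("chimp", "rat")): 0.85,
}
order = progressive_merge_order(["human", "chimp", "mouse", "rat"], d)
print(order)
# [('human', 'chimp'), ('mouse', 'rat'), ('(human,chimp)', '(mouse,rat)')]
```

Each merge is where a pairwise decision about gaps is made, so a gap placed wrongly in an early merge is carried into every later one.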
Two sequences do not contain information about whether a length difference between them is caused by an insertion or a deletion; in a multiple (as opposed to pairwise) alignment, however, there are more than two sequences. The new phylogeny-aware algorithm uses outgroup information from related sequences to infer which of the two possible changes has happened. It does so by 'flagging' the gaps (length changes) made in earlier alignments so that they can be accounted for in later alignments without incurring a penalty. The re-use of a flagged gap also indicates that a particular length change was created by an insertion; the gap can then be labelled as a 'permanent' insertion and correctly kept distinct from any neighbouring ones.
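The role of the outgroup can be illustrated with a simple parsimony rule for a single alignment column in which only one of two sister sequences carries a residue. This is a minimal sketch of the underlying idea, not PRANK's actual procedure, which works by flagging gaps during progressive alignment; the function name and its boolean inputs are hypothetical.

```python
# Minimal parsimony sketch: decide whether a length difference between two
# sister sequences A and B is an insertion or a deletion by consulting an
# outgroup. Illustrates the idea only; not PRANK's implementation.


def classify_length_change(has_site_a: bool, has_site_b: bool, has_site_outgroup: bool) -> str:
    """Classify one column where exactly one of the sisters A, B has a residue."""
    if has_site_a == has_site_b:
        raise ValueError("A and B agree; there is no length difference to explain")
    carrier = "A" if has_site_a else "B"
    lacking = "B" if has_site_a else "A"
    if has_site_outgroup:
        # The outgroup shares the residue, so the common ancestor most
        # parsimoniously had it: the sister lacking it lost it.
        return f"deletion in {lacking}"
    # The outgroup lacks the residue too, so the ancestor most parsimoniously
    # lacked it: the sister carrying it gained it.
    return f"insertion in {carrier}"


print(classify_length_change(True, False, True))   # deletion in B
print(classify_length_change(True, False, False))  # insertion in A
```

With only A and B, both answers explain the data equally well; the third sequence is what breaks the tie.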
The new algorithm, called PRANK, was tested on simulated sequence data. Traditional methods were shown to introduce biases, and therefore misalignments, even when sequences are closely related. PRANK performs better than traditional methods in this situation and shows no bias as the sequences become more divergent and harder to align. Strikingly, traditional methods remain error-prone even as sequences become increasingly similar, indicating that denser species sampling and more DNA sequencing cannot correct this algorithmic flaw in the alignment.
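Simulated data are useful for testing because the true alignment is known, so an inferred alignment can be scored directly against it. The sketch below uses one common measure, the fraction of truly homologous residue pairs that the inferred alignment also places in the same column. This metric is an assumption for illustration; the highlight does not say which accuracy measures the study used, and the example alignments are hypothetical.

```python
# Minimal sketch: score an inferred alignment against the known (simulated)
# true alignment as the fraction of truly homologous residue pairs it
# recovers. The metric is an assumption for illustration only.
from itertools import combinations


def aligned_pairs(alignment: list[str]) -> set[tuple[int, int, int, int]]:
    """Return (seq_i, residue_i, seq_j, residue_j) for residues sharing a column.

    Residue indices count ungapped positions; '-' marks a gap.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows must have the same, non-zero count")
    next_residue = [0] * len(alignment)
    pairs = set()
    for column in zip(*alignment):
        positions = []  # ungapped residue index per sequence, or None for a gap
        for s, char in enumerate(column):
            if char == "-":
                positions.append(None)
            else:
                positions.append(next_residue[s])
                next_residue[s] += 1
        for i, j in combinations(range(len(column)), 2):
            if positions[i] is not None and positions[j] is not None:
                pairs.add((i, positions[i], j, positions[j]))
    return pairs


def pair_accuracy(inferred: list[str], true_alignment: list[str]) -> float:
    """Fraction of residue pairs in the true alignment that the inferred one also aligns."""
    true_pairs = aligned_pairs(true_alignment)
    if not true_pairs:
        raise ValueError("the true alignment has no aligned residue pairs")
    return len(true_pairs & aligned_pairs(inferred)) / len(true_pairs)


# Hypothetical example: the true alignment puts the gap opposite C, but the
# inferred alignment puts it at the start, misaligning A with C.
true_aln = ["ACGT", "A-GT"]
inferred_aln = ["ACGT", "-AGT"]
print(pair_accuracy(inferred_aln, true_aln))  # 2 of 3 true pairs recovered
```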
Correct alignment remains a challenge, but less so than in the past. Why have programs traditionally performed so badly? The authors suggest that alignment methods were historically developed to compare protein sequences, in which any biases would be likely to cluster harmlessly in unconstrained regions. Given the increasingly widespread analysis of genomic DNA for understanding evolution, there is a pressing need to get these methods up to scratch.
Links
WEB SITE
* PRANK
References and links
ORIGINAL RESEARCH PAPER
1. Löytynoja, A. & Goldman, N. Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis. Science 320, 1632–1635 (2008)
FURTHER READING
1. Margulies, E. H. & Birney, E. Approaches to comparative sequence analysis: towards a functional view of vertebrate genomes. Nature Rev. Genet. 9, 303–313 (2008)