Michael S. Waterman (USC)

Eulerian Paths and DNA Sequence Assembly

In 1975 when Fred Sanger was developing dideoxy termination sequencing, he found Roger Staden who developed the first computer program to assemble  longer DNA sequences from the reads. The reads were randomly located and oriented along the target DNA. Until recently all DNA sequence assembly programs were further elaborations of his original technique. They often consist of three major steps: compare all pairs of reads, find an approximate arrangement of the significant overlaps, and multiple alignment for this arrangement. Staden used a greedy assembly version of this method. In 1995 an elegant and entirely new approach was proposed in which each read is broken down into shorter overlapping words, and then a certain graph is constructed so that Eulerian paths in this graph correspond to the target DNA sequence. In this talk I will show how this graph is constructed and give some examples of its operation. Today for  new-generation sequencing, this Eulerian method is the method of choice.


Additional abstracts will be posted later