In 1975 when Fred Sanger was developing dideoxy termination sequencing, he found Roger Staden who developed the first computer program to assemblelonger DNA
sequences from the reads. The reads were randomly located and oriented along
the target DNA. Until recently all DNA sequence assembly programs were further
elaborations of his original technique. They often consist of three major
steps: compare all pairs of reads, find an approximate arrangement of the significant
overlaps, and multiple alignment for this arrangement.
Staden used a greedy assembly version of this
method. In 1995 an elegant and entirely new approach was proposed in which each
read is broken down into shorter overlapping words, and then a certain graph is
constructed so that Eulerian paths in this graph correspond
to the target DNA sequence. In this talk I will show how this graph is
constructed and give some examples of its operation. Today fornew-generation sequencing, this Eulerian method is the method of choice.