
dc.contributor.advisor: Myers, Eugene [en_US]
dc.contributor.author: Anson, Eric Lance
dc.creator: Anson, Eric Lance [en_US]
dc.date.accessioned: 2013-05-09T09:30:54Z [en]
dc.date.available: 2013-05-09T09:30:54Z [en]
dc.date.issued: 2000 [en_US]
dc.identifier.uri: http://hdl.handle.net/10150/289091 [en]
dc.description.abstract: A monumental achievement in the history of science, the sequencing of the entire human genome, will soon be reached. The Human Genome Project (HGP) has been working toward this goal since 1990 using a two-tiered strategy. Recently it was proposed that using a whole-genome shotgun approach to sequence the genome would be faster and less costly. This thesis expands on that proposal by presenting two algorithms that can be used in whole-genome shotgun sequencing. These algorithms were implemented and tested on simulated data. Essential to this approach is the availability of pairs of short, unique sequence markers at a roughly estimated distance from each other. Determining the sequence of the genome can then be broken into a series of inter-marker assembly problems that determine the sequence between a pair of markers. Unfortunately, marker pairs are not always correct and repeats can greatly confound the assembly. This motivates the first problem of rapidly finding a set of linked contigs, called a scaffold, between a pair of markers that confirms the marker pair and the ability to traverse the region between them. Then an inter-marker assembly algorithm that determines the unique sequence segments between a marker pair is presented. Both algorithms are evaluated with respect to a simulation that can model various types of repeats and for which our only information about the presence of repeats is excessive coverage and the ability to detect their boundaries. Simulation results show that at 10x coverage one can find and assemble the unique sequence between markers more than 99.9% of the time for many of the repeat models. Events in this field have been moving rapidly. Recently a new company called Celera Genomics announced its intention to sequence the human genome before the HGP by using the whole-genome shotgun approach. We end this thesis by briefly discussing Celera's approach, and relating it to the algorithms presented here.
dc.language.iso: en_US [en_US]
dc.publisher: The University of Arizona. [en_US]
dc.rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. [en_US]
dc.subject: Biology, Biostatistics. [en_US]
dc.subject: Computer Science. [en_US]
dc.title: Algorithms for whole genome shotgun sequencing [en_US]
dc.type: text [en_US]
dc.type: Dissertation-Reproduction (electronic) [en_US]
thesis.degree.grantor: University of Arizona [en_US]
thesis.degree.level: doctoral [en_US]
dc.identifier.proquest: 9965855 [en_US]
thesis.degree.discipline: Graduate College [en_US]
thesis.degree.discipline: Computer Science [en_US]
thesis.degree.name: Ph.D. [en_US]
dc.identifier.bibrecord: .b40376515 [en_US]
refterms.dateFOA: 2018-04-24T17:36:20Z


Files in this item

Name: azu_td_9965855_sip1_w.pdf
Size: 2.940Mb
Format: PDF

