
dc.contributor.advisor | Myers, Eugene W. | en_US
dc.contributor.author | Jain, Mudita, 1968-
dc.creator | Jain, Mudita, 1968- | en_US
dc.date.accessioned | 2013-05-09T11:32:48Z
dc.date.available | 2013-05-09T11:32:48Z
dc.date.issued | 1996 | en_US
dc.identifier.uri | http://hdl.handle.net/10150/290622
dc.description.abstract | DNA molecules are sequences of characters over a four-letter alphabet. Determining the text of the DNA sequence contained in human cells is the goal of the Human Genome Project. Because a DNA sequence is usually too long to be determined directly, its structure is reconstructed from a set of shorter fragments sampled from it at unknown locations. We consider the problem when the fragments are very long, and each fragment has a fingerprint consisting of the presence of two or three pre-selected, smaller sequences called probes within it. Each probe has a unique location along the original DNA sequence. The fingerprints contain false negative and false positive errors, and the fragments may be chimeric. A physical map of a DNA sequence is a reconstruction of the order of the probes and fragments along it. In short, given a collection of fragments, with fingerprints for each fragment taken from a collection of probes, and parameters that bound the rates of false negatives, false positives, and chimeras in the input data, the problem is to find the most likely probe ordering. Physical mapping is NP-complete when the input data contains errors. To construct physical maps we first determine neighbourhoods of probes and clones that are highly likely to be adjacent on the original DNA sequence. We then use a new, versatile integer linear programming formulation of the problem to derive heuristics for ordering probes within neighbourhoods. This formulation provides a single, uniform representation for diverse data such as end-clone probes and in-situ hybridization, and provides a natural medium for integrating previously constructed maps with newer data. We also present an ordering heuristic based upon end-clone data. Finally, we connect these local permutations into a larger, more global probe permutation, using heuristics that have previously mapped data at their core.
All heuristics are implemented and evaluated by comparing the computed probe orderings to the original probe orderings for simulated data.
dc.language.iso | en_US | en_US
dc.publisher | The University of Arizona. | en_US
dc.rights | Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. | en_US
dc.subject | Biology, Molecular. | en_US
dc.subject | Biology, Genetics. | en_US
dc.subject | Computer Science. | en_US
dc.title | Algorithms for physical mapping using unique probes | en_US
dc.type | text | en_US
dc.type | Dissertation-Reproduction (electronic) | en_US
thesis.degree.grantor | University of Arizona | en_US
thesis.degree.level | doctoral | en_US
dc.identifier.proquest | 9713398 | en_US
thesis.degree.discipline | Graduate College | en_US
thesis.degree.discipline | Computer Science | en_US
thesis.degree.name | Ph.D. | en_US
dc.description.note | This item was digitized from a paper original and/or a microfilm copy. If you need higher-resolution images for any content in this item, please contact us at repository@u.library.arizona.edu.
dc.identifier.bibrecord | .b34396287 | en_US
dc.description.admin-note | Original file replaced with corrected file October 2023.
refterms.dateFOA | 2018-08-29T20:46:27Z
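The abstract describes probe fingerprints with false negatives and false positives, a search for the most likely probe ordering, and evaluation against the original ordering on simulated data. The following is a minimal, hypothetical Python sketch of that setting under stated assumptions; it is not the dissertation's method. The names `simulate_clones`, `count_blocks`, `best_local_order`, and `adjacency_agreement` are illustrative. The block-count objective (with error-free data, each clone's probes form one contiguous run in the true order) and the adjacency-agreement score are assumed stand-ins. Chimeric clones are not modelled, and exhaustive search over a small neighbourhood replaces the integer linear programming heuristics.

```python
import itertools
import random
from typing import FrozenSet, List, Sequence

Fingerprint = FrozenSet[int]  # set of probe ids detected in one clone (assumed encoding)


def simulate_clones(n_probes: int, n_clones: int, max_span: int,
                    fn_rate: float, fp_rate: float,
                    rng: random.Random) -> List[Fingerprint]:
    """Hypothetical simulator: probes 0..n_probes-1 lie in true order along the DNA.

    Each clone covers a contiguous run of probes. Each covered probe is dropped
    with probability fn_rate (false negative), and with probability fp_rate one
    random probe is added (false positive; it may already be covered).
    Chimeric clones are not modelled in this sketch.
    """
    clones = []
    for _ in range(n_clones):
        start = rng.randrange(n_probes)
        end = min(n_probes, start + rng.randint(1, max_span))
        hits = {p for p in range(start, end) if rng.random() >= fn_rate}
        if rng.random() < fp_rate:
            hits.add(rng.randrange(n_probes))
        clones.append(frozenset(hits))
    return clones


def count_blocks(order: Sequence[int], fingerprints: Sequence[Fingerprint]) -> int:
    """Assumed objective: maximal runs of detected probes, summed over clones.

    With error-free data the true order gives every non-empty clone exactly one
    run, so a lower count suggests an order more consistent with the data.
    """
    total = 0
    for fp in fingerprints:
        inside = False
        for probe in order:
            hit = probe in fp
            if hit and not inside:
                total += 1  # a new run of detected probes starts here
            inside = hit
    return total


def best_local_order(probes: Sequence[int],
                     fingerprints: Sequence[Fingerprint]) -> List[int]:
    """Exhaustive search over one small neighbourhood (a stand-in for the ILP).

    Fingerprints are restricted to the neighbourhood's probes; ties go to the
    first minimal permutation generated.
    """
    local = [fp & frozenset(probes) for fp in fingerprints]
    best = min(itertools.permutations(probes), key=lambda o: count_blocks(o, local))
    return list(best)


def adjacency_agreement(computed: Sequence[int], true: Sequence[int]) -> float:
    """Fraction of adjacent pairs in `computed` that are also adjacent in `true`.

    Pairs are unordered, so a reversed but otherwise correct order scores 1.0.
    """
    true_pairs = {frozenset(p) for p in zip(true, true[1:])}
    pairs = [frozenset(p) for p in zip(computed, computed[1:])]
    if not pairs:
        return 1.0
    return sum(p in true_pairs for p in pairs) / len(pairs)


if __name__ == "__main__":
    rng = random.Random(0)
    true_order = list(range(6))
    clones = simulate_clones(6, 20, 3, fn_rate=0.1, fp_rate=0.05, rng=rng)
    # Shuffle the input so ties are not broken in favour of the true order.
    shuffled = rng.sample(true_order, len(true_order))
    computed = best_local_order(shuffled, clones)
    print(computed, adjacency_agreement(computed, true_order))
```

Exhaustive search grows as n! in the neighbourhood size, so this sketch is only practical for neighbourhoods of a few probes.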


Files in this item

Name: azu_td_9713398_sip1_c.pdf
Size: 3.193Mb
Format: PDF
