Author: Jain, Mudita, 1968-
Advisor: Myers, Eugene W.
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract: DNA molecules are sequences of characters over a four-letter alphabet. Determining the text of the DNA sequence contained in human cells is the goal of the Human Genome Project. Because a DNA sequence is usually too long to be determined directly, its structure is reconstructed from a set of shorter fragments sampled from it at unknown locations. We consider the problem when the fragments are very long, and each fragment has a fingerprint consisting of the presence of two or three pre-selected, smaller sequences called probes within it. These probes have a unique location along the original DNA sequence. The fingerprints contain false negative and false positive errors, and the fragments may be chimeric. A physical map of a DNA sequence is a reconstruction of the order of the probes and fragments along it. In short, given a collection of fragments, with fingerprints for each fragment taken from a collection of probes, and parameters that bound the rates of false negatives, false positives, and chimeras in the input data, the problem is to find the most likely probe ordering. Physical mapping is NP-complete when the input data contains errors. To construct physical maps we first determine neighbourhoods of probes and clones that are highly likely to be adjacent on the original DNA sequence. We then use a new, versatile integer linear programming formulation of the problem to derive heuristics for ordering probes within neighbourhoods. This formulation provides a single, uniform representation for diverse data such as end-clone probes and in-situ hybridization, and provides a natural medium for the integration of previously constructed maps with newer data. We also present an ordering heuristic based upon end-clone data. Finally, we connect these local permutations into a larger, more global probe permutation. For this we use heuristics that have at their core previously mapped data.
All heuristics are implemented and evaluated by comparing the computed probe orderings to the original probe orderings for simulated data.
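To make the problem statement in the abstract concrete, the following is a minimal sketch of how a candidate probe ordering might be scored against fragment fingerprints. It is not the integer linear programming formulation of the thesis. It assumes a simple, commonly used proxy: for each fragment, count the probes that lie between its first and last fingerprinted probe in the ordering but are absent from its fingerprint. The function name `count_gaps`, the clone and probe labels, and the toy data are all hypothetical.

```python
from typing import Dict, List, Set


def count_gaps(fingerprints: Dict[str, Set[str]], order: List[str]) -> int:
    """Count fingerprint gaps implied by a candidate probe ordering.

    Assumed proxy objective (not the thesis's formulation): a fragment
    covers a contiguous stretch of the sequence, so in an error-free
    data set its probes should be consecutive in the true ordering.
    """
    # Map each probe to its index in the candidate ordering.
    position = {probe: i for i, probe in enumerate(order)}
    total = 0
    for probes in fingerprints.values():
        hits = sorted(position[p] for p in probes if p in position)
        if len(hits) >= 2:
            # Span covered by the fragment minus the probes it contains.
            total += (hits[-1] - hits[0] + 1) - len(hits)
    return total


# Hypothetical toy data: each fragment's fingerprint holds two probes.
fingerprints = {
    "clone1": {"A", "B"},
    "clone2": {"B", "C"},
    "clone3": {"C", "D"},
}
true_order = ["A", "B", "C", "D"]
shuffled = ["A", "C", "B", "D"]

print(count_gaps(fingerprints, true_order))
print(count_gaps(fingerprints, shuffled))
```

On this toy input the true ordering yields no gaps, while the shuffled ordering splits two fingerprints. With false positives, false negatives, and chimeric fragments in the data, the true ordering would no longer be guaranteed a zero score, which is why the abstract frames the task as finding the most likely ordering.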
Degree Program: Graduate College