
dc.contributor.author: Knight, James Robert.
dc.creator: Knight, James Robert. [en_US]
dc.date.accessioned: 2011-10-31T18:09:12Z
dc.date.available: 2011-10-31T18:09:12Z
dc.date.issued: 1993 [en_US]
dc.identifier.uri: http://hdl.handle.net/10150/186432
dc.description.abstract: Finding matches, both exact and approximate, between a sequence of symbols A and a pattern P has long been an active area of research in algorithm design. Some of the more well-known byproducts from that research are the diff program and the grep family of programs. These problems form a sub-domain of a larger area of problems called discrete pattern matching, which has been developed recently to characterize the wide range of pattern matching problems. This dissertation presents new algorithms for discrete pattern matching over sequences and develops a new sub-domain of problems called discrete pattern matching over interval sets. The problems and algorithms presented here are characterized by three common features: (1) a "computable scoring function" which defines the quality of matches; (2) a graph-based, dynamic programming framework which captures the structure of the algorithmic solutions; and (3) an interdisciplinary aspect to the research, particularly between computer science and molecular biology, not found in other topics in computer science. The first half of the dissertation considers discrete pattern matching over sequences. It develops the alignment-graph/dynamic-programming framework for the algorithms in the sub-domain and then presents several new algorithms for regular expression and extended regular expression pattern matching. The second half of the dissertation develops the sub-domain of discrete pattern matching over interval sets, also called super-pattern matching. In this sub-domain, the input consists of sets of typed intervals, defined over a finite range, and a pattern expression of the interval types. A match between the interval sets and the pattern consists of a sequence of consecutive intervals, taken from the interval sets, such that their corresponding sequence of types matches the pattern. The name super-pattern matching comes from those problems where the interval sets correspond to the sets of substrings reported by various pattern matching problems over a common input sequence. The pattern for the super-pattern matching problem, then, represents a "pattern of patterns," or super-pattern, and the sequences of intervals matching the super-pattern correspond to the substrings of the original sequence that match that larger "pattern."
dc.language.iso: en [en_US]
dc.publisher: The University of Arizona. [en_US]
dc.rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. [en_US]
dc.subject: Computer science. [en_US]
dc.title: Discrete pattern matching over sequences and interval sets. [en_US]
dc.type: text [en_US]
dc.type: Dissertation-Reproduction (electronic) [en_US]
dc.contributor.chair: Myers, Eugene W. [en_US]
dc.identifier.oclc: 702682446 [en_US]
thesis.degree.grantor: University of Arizona [en_US]
thesis.degree.level: doctoral [en_US]
dc.contributor.committeemember: Downey, Peter J. [en_US]
dc.contributor.committeemember: Kannan, Sampath [en_US]
dc.identifier.proquest: 9408506 [en_US]
thesis.degree.discipline: Computer Science [en_US]
thesis.degree.discipline: Graduate College [en_US]
thesis.degree.name: Ph.D. [en_US]
dc.description.note: This item was digitized from a paper original and/or a microfilm copy. If you need higher-resolution images for any content in this item, please contact us at repository@u.library.arizona.edu.
dc.description.admin-note: Original file replaced with corrected file October 2023.
refterms.dateFOA: 2018-08-23T13:00:57Z


Files in this item

Name: azu_td_9408506_sip1_c.pdf
Size: 6.514Mb
Format: PDF
