    Indexing XML Data for Efficient Twig Pattern Matching

    Name:
    azu_etd_2344_sip1_m.pdf
    Size:
    1001 KB
    Format:
    PDF
    Author
    Rao, Praveen
    Issue Date
    2007
    Keywords
    Computer Science.
    Advisor
    Moon, Bongki
    Committee Chair
    Moon, Bongki
    
    Publisher
    The University of Arizona.
    Rights
    Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
    Abstract
    The Extensible Markup Language (XML) has become the de facto standard for information representation and interchange on the Internet. In this dissertation, I address the problem of indexing and querying XML in two environments, namely, (a) a traditional environment where data is centrally stored and (b) an increasingly popular peer-to-peer (P2P) environment. In a traditional environment, the index built over XML data is typically centralized. On the other hand, due to the distributed nature of the data in a P2P system, the index is also distributed. Due to the different models of storing data in these two environments, I propose two different XML indexing schemes for efficient query processing.
    In a traditional environment, a core operation is to find all occurrences of a given query pattern in the database. I propose a new way of indexing XML documents and processing query patterns. Every XML document in the database is transformed into a sequence of labels by Prüfer's method, which constructs a one-to-one correspondence between trees and sequences. During query processing, a query pattern is also transformed into its Prüfer sequence. By performing subsequence matching on the set of sequences in the database, and then performing a series of refinement phases that I have developed, all the occurrences of a query pattern can be found in the database. Furthermore, I show that all correct answers are found without any false dismissals or false alarms. I present the design, implementation, and experimental evaluation of the PRIX system that I have developed for this purpose.
    Coupled with the growing popularity of P2P systems, XML is commonly used as an underlying data model for P2P applications to handle the heterogeneity of the data and limited expressiveness of queries. Locating relevant data sources across a large number of participating peers is an important challenge. In this environment, the challenge is to quickly test for the existence of a query pattern in XML documents published by users rather than to find all of its occurrences. PRIX finds all occurrences of a query pattern and hence is not the best solution. Moreover, in a P2P environment, a distributed and decentralized index is necessary. Therefore, I propose a distributed indexing scheme for XML documents, based on polynomial signatures, to quickly test for the existence of query patterns. In this scheme, each XML document is mapped into an algebraic signature that captures the structural summary of the document. The participating peers in the network collectively maintain a distributed and hierarchical index over the signatures. By virtue of the signature index, the signatures of documents with similar structural characteristics tend to be stored together at the same peer, and a search for document sources is resolved quickly. I present the design, implementation, and empirical evaluation of the psiX system that I have developed for this purpose. The signature scheme proposed in psiX can be applied to querying heterogeneous XML databases.
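The Prüfer-sequence idea described in the abstract can be illustrated with a minimal sketch. It is not the PRIX implementation: the `(label, children)` tree encoding, the function names, and the choice of postorder node numbering are all assumptions made here for illustration. With postorder numbering, repeatedly deleting the smallest-numbered leaf removes the nodes in the order 1, 2, ..., n-1, so the sequence is simply the parent of each node in that order. The subsequence test covers only the initial filtering step; candidates that pass would still need the refinement phases mentioned in the abstract, which are not shown.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical tree encoding (an assumption of this sketch): (label, [children]).
Node = Tuple[str, list]


def prufer_sequences(root: Node) -> Tuple[List[str], List[int]]:
    """Return the labeled and numbered Prüfer sequences of a tree.

    Nodes are numbered in postorder (assumed here). Entry i of each
    sequence describes the parent of the node numbered i + 1.
    """
    parent_of: Dict[int, Tuple[str, int]] = {}  # child number -> (parent label, parent number)
    counter = 0

    def visit(node: Node) -> int:
        nonlocal counter
        label, children = node
        child_numbers = [visit(child) for child in children]
        counter += 1            # postorder: number the node after its children
        my_number = counter
        for c in child_numbers:
            parent_of[c] = (label, my_number)
        return my_number

    n = visit(root)
    labels = [parent_of[i][0] for i in range(1, n)]
    numbers = [parent_of[i][1] for i in range(1, n)]
    return labels, numbers


def is_subsequence(query: Sequence[str], doc: Sequence[str]) -> bool:
    """True if `query` can be obtained from `doc` by deleting elements."""
    remaining = iter(doc)
    # `q in remaining` consumes the iterator up to and including the first match.
    return all(q in remaining for q in query)


# Document tree A(B(C), D) and query twig A(B, D).
doc = ("A", [("B", [("C", [])]), ("D", [])])
query = ("A", [("B", []), ("D", [])])

doc_labels, doc_numbers = prufer_sequences(doc)
query_labels, _ = prufer_sequences(query)
is_candidate = is_subsequence(query_labels, doc_labels)
```

Here the document yields the label sequence `B A A` and the query yields `A A`, so the document passes the filter. A passing document is only a candidate: a query such as `A(C)` also produces a subsequence of `B A A` even though `C` is not a child of `A`, which is why a refinement stage is needed after subsequence matching.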
    Type
    text
    Electronic Dissertation
    Degree Name
    Ph.D.
    Degree Level
    doctoral
    Degree Program
    Computer Science
    Graduate College
    Degree Grantor
    University of Arizona
    Collections
    Dissertations
