Data allocation and query optimization in large scale distributed databases
Author: Zhou, Zehai, 1962-
Advisor: Sheng, Olivia R. Liu
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract: Distributed database technology is expected to have a significant impact on data processing in the upcoming years because distributed database systems have many potential advantages over centralized systems for geographically distributed organizations. Data allocation and query optimization are two of the most important aspects of distributed database design. Data allocation involves placing a database and the applications that run against it at the multiple sites of a network. It is a very complex problem consisting of two processes: data fragmentation and fragment allocation. Data fragmentation involves partitioning each relation into a group of fragment relations, while fragment allocation deals with distributing these fragment relations across the sites of the distributed system. Query optimization includes designing algorithms that analyze queries and convert them into a set of data manipulation operations. Both the data allocation and query optimization problems are NP-hard and notoriously difficult to solve. We have attempted to combine the two highly interrelated and interactive decision processes in data allocation by formulating them as integer programs, taking into consideration different constraints and under various assumptions. Various solution methods are discussed and a new linearization method is investigated. We next analyze the query optimization problem and reduce it to a join ordering problem. Several heuristics and a genetic algorithm have been developed for solving the join ordering problem. Computational experiments on these algorithms were conducted and their solution quality was compared. The computational experiments show that the suggested linearization method performs clearly and consistently better than a currently widely used method, and that heuristics and genetic algorithms are viable methods for solving the query optimization problem.
It is anticipated that the models and solution methods developed in this study for data allocation and query optimization in distributed database systems may be of practical as well as theoretical use. Nevertheless, much more needs to be done to solve the distributed database design problems before the potential benefits of distributed database technology can be achieved. Our models and solution methods can serve as a starting point for the eventual resolution of these complex problems in large scale distributed database systems.
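To make the join ordering problem mentioned in the abstract concrete, the following is a minimal illustrative sketch and not the dissertation's own heuristics or genetic algorithm. It assumes a deliberately simple cost model (the sum of intermediate result sizes in a left-deep plan, with independent join selectivities); the names `plan_cost`, `greedy_order`, `exhaustive_order`, and the sample cardinalities and selectivities are all hypothetical.

```python
from itertools import permutations


def plan_cost(order, card, sel):
    """Sum of intermediate result sizes for a left-deep plan (assumed cost model)."""
    joined = [order[0]]
    size = card[order[0]]
    cost = 0.0
    for rel in order[1:]:
        factor = 1.0
        for prev in joined:
            # Missing selectivity entries are treated as cross products (1.0).
            factor *= sel.get(frozenset((prev, rel)), 1.0)
        size = size * card[rel] * factor
        cost += size
        joined.append(rel)
    return cost


def greedy_order(card, sel):
    """Greedy heuristic: start with the smallest relation, then repeatedly
    add the relation that yields the smallest next intermediate result."""
    remaining = set(card)
    first = min(remaining, key=lambda r: (card[r], r))
    order = [first]
    remaining.remove(first)
    size = card[first]

    def next_size(rel):
        factor = 1.0
        for prev in order:
            factor *= sel.get(frozenset((prev, rel)), 1.0)
        return size * card[rel] * factor

    while remaining:
        best = min(remaining, key=lambda r: (next_size(r), r))
        size = next_size(best)
        order.append(best)
        remaining.remove(best)
    return order


def exhaustive_order(card, sel):
    """Brute-force baseline; only feasible for a handful of relations."""
    return min(permutations(sorted(card)), key=lambda o: plan_cost(o, card, sel))


# Hypothetical statistics for three relations.
card = {"A": 1000, "B": 100, "C": 10}
sel = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}

if __name__ == "__main__":
    g = greedy_order(card, sel)
    e = exhaustive_order(card, sel)
    print("greedy:", g, plan_cost(g, card, sel))
    print("exhaustive:", list(e), plan_cost(e, card, sel))
```

The exhaustive search examines every permutation, so its running time grows factorially with the number of relations; this is the practical reason heuristics and metaheuristics such as genetic algorithms are of interest for larger queries. The greedy rule shown here carries no optimality guarantee in general.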
Degree Program: Graduate College