dc.contributor.advisor: Ram, Sudha [en_US]
dc.contributor.author: Zhao, Huimin
dc.creator: Zhao, Huimin [en_US]
dc.date.accessioned: 2013-04-11T08:44:12Z
dc.date.available: 2013-04-11T08:44:12Z
dc.date.issued: 2002 [en_US]
dc.identifier.uri: http://hdl.handle.net/10150/280014
dc.description.abstract: Critical to semantic integration of heterogeneous data sources, determining the semantic correspondences among the data sources is a very complex and resource-consuming task and demands automated support. In this dissertation, we propose a comprehensive approach to detecting both schema-level and instance-level semantic correspondences from heterogeneous data sources. Semantic correspondences on the two levels are identified alternately and incrementally in an iterative procedure. Statistical cluster analysis methods and the Self-Organizing Map (SOM) neural network method are used first to identify similar schema elements (i.e., relations and attributes). Based on the identified schema-level correspondences, classification techniques drawn from statistical pattern recognition, machine learning, and artificial neural networks are then used to identify matching tuples. Multiple classifiers are combined in various ways, such as bagging, boosting, concatenating, and stacking, to improve classification accuracy. Statistical analysis techniques, such as correlation and regression, are then applied to a preliminary integrated data set to evaluate the relationships among schema elements more accurately. Improved schema-level correspondences are fed back into the identification of instance-level correspondences, resulting in a loop in the overall procedure. An empirical evaluation using real-world and simulated data is described, demonstrating the utility of the proposed multi-level, multi-technique approach to detecting semantic correspondences from heterogeneous data sources.
dc.language.iso: en_US [en_US]
dc.publisher: The University of Arizona. [en_US]
dc.rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. [en_US]
dc.subject: Business Administration, Management. [en_US]
dc.subject: Information Science. [en_US]
dc.title: Combining schema and instance information for integrating heterogeneous databases: An analytical approach and empirical evaluation [en_US]
dc.type: text [en_US]
dc.type: Dissertation-Reproduction (electronic) [en_US]
thesis.degree.grantor: University of Arizona [en_US]
thesis.degree.level: doctoral [en_US]
dc.identifier.proquest: 3053879 [en_US]
thesis.degree.discipline: Graduate College [en_US]
thesis.degree.discipline: Business Administration [en_US]
thesis.degree.name: Ph.D. [en_US]
dc.identifier.bibrecord: .b42812471 [en_US]
refterms.dateFOA: 2018-09-12T10:26:09Z


Files in this item

Name: azu_td_3053879_sip1_m.pdf
Size: 6.937Mb
Format: PDF
