Facilitating knowledge discovery by integrating bottom-up and top-down knowledge sources: A text mining approach
Author: Leroy, Gondy A.
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract: This dissertation aims to discover synergistic combinations of top-down (ontologies), interactive (relevance feedback), and bottom-up (machine learning) knowledge encoding techniques for text mining. The strength of machine learning techniques lies in their coverage and efficiency, because they can discover new knowledge without human intervention. Their output, however, is often imprecise and irrelevant. Human knowledge, encoded top-down or interactively, may remedy this. The research question addressed is whether knowledge discovery can become more precise and relevant with hybrid systems. Three different combinations are evaluated. The first study investigates an ontology, the Unified Medical Language System (UMLS), combined with an automatically created thesaurus to dynamically adjust the thesaurus' output. The augmented thesaurus was added to a medical meta-search portal as a keyword suggester and compared with the unmodified thesaurus and with UMLS alone. Users preferred the hybrid approach. Thus, the combination of the ontology with the thesaurus was better than either component alone. The second study investigates implicit relevance feedback combined with genetic algorithms designed to adjust user queries for online searching. These were compared with pure relevance feedback algorithms. Users were divided into groups based on their overall performance. The genetic algorithm significantly helped low achievers but hindered high achievers. Thus, the interactively elicited knowledge from relevance feedback was judged insufficient to guide machine learning for all users. The final study investigates ontologies combined with two natural language processing techniques: a shallow parser and an automatically created thesaurus. Both capture relations between phrases in biomedical text. Qualified researchers found all terms to be precise; however, terms that belonged to ontologies were more relevant. Parser relations were all precise. Thesaurus relations were less precise, but precision improved for relations whose terms were represented in ontologies. Thus, this integration of ontologies with natural language processing provided good results. In general, it was concluded that top-down encoded knowledge can be effectively integrated with bottom-up encoded knowledge for knowledge discovery in text. This is particularly relevant to business fields, which are text and knowledge intensive. Future work should extend the parser and test similar hybrid approaches for data mining.
Degree Program: Graduate College