Domain-independent semantic concept extraction using corpus linguistics, statistics and artificial intelligence techniques
AuthorTolle, Kristin M.
MetadataShow full item record
PublisherThe University of Arizona.
RightsCopyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
AbstractFor this dissertation two software applications were developed and three experiments were conducted to evaluate the viability of a unique approach to medical information extraction. The first system, the AZ Noun Phraser, was designed as a concept extraction tool. The second application, ANNEE, is a neural net-based entity extraction (EE) system. These two systems were combined to perform concept extraction and semantic classification specifically for use in medical document retrieval systems. The goal of this research was to create a system that automatically (without human interaction) enabled semantic type assignment, such as gene name and disease, to concepts extracted from unstructured medical text documents. Improving conceptual analysis of search phrases has been shown to improve the precision of information retrieval systems. Enabling this capability in the field of medicine can aid medical researchers, doctors and librarians in locating information, potentially improving healthcare decision-making. Due to the flexibility and non-domain specificity of the implementation, these applications have also been successfully deployed in other text retrieval experimentation for law enforcement (Atabakhsh et al., 2001; Hauck, Atabakhsh, Ongvasith, Gupta, & Chen, 2002), medicine (Tolle & Chen, 2000), query expansion (Leroy, Tolle, & Chen, 2000), web document categorization (Chen, Fan, Chau, & Zeng, 2001), Internet spiders (Chau, Zeng, & Chen, 2001), collaborative agents (Chau, Zeng, Chen, Huang, & Hendriawan, 2002), competitive intelligence (Chen, Chau, & Zeng, 2002), and Internet chat-room data visualization (Zhu & Chen, 2001).
Degree ProgramGraduate College