Language- and domain-independent knowledge maps: A statistical phrase indexing approach
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract: The global economy increases the need for multilingual systems, while each domain has a large repository of knowledge, particularly explicit knowledge usually captured in text. Textual information is now produced faster than a person can process it, so an automated approach to alleviating the information overload problem is needed. Unlike structured data in databases, unstructured text cannot be readily understood and processed by computers. This dissertation aims to create a language- and domain-independent approach to automatically generating hierarchical knowledge maps that enable users to browse and understand the concepts hidden in the underlying knowledge sources. A system development research methodology was adopted to build and evaluate prototype systems to study the research questions. In order to process textual knowledge, a statistical phrase indexing algorithm was proposed and applied to the Chinese language. Next, the algorithm was extended to process multiple languages and domains. Lastly, the results of the algorithm were applied to a case study using the dissertation's proposed automated framework for generating hierarchical knowledge maps for a Chinese news collection. This dissertation has two main contributions. First, it demonstrated that an automated approach is effective in creating knowledge maps for users to browse the underlying knowledge. The approach combines a statistical phrase extraction algorithm for representing textual knowledge with neural networks for clustering related concepts and visualization. Second, it provided a set of language- and domain-independent tools to extract phrases from textual knowledge in order to support text mining applications.
Degree Program: Graduate College