Committee Chair: Chen, Hsinchun
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract: The amount of non-English information has proliferated rapidly in recent years. The broad diversity of multilingual content presents a substantial research challenge in the field of knowledge discovery and information retrieval. There is therefore increased interest in developing multilingual systems that support information sharing across languages. The goal of this dissertation is to study, through a series of case studies, how different techniques and algorithms can help in multilingual Internet searching and browsing.

A system development research process was adopted as the methodology in this dissertation. In the first part of the dissertation, I discuss the development of CMedPort, a Chinese medical portal that serves the information-seeking needs of Chinese users. A systematic evaluation was conducted to study the effectiveness and efficiency of CMedPort in assisting human analysis. My experimental results show that CMedPort achieved significant improvement in searching and browsing performance compared to three benchmark regional search engines.

The second and third case studies investigate effective and efficient techniques and algorithms that facilitate multilingual Web retrieval. An English-Chinese multilingual Web retrieval system in the business IT domain was developed and evaluated. It was then extended to five languages: English, Chinese, Japanese, German and Spanish. A dictionary-based approach was adopted for query translation. Corpus-based co-occurrence analysis, relevance feedback, and phrasal translation algorithms were used for disambiguation. Evaluation results showed that the system's phrasal translation and co-occurrence disambiguation led to great improvement in performance.

The last part of this dissertation studies the proper name translation problem. Proper names are often out-of-vocabulary terms and are critical to multilingual Web retrieval. This study proposes a combined Hidden Markov Model and Web mining model to automatically generate proper name translations. The approach was evaluated on two language pairs: English-Arabic and English-Chinese. My results are encouraging and show promise for using transliteration techniques to improve multilingual Web retrieval.

This dissertation has two main contributions. First, it demonstrates how information retrieval, Web mining and artificial intelligence techniques can be used in a multilingual Web-based context. Second, it provides a set of tools that can help users in their multilingual Web searching and browsing activities.
Degree Program: Management Information Systems