• A Collection of Visual Thesauri for Browsing Large Collections of Geographic Images

      Ramsey, Marshall C.; Chen, Hsinchun; Zhu, Bin (John Wiley & Sons, Inc., 1999)
      Digital libraries of geo-spatial multimedia content are currently deficient in providing fuzzy, concept-based retrieval mechanisms to users. The main challenge is that indexing and thesaurus creation are extremely labor-intensive processes for text documents and especially for images. Recently, 800,000 declassified satellite photographs were made available by the United States Geological Survey. Additionally, millions of satellite and aerial photographs are archived in national and local map libraries. At this scale, manual indexing and thesaurus generation are impractical. In this article we propose a scalable method to automatically generate visual thesauri of large collections of geo-spatial media using fuzzy, unsupervised machine-learning techniques.
    • Genescene: Biomedical Text And Data Mining

      Leroy, Gondy; Chen, Hsinchun; Martinez, Jesse D.; Eggers, Shauna; Falsey, Ryan R.; Kislin, Kerri L.; Huang, Zan; Li, Jiexun; Xu, Jie; McDonald, Daniel M.; et al. (Wiley Periodicals, Inc, 2005)
      To access the content of digital texts efficiently, researchers need more sophisticated access than keyword-based searching. Genescene provides biomedical researchers with research findings and background relations automatically extracted from text and experimental data. These relations provide a more detailed overview of the available information. Qualified researchers evaluated the extracted relations and found them to be precise. An ongoing qualitative evaluation of the current online interface indicates that this method of searching the literature is more useful and efficient than keyword-based searching.
    • A Graph Model for E-Commerce Recommender Systems

      Huang, Zan; Chung, Wingyan; Chen, Hsinchun (Wiley Periodicals, Inc, 2004)
      Information overload on the Web has created enormous challenges for customers selecting products for online purchases and for online businesses attempting to identify customers' preferences efficiently. Various recommender systems employing different data representations and recommendation methods are currently used to address these challenges. In this research, we developed a graph model that provides a generic data representation and can support different recommendation methods. To demonstrate its usefulness and flexibility, we developed three recommendation methods: direct retrieval, association mining, and high-degree association retrieval. We used a data set from an online bookstore as our research test-bed. Evaluation results showed that combining product content information and historical customer transaction information achieved more accurate predictions and more relevant recommendations than using only collaborative information. However, comparisons among the methods showed that high-degree association retrieval did not perform significantly better than the association mining method or the direct retrieval method in our test-bed.
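The idea of retrieving recommendations through associations in a graph can be illustrated with a minimal sketch, assuming purchase history is a list of (customer, product) pairs that form a bipartite customer-product graph. It scores candidate products by counting length-3 paths (target customer → purchased product → other buyer → candidate product). This is a generic path-counting technique chosen for illustration, not the authors' implementation; it uses only transaction data and ignores the product content information the abstract describes. The function name `path3_scores` and the sample data are hypothetical.

```python
from collections import defaultdict


def path3_scores(purchases, target):
    """Rank products for `target` by counting length-3 paths in the
    bipartite customer-product graph (hypothetical helper)."""
    # Adjacency lists for both sides of the bipartite graph.
    by_customer = defaultdict(set)
    by_product = defaultdict(set)
    for customer, product in purchases:
        by_customer[customer].add(product)
        by_product[product].add(customer)

    owned = by_customer.get(target, set())
    scores = defaultdict(int)
    for p in owned:                   # hop 1: products the target bought
        for c in by_product[p]:       # hop 2: other customers who bought p
            if c == target:
                continue
            for q in by_customer[c]:  # hop 3: products those customers bought
                if q not in owned:    # recommend only products not yet owned
                    scores[q] += 1
    # Highest path count first; ties broken by product id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


purchases = [
    ("alice", "b1"), ("alice", "b2"),
    ("bob", "b1"), ("bob", "b3"),
    ("carol", "b2"), ("carol", "b3"), ("carol", "b4"),
]
print(path3_scores(purchases, "alice"))  # [('b3', 2), ('b4', 1)]
```

Product b3 ranks first because two different paths lead to it from Alice's purchases (via Bob and via Carol), while b4 is reachable only through Carol. Longer paths could be counted the same way to approximate higher-degree associations.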
    • HelpfulMed: Intelligent Searching for Medical Information over the Internet

      Chen, Hsinchun; Lally, Ann M.; Zhu, Bin; Chau, Michael (Wiley Periodicals, Inc, 2003-05)
      Medical professionals and researchers need information from reputable sources to accomplish their work. Unfortunately, the Web contains a large number of documents that are irrelevant to their work, including documents that purport to be "medically related." This paper describes an architecture designed to integrate advanced searching and indexing algorithms, an automatic thesaurus, or "concept space," and Kohonen-based Self-Organizing Map (SOM) technologies to provide searchers with fine-grained results. Initial results indicate that these systems provide complementary retrieval functionalities. HelpfulMed not only allows users to search Web pages and other online databases, but also allows them to build searches through the use of an automatic thesaurus and browse a graphical display of medical-related topics. Evaluation results for each of the different components are included. Our spidering algorithm outperformed both breadth-first search and PageRank spiders on a test collection of 100,000 Web pages. The automatically generated thesaurus performed as well as both the MeSH and UMLS systems, which require human mediation to stay current. Lastly, a variant of the Kohonen SOM was comparable to MeSH terms in perceived cluster precision and significantly better in perceived cluster recall.
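The Kohonen SOM mentioned above can be illustrated with a minimal textbook version, assuming documents have already been converted into numeric feature vectors (for example, term weights). This is not HelpfulMed's variant, whose modifications the abstract does not describe; the function names `train_som` and `best_matching_unit`, the linear decay schedules, and the parameter defaults are hypothetical choices for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(weights, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))


def train_som(data, rows, cols, epochs=100, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map on a list of equal-length vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0
    # Initialise each node's weight vector as a copy of a random input sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):     # one shuffled pass per epoch
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Squared distance on the map grid, not in feature space.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma * sigma))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])       # pull node toward x
    return weights


data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
weights = train_som(data, rows=2, cols=2)
print(best_matching_unit(weights, [0.0, 0.0]), best_matching_unit(weights, [1.0, 1.0]))
```

Because neighbouring nodes are updated together, similar inputs end up on nearby map positions, which is what makes the trained map usable as a browsable topic display.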
    • Introduction to the JASIST Special Topic Section on Web Retrieval and Mining: A Machine Learning Perspective

      Chen, Hsinchun (Wiley Periodicals, Inc, 2003-05)
      Research in information retrieval (IR) has advanced significantly in the past few decades. Many tasks, such as indexing and text categorization, can be performed automatically with minimal human effort. Machine learning has played an important role in such automation by learning various patterns such as document topics, text structures, and user interests from examples. In recent years, it has become increasingly difficult to search for useful information on the World Wide Web because of its large size and unstructured nature. Useful information and resources are often hidden in the Web. While machine learning has been successfully applied to traditional IR systems, applying these algorithms to the Web poses new challenges due to its large size, link structure, diversity in content and languages, and dynamic nature. On the other hand, these same characteristics of the Web also provide interesting patterns and knowledge that are not present in traditional information retrieval systems.
    • MetaSpider: Meta-Searching and Categorization on the Web

      Chen, Hsinchun; Fan, Haiyan; Chau, Michael; Zeng, Daniel (Wiley Periodicals, Inc, 2001)
      It has become increasingly difficult to locate relevant information on the Web, even with the help of Web search engines. Two approaches to addressing the low precision and poor presentation of search results of current search tools are studied: meta-search and document categorization. Meta-search engines improve precision by selecting and integrating search results from generic or domain-specific Web search engines or other resources. Document categorization promises better organization and presentation of retrieved results. This article introduces MetaSpider, a meta-search engine that has real-time indexing and categorizing functions. We report in this paper the major components of MetaSpider and discuss related technical approaches. Initial results of a user evaluation study comparing MetaSpider, NorthernLight, and MetaCrawler in terms of clustering performance and of time and effort expended show that MetaSpider achieved the highest precision but reveal no statistically significant differences in recall or time requirements. Our experimental study also reveals that MetaSpider exhibited a higher level of automation than the other two systems and facilitated efficient searching by providing the user with an organized, comprehensive view of the retrieved documents.