• An intelligent personal spider (agent) for dynamic Internet/Intranet searching

      Chen, Hsinchun; Chung, Yi-Ming; Ramsey, Marshall C.; Yang, Christopher C. (Science Direct, 1998-05)
      As Internet services based on the World-Wide Web become more popular, information overload has become a pressing research problem. Difficulties with search on Internet will worsen as the amount of on-line information increases. A scalable approach to Internet search is critical to the success of Internet services and other current and future National Information Infrastructure (NII) applications. As part of the ongoing Illinois Digital Library Initiative project, this research proposes an intelligent personal spider (agent) approach to Internet searching. The approach, which is grounded on automatic textual analysis and general-purpose search algorithms, is expected to be an improvement over the current static and inefficient Internet searches. In this experiment, we implemented Internet personal spiders based on best first search and genetic algorithm techniques. These personal spiders can dynamically take a user's selected starting homepages and search for the most closely related homepages in the web, based on the links and keyword indexing. A plain, static CGI/HTML-based interface was developed earlier, followed by a recent enhancement of a graphical, dynamic Java-based interface. Preliminary evaluation results and two working prototypes (available for Web access) are presented. Although the examples and evaluations presented are mainly based on Internet applications, the applicability of the proposed techniques to the potentially more rewarding Intranet applications should be obvious. In particular, we believe the proposed agent design can be used to locate organization-wide information, to gather new, time-critical organizational information, and to support team-building and communication in Intranets.
    • Introduction to the JASIST Special Topic Section on Web Retrieval and Mining: A Machine Learning Perspective

      Chen, Hsinchun (Wiley Periodicals, Inc, 2003-05)
      Research in information retrieval (IR) has advanced significantly in the past few decades. Many tasks, such as indexing and text categorization, can be performed automatically with minimal human effort. Machine learning has played an important role in such automation by learning various patterns such as document topics, text structures, and user interests from examples. In recent years, it has become increasingly difficult to search for useful information on the World Wide Web because of its large size and unstructured nature. Useful information and resources are often hidden in the Web. While machine learning has been successfully applied to traditional IR systems, it poses some new challenges to apply these algorithms to the Web due to its large size, link structure, diversity in content and languages, and dynamic nature. On the other hand, such characteristics of the Web also provide interesting patterns and knowledge that do not present in traditional information retrieval systems.