• CI Spider: a tool for competitive intelligence on the Web

      Chen, Hsinchun; Chau, Michael; Zeng, Daniel (Elsevier, 2002)
Competitive Intelligence (CI) aims to monitor a firm's external environment for information relevant to its decision-making process. As an excellent information source, the Internet provides significant opportunities for CI professionals, as well as the problem of information overload. Internet search engines have been widely used to facilitate information search on the Internet. However, many problems hinder their effective use in CI research. In this paper, we introduce the Competitive Intelligence Spider, or CI Spider, designed to address some of the problems associated with using Internet search engines in the context of competitive intelligence. CI Spider performs real-time collection of Web pages from sites specified by the user and applies indexing and categorization analysis to the documents collected, thus providing the user with an up-to-date, comprehensive view of the Web sites of interest. In this paper, we report on the design of the CI Spider system and on a user study that compares CI Spider with two alternative focused information gathering methods: Lycos search constrained by Internet domain, and manual within-site browsing and searching. Our study indicates that CI Spider has better precision and recall rates than Lycos. CI Spider also outperforms both Lycos and within-site browsing and searching with respect to ease of use. We conclude that there is strong evidence in support of the potentially significant value of applying the CI Spider approach in CI applications.
    • Design and evaluation of a multi-agent collaborative Web mining system

      Chau, Michael; Zeng, Daniel; Chen, Hsinchun; Huang, Michael; Hendriawan, David (Elsevier, 2003-04)
Most existing Web search tools work only with individual users and do not help a user benefit from the previous search experiences of others. In this paper, we present the Collaborative Spider, a multi-agent system designed to provide post-retrieval analysis and enable across-user collaboration in Web search and mining. This system allows the user to annotate search sessions and share them with other users. We also report a user study designed to evaluate the effectiveness of this system. Our experimental findings show that subjects' search performance was degraded, compared to individual search scenarios in which users had no access to previous searches, when they had access to only a limited number (e.g., 1 or 2) of earlier search sessions done by other users. However, search performance improved significantly when subjects had access to more search sessions. This indicates that the gain from collaborative Web searching and analysis does not outweigh the overhead of browsing and comprehending other users' past searches until a certain number of shared sessions has been reached. In this paper, we also catalog and analyze several different types of user collaboration behavior observed in the context of Web mining.
    • Document clustering for electronic meetings: an experimental comparison of two techniques

      Roussinov, Dmitri G.; Chen, Hsinchun (Elsevier, 1999-11)
In this article, we report our implementation and comparison of two text clustering techniques, one based on Ward's clustering and the other on Kohonen's Self-organizing Maps. We evaluated how closely clusters produced by a computer resemble those created by human experts. We also measured the time it takes for an expert to "clean up" the automatically produced clusters. The technique based on Ward's clustering was found to be more precise. Both techniques worked equally well in detecting associations between text documents. We used text messages obtained from group brainstorming meetings.
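The Ward's clustering approach the abstract compares can be illustrated with a minimal agglomerative sketch: at each step, merge the pair of clusters whose merge least increases total within-cluster variance. The toy 2-D "document" coordinates below are hypothetical stand-ins for document vectors, not data from the paper.

```python
def ward_increase(a, b):
    """Increase in within-cluster variance if clusters a and b merge
    (Ward's criterion): |A||B| / (|A|+|B|) * squared centroid distance."""
    ca = [sum(p[i] for p in a) / len(a) for i in range(len(a[0]))]
    cb = [sum(p[i] for p in b) / len(b) for i in range(len(b[0]))]
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2

def ward_cluster(points, k):
    """Agglomerate points into k clusters, always merging the pair
    with the smallest Ward increase."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: ward_increase(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Two tight groups of points; Ward's method recovers them as two clusters.
docs = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(ward_cluster(docs, 2))  # → [[(0.0, 0.0), (0.1, 0.0)], [(5.0, 5.0), (5.1, 5.0)]]
```

A production implementation would use the Lance-Williams update rather than recomputing centroids, but the merge criterion is the same.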
    • Exploring the use of concept spaces to improve medical information retrieval

      Houston, Andrea L.; Chen, Hsinchun; Schatz, Bruce R.; Hubbard, Susan M.; Sewell, Robin R.; Ng, Tobun Dorbin (Elsevier, 2000)
This research investigated the application of techniques successfully used in previous information retrieval research, to the more challenging area of medical informatics. It was performed on a biomedical document collection testbed, CANCERLIT, provided by the National Cancer Institute (NCI), which contains information on all types of cancer therapy. The quality or usefulness of terms suggested by three different thesauri, one based on MeSH terms, one based solely on terms from the document collection, and one based on the Unified Medical Language System (UMLS) Metathesaurus, was explored with the ultimate goal of improving CANCERLIT information search and retrieval. Researchers affiliated with the University of Arizona Cancer Center evaluated lists of related terms suggested by different thesauri for 12 different directed searches in the CANCERLIT testbed. The preliminary results indicated that among the thesauri, there were no statistically significant differences in either term recall or precision. Surprisingly, there was almost no overlap of relevant terms suggested by the different thesauri for a given search. This suggests that recall could be significantly improved by using a combined thesaurus approach.
    • Fighting organized crimes: using shortest-path algorithms to identify associations in criminal networks

      Xu, Jennifer J.; Chen, Hsinchun (Elsevier, 2004)
Effective and efficient link analysis techniques are needed to help law enforcement and intelligence agencies fight organized crime such as narcotics violations, terrorism, and kidnapping. In this paper, we propose a link analysis technique that uses shortest-path algorithms, priority-first-search (PFS) and two-tree PFS, to identify the strongest association paths between entities in a criminal network. To evaluate effectiveness, we compared the PFS algorithms with crime investigators' typical association-search approach, as represented by a modified breadth-first-search (BFS). Our domain expert considered the association paths identified by the PFS algorithms to be useful about 70% of the time, whereas the modified BFS algorithm's precision rates were only 30% for a kidnapping network and 16.7% for a narcotics network. In terms of efficiency, the two-tree PFS was better for a small, dense kidnapping network, and the PFS was better for the large, sparse narcotics network.
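The single-tree PFS idea can be sketched as a Dijkstra-style search: always expand the frontier entity with the lowest accumulated cost. The encoding below, where each edge cost is the inverse of an association strength so that strong associations read as short paths, is one common convention and an assumption here; the network and its weights are hypothetical, not from the paper.

```python
import heapq

def strongest_path(graph, start, goal):
    """Priority-first search: pop the cheapest frontier node, where
    edge cost = 1 / association strength, so the returned path is the
    strongest association path between start and goal."""
    frontier = [(0.0, start, [start])]
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nbr, strength in graph.get(node, {}).items():
            if nbr not in visited:
                heapq.heappush(frontier, (cost + 1.0 / strength, nbr, path + [nbr]))
    return None

# Hypothetical criminal network; edge values are association strengths.
net = {
    "A": {"B": 1.0, "C": 5.0},
    "B": {"A": 1.0, "D": 1.0},
    "C": {"A": 5.0, "D": 4.0},
    "D": {"B": 1.0, "C": 4.0},
}
print(strongest_path(net, "A", "D"))  # → (0.45, ['A', 'C', 'D'])
```

The two-tree variant of the paper grows one such search tree from each endpoint and stops when the trees meet, which pays off on small, dense networks.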
    • From Information Retrieval to Knowledge Management Enabling Technologies and Best Practices

      Chen, Hsinchun (Elsevier, 1999-11)
In this era of the Internet and distributed multimedia computing, new and emerging classes of information technologies have swept into the lives of office workers and everyday people. As technologies and applications become more overwhelming, pressing, and diverse, several well-known information technology problems have become even more urgent. Information overload, a result of the ease of information creation and rendering via Internet and WWW, has become more evident in people's lives. Significant variations of database formats and structures, the richness of information media (text, audio, and video), and an abundance of multilingual information content have also created various information interoperability problems: structural interoperability, media interoperability, and multilingual interoperability.
    • Intelligent internet searching agent based on hybrid simulated annealing

      Yang, Christopher C.; Yen, Jerome; Chen, Hsinchun (Elsevier, 2000)
World-Wide Web (WWW) based Internet services have become a major channel for information delivery. For the same reason, information overload has also become a serious problem for the users of such services. It has been estimated that the amount of information stored on the Internet doubles every 18 months, and the number of homepages may grow even faster; some estimates put their doubling time at 6 months. Therefore, a scalable approach to support Internet searching is critical to the success of Internet services and other current or future National Information Infrastructure (NII) applications. In this paper, we discuss a modified simulated annealing algorithm used to develop an intelligent personal spider agent, which is based on automatic textual analysis of Internet documents and hybrid simulated annealing.
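The annealing idea behind such a spider can be sketched as a walk over linked pages: always accept moves to more relevant pages, occasionally accept less relevant ones with probability exp(delta/T), and cool T so the walk settles on high-relevance regions. The page graph and relevance scores below are hypothetical, and this is a schematic of plain simulated annealing, not the paper's hybrid variant.

```python
import math
import random

def anneal(scores, links, start, t0=1.0, cooling=0.9, steps=50, seed=7):
    """Simulated annealing over a page graph: hop to a random linked
    page; accept relevance drops with probability exp(delta / T),
    lowering T each step; return the best page seen."""
    rng = random.Random(seed)
    current, best = start, start
    t = t0
    for _ in range(steps):
        nbrs = links.get(current, [])
        if not nbrs:
            break
        cand = rng.choice(nbrs)
        delta = scores[cand] - scores[current]
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current = cand
            if scores[current] > scores[best]:
                best = current
        t *= cooling
    return best

# Hypothetical site: relevance scores of pages to the user's query.
pages = {"home": 0.2, "news": 0.4, "labs": 0.6, "paper": 0.9}
links = {"home": ["news", "labs"], "news": ["home", "paper"],
         "labs": ["home", "paper"], "paper": ["news", "labs"]}
print(anneal(pages, links, "home"))
```

The early high-temperature phase lets the spider escape locally relevant but globally poor regions of the link graph, which a pure hill-climbing crawler cannot do.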
    • An intelligent personal spider (agent) for dynamic Internet/Intranet searching

      Chen, Hsinchun; Chung, Yi-Ming; Ramsey, Marshall C.; Yang, Christopher C. (Science Direct, 1998-05)
As Internet services based on the World-Wide Web become more popular, information overload has become a pressing research problem. Difficulties with search on the Internet will worsen as the amount of on-line information increases. A scalable approach to Internet search is critical to the success of Internet services and other current and future National Information Infrastructure (NII) applications. As part of the ongoing Illinois Digital Library Initiative project, this research proposes an intelligent personal spider (agent) approach to Internet searching. The approach, which is grounded on automatic textual analysis and general-purpose search algorithms, is expected to be an improvement over the current static and inefficient Internet searches. In this experiment, we implemented Internet personal spiders based on best first search and genetic algorithm techniques. These personal spiders can dynamically take a user's selected starting homepages and search for the most closely related homepages on the Web, based on the links and keyword indexing. A plain, static CGI/HTML-based interface was developed earlier, followed by a recent enhancement of a graphical, dynamic Java-based interface. Preliminary evaluation results and two working prototypes (available for Web access) are presented. Although the examples and evaluations presented are mainly based on Internet applications, the applicability of the proposed techniques to the potentially more rewarding Intranet applications should be obvious. In particular, we believe the proposed agent design can be used to locate organization-wide information, to gather new, time-critical organizational information, and to support team-building and communication in Intranets.
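The best-first-search spider described above can be sketched as a crawl driven by a priority queue: always fetch the frontier page most similar to the query, then enqueue its outgoing links. The keyword-set scoring and the toy site below are simplifying assumptions standing in for the paper's keyword indexing and live page fetching.

```python
import heapq

def best_first_crawl(pages, links, seeds, query, limit=5):
    """Best-first search: repeatedly visit the frontier page whose
    indexed keywords overlap most with the query, then enqueue its
    links. `pages` maps URL -> keyword set (a stand-in for fetching)."""
    def score(url):
        return len(pages.get(url, set()) & query)
    frontier = [(-score(u), u) for u in seeds]  # max-heap via negation
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < limit:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-score(nxt), nxt))
    return visited

# Hypothetical site: keyword sets stand in for downloaded page text.
pages = {
    "start": {"search"},
    "a": {"search", "agent"},
    "b": {"sports"},
    "c": {"search", "agent", "spider"},
}
links = {"start": ["a", "b"], "a": ["c"], "b": [], "c": []}
print(best_first_crawl(pages, links, ["start"], {"search", "agent", "spider"}, limit=3))
# → ['start', 'a', 'c']
```

Note how the off-topic page "b" stays queued while the spider follows the relevant branch first; that pruning behavior is what makes the approach scale.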
    • Multidimensional scaling for group memory visualization

      McQuaid, Michael J.; Ong, Thian-Huat; Chen, Hsinchun; Nunamaker, Jay F. (Elsevier, 1999-11)
We describe an attempt to overcome information overload through information visualization in a particular domain, group memory. A brief review of information visualization is followed by a brief description of our methodology. We discuss our system, which uses multidimensional scaling (MDS) to visualize relationships between documents, and which we tested on 60 subjects, mostly students. We found three important and statistically significant differences between task performance on an MDS-generated display and on a randomly generated display. With some qualifications, we conclude that MDS speeds up and improves the quality of manual classification of documents, and that the MDS display agrees with subject perceptions of which documents are similar and should be displayed together.
    • Special Issue Digital Government: technologies and practices

      Chen, Hsinchun (Elsevier, 2002-02)
The Internet is changing the way we live and do business. It also offers a tremendous opportunity for government to better deliver its content and services and to interact with its many constituents: citizens, businesses, and other government partners. In addition to providing information, communication, and transaction services, exciting and innovative transformations could occur with the new technologies and practices.
    • Special issue: "Web retrieval and mining"

      Chen, Hsinchun (Elsevier, 2003-04)
Search engines and data mining are two research areas that have experienced significant progress over the past few years. Overwhelming acceptance of the Internet as a primary medium for content delivery and business transactions has created unique opportunities and challenges for researchers. The richness of the web's multimedia content, the reach and timeliness of web-based publication, the proliferation of e-commerce activities, and the potential for wireless web delivery have generated many interesting research problems. Technical, system, organizational, and social research approaches are all needed to address these research problems. Many interesting web retrieval and mining research topics have emerged recently. These include, but are not limited to, the following: text and data mining on the web, web visualization, web intelligence and agents, web-based decision support and knowledge management, wireless web retrieval and visualization, web-based usability methodology, and web-based analysis for eCommerce applications. This special issue consists of nine papers that report research in web retrieval and mining.
    • Visualization of large category map for Internet browsing

      Yang, Christopher C.; Chen, Hsinchun; Hong, Kay (Elsevier, 2003-04)
Information overload is a critical problem on the World Wide Web. The category map, developed on the basis of Kohonen's self-organizing map (SOM), has proven to be a promising browsing tool for the Web. The SOM algorithm automatically categorizes a large Internet information space into manageable sub-spaces, compressing and transforming a complex information space into a two-dimensional graphical representation. Such a representation provides a user-friendly interface for exploring the automatically generated mental model. However, as the amount of information grows, the category map must grow accordingly to accommodate the important concepts in the information space, which increases its visual load: a large pool of information is packed closely together in a display window of limited size, and local details become difficult to see clearly. In this paper, we propose fisheye views and fractal views to support the visualization of the category map. Fisheye views are based on a distortion approach, while fractal views are based on an information reduction approach. Fisheye views enlarge the regions of interest and diminish regions that are farther away while maintaining the global structure. Fractal views, on the other hand, are an approximation mechanism that abstracts complex objects and controls the amount of information displayed. We have developed a prototype system and conducted a user evaluation to investigate the performance of fisheye views and fractal views. The results show that both techniques significantly increase the effectiveness of visualizing the category map. In addition, fractal views perform significantly better than fisheye views, but combining the two does not improve performance compared with either technique alone.
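The distortion idea behind fisheye views can be sketched in one dimension with the classic transform d' = ((k+1)d) / (kd+1): positions near the focus are magnified, distant ones compressed, and the overall extent is preserved. This is a generic graphical-fisheye formula used as an assumption here, not necessarily the exact transform of the paper's prototype.

```python
def fisheye(x, focus, k=3.0):
    """One-dimensional fisheye transform on [0, 1]: magnify near
    `focus`, compress far away, keeping the endpoints fixed.
    k is the distortion factor (k = 0 gives the identity)."""
    # Normalize the distance to the focus by the room left on that side,
    # apply the distortion, and map back.
    if x >= focus:
        span = 1.0 - focus
        d = (x - focus) / span if span else 0.0
        return focus + span * ((k + 1) * d) / (k * d + 1)
    span = focus
    d = (focus - x) / span if span else 0.0
    return focus - span * ((k + 1) * d) / (k * d + 1)

# A row of evenly spaced category-map cells, focus on the left edge:
# cells near the focus spread out, cells far away bunch together.
cells = [i / 10 for i in range(11)]
print([round(fisheye(c, focus=0.0), 3) for c in cells])
```

Applying the same transform independently to the x and y coordinates of each map cell yields the two-dimensional distortion, while a fractal view would instead drop low-importance cells rather than move them.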