• Federated Search of Scientific Literatures: A Retrospective on the Illinois Digital Library Project

      Schatz, Bruce R.; Mischo, William; Cole, Timothy; Bishop, Ann Peterson; Harum, Susan; Johnson, Eric H.; Neumann, Laura; Chen, Hsinchun; Ng, Tobun Dorbin; Harum, S.; et al. (UIUC, 2000)
      The NSF/DARPA/NASA Digital Libraries Initiative (DLI) project at the University of Illinois at Urbana-Champaign (UIUC), 1994-1998, had the goal of developing widely usable Web technology to effectively search technical documents on the Internet. The DLI testbed focused on using the document structure to provide federated searches across publisher collections. Our sociology research included the evaluation of its effectiveness under use by over 1,000 UIUC faculty and students, a user community an order of magnitude bigger than the last generation of research projects centered on searching scientific literature. Our technology research developed indexing of the contents of text documents to enable a federated search across multiple sources, testing this on millions of documents for semantic federation. This article will discuss the achievements and difficulties we experienced over the past four years.
    • Semantic Issues for Digital Libraries

      Chen, Hsinchun; Harum, S.; Twidale, M. (UIUC, 2000)
      As new and emerging classes of information systems applications become more overwhelming, pressing, and diverse, several well-known information retrieval (IR) problems have become even more urgent in this “network-centric” information age. Information overload, a result of the ease of information creation and rendering via the Internet and the World Wide Web, has become more evident in people’s lives. Significant variations of database formats and structures, the richness of information media, and an abundance of multilingual information content have also created severe information interoperability problems: structural interoperability, media interoperability, and multilingual interoperability. The conventional approaches to addressing information overload and information interoperability problems are manual in nature, requiring human experts as information intermediaries to create knowledge structures and/or ontologies. As information content and collections become even larger and more dynamic, we believe a system-aided bottom-up artificial intelligence (AI) approach is needed. By applying scalable techniques developed in various AI subareas such as image segmentation and indexing, voice recognition, natural language processing, neural networks, machine learning, clustering and categorization, and intelligent agents, we can provide an alternative system-aided approach to addressing both information overload and information interoperability.