• Automatic Construction of Networks of Concepts Characterizing Document Databases

      Chen, Hsinchun; Lynch, K.J. (IEEE, 1992)
      The results of a study that involved the creation of knowledge bases of concepts from large, operational textual databases are reported. Two East-bloc computing knowledge bases, both based on a semantic network structure, were created automatically using two statistical algorithms. With the help of four East-bloc computing experts, we evaluated the two knowledge bases in detail in a concept-association experiment based on recall and recognition tests. In the experiment, one of the knowledge bases that exhibited the asymmetric link property outperformed all four experts in recalling relevant concepts in East-bloc computing. The knowledge base, which contained about 20,000 concepts (nodes) and 280,000 weighted relationships (links), was incorporated as a thesaurus-like component into an intelligent retrieval system. The system allowed users to perform semantics-based information management and information retrieval via interactive, conceptual relevance feedback.
    • An Automatic Indexing and Neural Network Approach to Concept Retrieval and Classification of Multilingual (Chinese-English) Documents

      Lin, Chung-hsin; Chen, Hsinchun (IEEE, 1996-02)
      An automatic indexing and concept classification approach to a multilingual (Chinese and English) bibliographic database is presented. We introduced a multi-linear term-phrasing technique to extract concept descriptors (terms or keywords) from a Chinese-English bibliographic database. A concept space of related descriptors was then generated using a co-occurrence analysis technique. Like a man-made thesaurus, the system-generated concept space can be used to generate additional semantically relevant terms for search. For concept classification and clustering, a variant of a Hopfield neural network was developed to cluster similar concept descriptors and to generate a small number of concept groups to represent (summarize) the subject matter of the database. The concept space approach to information classification and retrieval has been adopted by the authors in other scientific databases and business applications, but multilingual information retrieval presents a unique challenge. This research reports our experiment on multilingual databases. Our system was initially developed in the MS-DOS environment, running the ETEN Chinese operating system. For performance reasons, it was then tested on a UNIX-based system. Due to the unique ideographic nature of the Chinese language, a Chinese term-phrase indexing paradigm considering the ideographic characteristics of Chinese was developed as a multilingual information classification model. By applying the neural-network-based concept classification technique, the model presents a novel way of organizing unstructured multilingual information.
    • Browsing in Hypertext: A Cognitive Study

      Carmel, Erran; Crawford, Stephen; Chen, Hsinchun (IEEE, 1992-09)
      With the growth of hypertext and multimedia applications that support and encourage browsing, it is time to take a penetrating look at browsing behavior. Several dimensions of browsing are examined, to find out: first, what is browsing and what cognitive processes are associated with it; second, is there a browsing strategy, and if so, are there any differences between how subject-area experts and novices browse; and finally, how can this knowledge be applied to improve the design of hypertext systems. Two groups of students, subject-area experts and novices, were studied while browsing a Macintosh HyperCard application on the subject of The Vietnam War. A protocol analysis technique was used to gather and analyze data. Components of the GOMS model were used to describe the goals, operators, methods, and selection rules observed. Three browsing strategies were identified: 1) search-oriented browse, scanning and reviewing information relevant to a fixed task, 2) review-browse, scanning and reviewing interesting information in the presence of transient browse goals that represent changing tasks, and 3) scan-browse, scanning for interesting information (without review). Most subjects primarily used review-browse interspersed with search-oriented browse. Within this strategy, comparisons between subject-area experts and novices revealed differences in tactics: experts browsed in more depth, seldom used referential links, selected different kinds of topics, and viewed information differently than did novices. Based on these findings, suggestions are made to hypertext developers.