• A Scalable Self-organizing Map Algorithm for Textual Classification: A Neural Network Approach to Thesaurus Generation

      Roussinov, Dmitri G.; Chen, Hsinchun (1998)
      The rapid proliferation of textual and multimedia online databases, digital libraries, Internet servers, and intranet services has turned researchers' and practitioners' dream of creating an information-rich society into a nightmare of info-gluts. Many researchers believe that turning an info-glut into a useful digital library requires automated techniques for organizing and categorizing large-scale information. This paper presents research in which we sought to develop a scaleable textual classification and categorization system based on the Kohonen's self-organizing feature map (SOM) algorithm. In our paper, we show how self-organization can be used for automatic thesaurus generation. Our proposed data structure and algorithm took advantage of the sparsity of coordinates in the document input vectors and reduced the SOM computational complexity by several order of magnitude. The proposed Scaleable SOM (SSOM) algorithm makes large-scale textual categorization tasks a possibility. Algorithmic intuition and the mathematical foundation of our research are presented in detail. We also describe three benchmarking experiments to examine the algorithm's performance at various scales: classification of electronic meeting comments, Internet homepages, and the Compendex collection.