• An exploratory study of human clustering of Web pages

      Khoo, Christopher S.G.; Ng, Karen; Ou, Shiyan; López-Huertas, María J. (Ergon-Verlag, 2002)
      This study seeks to find out how human beings naturally cluster Web pages. For each of 10 queries, the 20 Web pages retrieved by the Northern Light search engine were sorted by 3 subjects into categories that were natural or meaningful to them. Different subjects clustered the same set of Web pages quite differently and created different categories: the average inter-subject similarity of the clusters was low, at 0.27. Subjects created an average of 5.4 clusters per sorting. The categories constructed can be divided into 10 types. About one third of the categories were topical, and another 20% related to the degree of relevance or usefulness. The remaining categories were subject-independent, covering aspects such as format, purpose, authoritativeness and direction to other sources. The authors plan to develop automatic methods for categorizing Web pages using the common categories created by the subjects. It is hoped that these techniques can be used by Web search engines to automatically organize retrieved Web pages into categories that are natural to users.