dc.contributor.author: Tonkin, Emma
dc.contributor.editor: Furner, Jonathan [en_US]
dc.contributor.editor: Tennis, Joseph T. [en_US]
dc.date.accessioned: 2007-04-16T00:00:01Z
dc.date.available: 2010-06-18T23:27:30Z
dc.date.issued: 2006 [en_US]
dc.date.submitted: 2007-04-16 [en_US]
dc.identifier.citation: Searching the long tail: Hidden structure in social tagging 2006, 17 [en_US]
dc.identifier.uri: http://hdl.handle.net/10150/105565
dc.description.abstract: In this paper we explore a method of decomposition of compound tags found in social tagging systems and outline several results, including improvement of search indexes, extraction of semantic information, and benefits to usability. Analysis of tagging habits demonstrates that social tagging systems such as del.icio.us and flickr include both formal metadata, such as geotags, and informally created metadata, such as annotations and descriptions. The majority of tags represent informal metadata; that is, they are not structured according to a formal model, nor do they correspond to a formal ontology. Statistical exploration of the main tag corpus demonstrates that such searches use only a subset of the available tags; for example, many tags are composed as ad hoc compounds of terms. In order to improve accuracy of searching across the data contained within these tags, a method must be employed to decompose compounds in such a way that there is a high degree of confidence in the result. An approach to decomposition of English-language compounds, designed for use within a small initial sample tagset, is described. Possible decompositions are identified from a generous wordlist, subject to selective lexicon snipping. In order to identify the most likely, a Bayesian classifier is used across term elements. To compensate for the limited sample set, a word classifier is employed and the results classified using a similar method, resulting in a successful classification rate of 88%, and a false negative rate of only 1%.
dc.format.mimetype: application/pdf [en_US]
dc.language.iso: en [en_US]
dc.publisher: dLIST [en_US]
dc.subject: Classification [en_US]
dc.subject: World Wide Web [en_US]
dc.subject: Web Metrics [en_US]
dc.subject: Quantitative Research [en_US]
dc.subject: Knowledge Structures [en_US]
dc.subject: Knowledge Organization [en_US]
dc.subject.other: Social tagging [en_US]
dc.subject.other: Automatic classification [en_US]
dc.subject.other: Tag analysis [en_US]
dc.title: Searching the long tail: Hidden structure in social tagging [en_US]
dc.type: Conference Paper [en_US]
refterms.dateFOA: 2018-08-21T12:49:48Z


Files in this item

Name: tonkin.pdf
Size: 254.5Kb
Format: PDF
