

dc.contributor.author: Kauchak, David
dc.contributor.author: Leroy, Gondy
dc.contributor.author: Hogue, Alan
dc.date.accessioned: 2019-04-30T00:00:31Z
dc.date.available: 2019-04-30T00:00:31Z
dc.date.issued: 2017-09
dc.identifier.citation: Kauchak, D., Leroy, G., & Hogue, A. (2017). Measuring text difficulty using parse‐tree frequency. Journal of the Association for Information Science and Technology, 68(9), 2088-2100. [en_US]
dc.identifier.issn: 23301635
dc.identifier.doi: 10.1002/asi.2017.68.issue-9
dc.identifier.uri: http://hdl.handle.net/10150/632156
dc.description.abstract: Text simplification often relies on dated, unproven readability formulas. As an alternative and motivated by the success of term familiarity, we test a complementary measure: grammar familiarity. Grammar familiarity is measured as the frequency of the 3rd level sentence parse tree and is useful for evaluating individual sentences. We created a database of 140K unique 3rd level parse structures by parsing and binning all 5.4M sentences in English Wikipedia. We then calculated the grammar frequencies across the corpus and created 11 frequency bins. We evaluate the measure with a user study and corpus analysis. For the user study, we selected 20 sentences randomly from each bin, controlling for sentence length and term frequency, and recruited 30 readers per sentence (N = 6,600) on Amazon Mechanical Turk. We measured actual difficulty (comprehension) using a Cloze test, perceived difficulty using a 5-point Likert scale, and time taken. Sentences with more frequent grammatical structures, even with very different surface presentations, were easier to understand, perceived as easier, and took less time to read. Outcomes from readability formulas correlated with perceived but not with actual difficulty. Our corpus analysis shows how the metric can be used to understand grammar regularity in a broad range of corpora. [en_US]
dc.description.sponsorship: National Library of Medicine of the National Institutes of Health [R01LM011975] [en_US]
dc.language.iso: en [en_US]
dc.publisher: WILEY [en_US]
dc.relation.url: http://doi.wiley.com/10.1002/asi.2017.68.issue-9 [en_US]
dc.rights: © 2017 ASIS&T. [en_US]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.title: Measuring Text Difficulty Using Parse-Tree Frequency [en_US]
dc.type: Article [en_US]
dc.contributor.department: Univ Arizona, Dept Management Informat Syst, Eller Coll Management [en_US]
dc.contributor.department: Univ Arizona, Dept Linguist [en_US]
dc.identifier.journal: JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY [en_US]
dc.description.note: 12 month embargo; published online 20 June 2017 [en_US]
dc.description.collectioninformation: This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu. [en_US]
dc.eprint.version: Final accepted manuscript [en_US]
dc.source.journaltitle: Journal of the Association for Information Science and Technology
dc.source.volume: 68
dc.source.issue: 9
refterms.dateFOA: 2018-06-20T00:00:00Z


Files in this item

Name: 2017-parse tree frequency.pdf
Size: 3.531Mb
Format: PDF
Description: Final Accepted Manuscript

