dc.contributor.author: Endara, Lorena
dc.contributor.author: Cui, Hong
dc.contributor.author: Burleigh, J. Gordon
dc.date.accessioned: 2018-05-16T18:06:44Z
dc.date.available: 2018-05-16T18:06:44Z
dc.date.issued: 2018-03
dc.identifier.citation: Endara, L., H. Cui, and J. G. Burleigh. 2018. Extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing. Applications in Plant Sciences 6(3): e1035.
dc.identifier.issn: 2168-0450
dc.identifier.doi: 10.1002/aps3.1035
dc.identifier.uri: http://hdl.handle.net/10150/627652
dc.description.abstract: Premise of the Study: Phenotypic data sets are necessary to elucidate the genealogy of life, but assembling phenotypic data for taxa across the tree of life can be technically challenging and prohibitively time consuming. We describe a semi-automated protocol to facilitate and expedite the assembly of phenotypic character matrices of plants from formal taxonomic descriptions. This pipeline uses new natural language processing (NLP) techniques and a glossary of over 9000 botanical terms. Methods and Results: Our protocol includes the Explorer of Taxon Concepts (ETC), an online application that assembles taxon-by-character matrices from taxonomic descriptions, and MatrixConverter, a Java application that enables users to evaluate and discretize the characters extracted by ETC. We demonstrate this protocol using descriptions from Araucariaceae. Conclusions: The NLP pipeline unlocks the phenotypic data found in taxonomic descriptions and makes them usable for evolutionary analyses.
dc.description.sponsorship: U.S. National Science Foundation [DEB-1208256, DEB-1541506, DBI-1147266]
dc.language.iso: en
dc.publisher: BOTANICAL SOC AMER INC
dc.relation.url: https://onlinelibrary.wiley.com/doi/full/10.1002/aps3.1035
dc.rights: © 2018 Endara et al. Applications in Plant Sciences is published by Wiley Periodicals, Inc. on behalf of the Botanical Society of America. This is an open access article under the terms of the Creative Commons Attribution License.
dc.subject: morphological matrices
dc.subject: natural language processing
dc.subject: phenotypic traits
dc.subject: taxonomic descriptions
dc.title: Extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing
dc.type: Article
dc.contributor.department: Univ Arizona, Sch Informat, Tucson, AZ 85719 USA
dc.identifier.journal: APPLICATIONS IN PLANT SCIENCES
dc.description.note: Open access journal.
dc.description.collectioninformation: This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.
dc.eprint.version: Final published version
refterms.dateFOA: 2018-05-16T18:06:45Z


Files in this item

Name: Endara_et_al-2018-Applications ...
Size: 785.4Kb
Format: PDF
Description: Final Published Version

