Extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing
| dc.contributor.author | Endara, Lorena | |
| dc.contributor.author | Cui, Hong | |
| dc.contributor.author | Burleigh, J. Gordon | |
| dc.date.accessioned | 2018-05-16T18:06:44Z | |
| dc.date.available | 2018-05-16T18:06:44Z | |
| dc.date.issued | 2018-03 | |
| dc.identifier.citation | Endara, L., H. Cui, and J. G. Burleigh. 2018. Extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing. Applications in Plant Sciences 6(3): e1035. | en_US |
| dc.identifier.issn | 2168-0450 | |
| dc.identifier.doi | 10.1002/aps3.1035 | |
| dc.identifier.uri | http://hdl.handle.net/10150/627652 | |
| dc.description.abstract | Premise of the Study: Phenotypic data sets are necessary to elucidate the genealogy of life, but assembling phenotypic data for taxa across the tree of life can be technically challenging and prohibitively time consuming. We describe a semi-automated protocol to facilitate and expedite the assembly of phenotypic character matrices of plants from formal taxonomic descriptions. This pipeline uses new natural language processing (NLP) techniques and a glossary of over 9000 botanical terms. Methods and Results: Our protocol includes the Explorer of Taxon Concepts (ETC), an online application that assembles taxon-by-character matrices from taxonomic descriptions, and MatrixConverter, a Java application that enables users to evaluate and discretize the characters extracted by ETC. We demonstrate this protocol using descriptions from Araucariaceae. Conclusions: The NLP pipeline unlocks the phenotypic data found in taxonomic descriptions and makes them usable for evolutionary analyses. | en_US |
| dc.description.sponsorship | U.S. National Science Foundation [DEB-1208256, DEB-1541506, DBI-1147266] | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | BOTANICAL SOC AMER INC | en_US |
| dc.relation.url | https://onlinelibrary.wiley.com/doi/full/10.1002/aps3.1035 | en_US |
| dc.rights | © 2018 Endara et al. Applications in Plant Sciences is published by Wiley Periodicals, Inc. on behalf of the Botanical Society of America. This is an open access article under the terms of the Creative Commons Attribution License. | en_US |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
| dc.subject | morphological matrices | en_US |
| dc.subject | natural language processing | en_US |
| dc.subject | phenotypic traits | en_US |
| dc.subject | taxonomic descriptions | en_US |
| dc.title | Extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing | en_US |
| dc.type | Article | en_US |
| dc.contributor.department | Univ Arizona, Sch Informat, Tucson, AZ 85719 USA | en_US |
| dc.identifier.journal | APPLICATIONS IN PLANT SCIENCES | en_US |
| dc.description.note | Open access journal. | en_US |
| dc.description.collectioninformation | This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu. | en_US |
| dc.eprint.version | Final published version | en_US |
| refterms.dateFOA | 2018-05-16T18:06:45Z | |

