
dc.contributor.author  Lin, C.
dc.contributor.author  Miller, T.
dc.contributor.author  Dligach, D.
dc.contributor.author  Bethard, S.
dc.contributor.author  Savova, G.
dc.date.accessioned  2022-03-17T01:56:58Z
dc.date.available  2022-03-17T01:56:58Z
dc.date.issued  2021
dc.identifier.citation  Lin, C., Miller, T., Dligach, D., Bethard, S., & Savova, G. (2021, June). EntityBERT: Entity-centric Masking Strategy for Model Pretraining for the Clinical Domain. In Proceedings of the 20th Workshop on Biomedical Language Processing (pp. 191-201).
dc.identifier.isbn  9781954085404
dc.identifier.doi  10.18653/v1/2021.bionlp-1.21
dc.identifier.uri  http://hdl.handle.net/10150/663579
dc.description.abstract  Transformer-based neural language models have led to breakthroughs for a variety of natural language processing (NLP) tasks. However, most models are pretrained on general domain data. We propose a methodology to produce a model focused on the clinical domain: continued pretraining of a model with a broad representation of biomedical terminology (PubMedBERT) on a clinical corpus along with a novel entity-centric masking strategy to infuse domain knowledge in the learning process. We show that such a model achieves superior results on clinical extraction tasks by comparing our entity-centric masking strategy with classic random masking on three clinical NLP tasks: cross-domain negation detection (Wu et al., 2014), document time relation (DocTimeRel) classification (Lin et al., 2020b), and temporal relation extraction (Wright-Bettner et al., 2020). We also evaluate our models on the PubMedQA (Jin et al., 2019) dataset to measure the models’ performance on a non-entity-centric task in the biomedical domain. The language addressed in this work is English. © 2021 Association for Computational Linguistics
dc.language.iso  en
dc.publisher  Association for Computational Linguistics (ACL)
dc.rights  Copyright © 2021 Association for Computational Linguistics. Licensed under a Creative Commons Attribution 4.0 International License.
dc.rights.uri  https://creativecommons.org/licenses/by/4.0/
dc.title  EntityBERT: Entity-centric Masking Strategy for Model Pretraining for the Clinical Domain
dc.type  Proceedings
dc.type  text
dc.contributor.department  University of Arizona
dc.identifier.journal  Proceedings of the 20th Workshop on Biomedical Language Processing, BioNLP 2021
dc.description.note  Open access journal
dc.description.collectioninformation  This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.
dc.eprint.version  Final published version
dc.source.journaltitle  Proceedings of the 20th Workshop on Biomedical Language Processing, BioNLP 2021
refterms.dateFOA  2022-03-17T01:56:58Z
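
Note: the abstract above describes replacing classic random masking with an entity-centric masking strategy during continued pretraining. As an illustration only, the Python sketch below shows one way such a strategy can be realized, assuming entity spans are supplied by an upstream clinical NER step. It is not the authors' implementation (see the linked paper for their method), and the names used here (entity_centric_mask, MASK_TOKEN) are hypothetical.

    import random

    MASK_TOKEN = "[MASK]"  # placeholder symbol; a real pipeline would use the tokenizer's mask id

    def entity_centric_mask(tokens, entity_spans, mask_prob=0.15, seed=None):
        """Mask whole entity spans first, then fill the remaining masking
        budget with random non-entity tokens (classic BERT-style masking).

        tokens       : list of token strings
        entity_spans : list of (start, end) pairs, end exclusive, marking
                       clinical entities found by an upstream NER step
        mask_prob    : overall fraction of tokens to mask
        """
        rng = random.Random(seed)
        budget = max(1, int(len(tokens) * mask_prob))
        chosen = set()

        # 1) Prefer entities: mask entire spans so the model must recover
        #    complete clinical concepts rather than isolated word pieces.
        for start, end in rng.sample(entity_spans, len(entity_spans)):
            if len(chosen) >= budget:
                break
            chosen.update(range(start, end))

        # 2) Top up the remaining budget with random non-entity positions.
        remaining = [i for i in range(len(tokens)) if i not in chosen]
        rng.shuffle(remaining)
        chosen.update(remaining[: max(0, budget - len(chosen))])

        masked = [MASK_TOKEN if i in chosen else t for i, t in enumerate(tokens)]
        return masked, sorted(chosen)

    if __name__ == "__main__":
        toks = "the patient denies chest pain since the last visit".split()
        spans = [(3, 5)]  # "chest pain" tagged as a clinical entity
        print(entity_centric_mask(toks, spans, mask_prob=0.3, seed=0))
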


Files in this item

Name: 2021_bionlp_1_21.pdf
Size: 602.2 KB
Format: PDF
Description: Final Published Version

