EntityBERT: Entity-centric Masking Strategy for Model Pretraining for the Clinical Domain
dc.contributor.author | Lin, C.
dc.contributor.author | Miller, T.
dc.contributor.author | Dligach, D.
dc.contributor.author | Bethard, S.
dc.contributor.author | Savova, G.
dc.date.accessioned | 2022-03-17T01:56:58Z
dc.date.available | 2022-03-17T01:56:58Z
dc.date.issued | 2021
dc.identifier.citation | Lin, C., Miller, T., Dligach, D., Bethard, S., & Savova, G. (2021, June). EntityBERT: Entity-centric Masking Strategy for Model Pretraining for the Clinical Domain. In Proceedings of the 20th Workshop on Biomedical Language Processing (pp. 191-201).
dc.identifier.isbn | 9781954085404
dc.identifier.doi | 10.18653/v1/2021.bionlp-1.21
dc.identifier.uri | http://hdl.handle.net/10150/663579
dc.description.abstract | Transformer-based neural language models have led to breakthroughs for a variety of natural language processing (NLP) tasks. However, most models are pretrained on general domain data. We propose a methodology to produce a model focused on the clinical domain: continued pretraining of a model with a broad representation of biomedical terminology (PubMedBERT) on a clinical corpus, along with a novel entity-centric masking strategy to infuse domain knowledge into the learning process. We show that such a model achieves superior results on clinical extraction tasks by comparing our entity-centric masking strategy with classic random masking on three clinical NLP tasks: cross-domain negation detection (Wu et al., 2014), document time relation (DocTimeRel) classification (Lin et al., 2020b), and temporal relation extraction (Wright-Bettner et al., 2020). We also evaluate our models on the PubMedQA (Jin et al., 2019) dataset to measure the models’ performance on a non-entity-centric task in the biomedical domain. The language addressed in this work is English. © 2021 Association for Computational Linguistics
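The core idea in the abstract, masking clinically salient entity spans (e.g., events and time expressions) rather than tokens sampled uniformly at random, can be illustrated with a minimal sketch. The code below is not the authors' implementation: the span format, the 50% masking rate, and the toy token list are assumptions made for illustration only.

```python
import random

MASK = "[MASK]"


def entity_centric_mask(tokens, entity_spans, mask_prob=0.5, rng=None):
    """Mask tokens for masked-language-model pretraining, preferring entity tokens.

    tokens       : list of wordpiece strings
    entity_spans : list of (start, end) token-index pairs marking entities
                   (e.g., events/time expressions from a clinical annotator)
    mask_prob    : fraction of candidate tokens to mask (an assumed rate,
                   not the paper's setting)

    Returns (masked_tokens, labels), where labels[i] holds the original token
    at masked positions and None elsewhere, mirroring the usual MLM setup.
    """
    rng = rng or random.Random(0)
    entity_positions = {i for start, end in entity_spans for i in range(start, end)}

    masked, labels = list(tokens), [None] * len(tokens)
    # Entity-centric part: sample mask positions from entity tokens when any
    # are annotated; classic random masking samples from all positions instead.
    candidates = sorted(entity_positions) or list(range(len(tokens)))
    if not candidates:
        return masked, labels
    n_mask = max(1, int(len(candidates) * mask_prob))
    for i in rng.sample(candidates, n_mask):
        labels[i] = tokens[i]
        masked[i] = MASK
    return masked, labels


if __name__ == "__main__":
    toks = ["the", "patient", "denies", "chest", "pain", "since", "yesterday"]
    spans = [(2, 3), (3, 5), (6, 7)]  # hypothetical event/time annotations
    print(entity_centric_mask(toks, spans))
```

Concentrating the masking budget on annotated entities forces the model to predict domain-relevant terms from context, which is the intuition behind the entity-centric strategy compared in the paper against random masking.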
dc.language.iso | en
dc.publisher | Association for Computational Linguistics (ACL)
dc.rights | Copyright © 2021 Association for Computational Linguistics. Licensed under a Creative Commons Attribution 4.0 International License.
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/
dc.title | EntityBERT: Entity-centric Masking Strategy for Model Pretraining for the Clinical Domain
dc.type | Proceedings
dc.type | text
dc.contributor.department | University of Arizona
dc.identifier.journal | Proceedings of the 20th Workshop on Biomedical Language Processing, BioNLP 2021
dc.description.note | Open access journal
dc.description.collectioninformation | This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.
dc.eprint.version | Final published version
dc.source.journaltitle | Proceedings of the 20th Workshop on Biomedical Language Processing, BioNLP 2021
refterms.dateFOA | 2022-03-17T01:56:58Z |