Examining the quality of record linkage process using nationwide Brazilian administrative databases to build a large birth cohort
Author
Almeida, Daniela
Gorender, David
Ichihara, Maria Yury
Sena, Samila
Menezes, Luan
Barbosa, George C. G.
Fiaccone, Rosimeire L.
Paixao, Enny S.
Pita, Robespierre
Barreto, Mauricio L.
Affiliation
Univ Arizona, Dept Comp Sci
Issue Date
2020-07
Publisher
BMC
Citation
Almeida, D., Gorender, D., Ichihara, M.Y. et al. Examining the quality of record linkage process using nationwide Brazilian administrative databases to build a large birth cohort. BMC Med Inform Decis Mak 20, 173 (2020).
Rights
Copyright © The Author(s) 2020. This article is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Collection Information
This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.
Abstract
Background Research using linked routine population-based data collected for non-research purposes has increased in recent years because such data are a rich and detailed source of information. The objective of this study is to present an approach to prepare and link data from administrative sources in a middle-income country, to estimate the quality of the linkage, and to identify potential sources of bias by comparing linked and non-linked individuals.
Methods We linked two Brazilian administrative datasets covering the period 2001 to 2015: the live birth information system and the 100 Million Brazilian Cohort (created using administrative records from over 114 million individuals whose families applied for social assistance via the Unified Register for Social Programmes). The linkage used maternal attributes (name, age, date of birth, and municipality of residence) and an in-house linkage tool, CIDACS-RL. We then estimated the proportion of highly probable links and examined the characteristics of missed matches to identify any potential source of bias.
Results A total of 27,699,891 live births were submitted to linkage with maternal information recorded in the baseline of the 100 Million Brazilian Cohort dataset; of those, 16,447,414 (59.4%) children were found registered in the 100 Million Brazilian Cohort dataset. The proportion of highly probable links ranged from 39.3% in 2001 to 82.1% in 2014. A substantial improvement in the linkage was observed after the introduction of the maternal date of birth attribute in 2011. Our analyses indicated a slightly higher proportion of missing data among missed matches, and a higher proportion of people living in an urban area and self-declared as Caucasian among linked pairs when compared with non-linked sets.
Discussion We demonstrated that CIDACS-RL is capable of performing high-quality linkage even with a limited number of common attributes, using indexation as a blocking strategy in large routine databases from a middle-income country. However, residual (unlinked) records occurred more often among people living in worse conditions. The results presented in this study reinforce the need to evaluate linkage quality and, when necessary, to take linkage error into account in the analyses of any generated dataset.
Note
Open access journal
ISSN
1472-6947
EISSN
1472-6947
PubMed ID
32711532
Version
Final published version
DOI
10.1186/s12911-020-01192-0
Related articles
- Validating linkage of multiple population-based administrative databases in Brazil.
- Authors: Paixão ES, Campbell OMR, Rodrigues LC, Teixeira MG, Costa MDCN, Brickley EB, Harron K
- Issue date: 2019
- Evaluation of record linkage of two large administrative databases in a middle income country: stillbirths and notifications of dengue during pregnancy in Brazil.
- Authors: Paixão ES, Harron K, Andrade K, Teixeira MG, Fiaccone RL, Costa MDCN, Rodrigues LC
- Issue date: 2017 Jul 17
- Biases arising from linked administrative data for epidemiological research: a conceptual framework from registration to analyses.
- Authors: Shaw RJ, Harron KL, Pescarini JM, Pinto Junior EP, Allik M, Siroky AN, Campbell D, Dundas R, Ichihara MY, Leyland AH, Barreto ML, Katikireddi SV
- Issue date: 2022 Dec
- On the Accuracy and Scalability of Probabilistic Data Linkage Over the Brazilian 114 Million Cohort.
- Authors: Pita R, Pinto C, Sena S, Fiaccone R, Amorim L, Reis S, Barreto ML, Denaxas S, Barreto ME
- Issue date: 2018 Mar