Measuring the Quality of Data Collection in a Large Observational Cohort of HIV and AIDS

Mariska Hillebregt (1,*), Elly de Lange-de Klerk (2), Dirk Knol (2), Frank de Wolf (1,3), Colette Smit (1)
1 HIV Monitoring Foundation, Amsterdam, The Netherlands
2 Department of Epidemiology and Biostatistics, VU University Medical Center Amsterdam, The Netherlands
3 Imperial College, London, UK

© Hillebregt et al.; Licensee Bentham Open.

This is an open access article licensed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited.

* Address correspondence to this author at the HIV Monitoring Foundation, Academic Medical Center University of Amsterdam, HVA-A.3.10, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands; Tel: 020-56 66473; Fax: 020-56 69189; E-mail:


The aim of this study was to examine the quality of data collection by assessing the validity of the collected data. Thirty-eight data collectors extracted data from the clinic charts of two anonymous outpatients. A standard specifying the data to be collected was established (168 items). Validity was measured by comparing the collected items with this standard, which allowed the percentage of collected items that were ‘correct’ to be calculated. The percentage ‘correct’ was higher for clinic chart 1 (mean 83%, SD 7%) than for clinic chart 2 (mean 78%, SD 8%). All categories contained incorrectly collected data, which were classified as missing data, incorrect start-stop dates, or surplus data. Almost all incorrect start-stop dates would have been classified as ‘correct’ if month and year (‘monthyear’) had been accepted instead of the standard day, month and year (‘daymonthyear’). Not all data collectors used specific protocols, and sources other than the written comments were not always checked. This study shows that a high proportion of the data was collected correctly. However, the collection of start-stop dates was not optimal, and the collected data included both surplus and missing data. Data collectors should be more knowledgeable about HIV disease and better trained in the use of difficult protocols, so that they can recognize what data to collect and how to collect it. Physicians should agree more closely on what information to record in the charts, to make data extraction easier for data collectors.
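The validity measure summarized above (comparing each collector's items with a standard, classifying discrepancies, and reporting mean and SD of the percentage correct) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the item schema (`Item` with `key`, `start`, `stop`) and the function names `same_date`, `score` and `percent_correct` are hypothetical; the denominator of the percentage is assumed to be the number of items in the standard (the abstract does not say); and the SD is the sample SD. The `"month"` precision mirrors the observation that most start-stop date errors disappear when month-year agreement is accepted. The example data are invented toy values, not data from the study.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from statistics import mean, stdev
from typing import Dict, Optional


@dataclass(frozen=True)
class Item:
    # Hypothetical schema: one chart item (e.g. a drug) with optional start/stop dates.
    key: str
    start: Optional[date] = None
    stop: Optional[date] = None


def same_date(a: Optional[date], b: Optional[date], precision: str) -> bool:
    """Compare two dates at 'day' (day-month-year) or 'month' (month-year) precision."""
    if a is None or b is None:
        return a is b  # agreement only if both dates are absent
    if precision == "day":
        return a == b
    if precision == "month":
        return (a.year, a.month) == (b.year, b.month)
    raise ValueError(f"unknown precision: {precision}")


def score(standard: Dict[str, Item], collected: Dict[str, Item],
          precision: str = "day") -> Counter:
    """Classify items as correct, missing, incorrect_dates, or surplus."""
    counts = Counter()
    for key, ref in standard.items():
        got = collected.get(key)
        if got is None:
            counts["missing"] += 1
        elif (same_date(ref.start, got.start, precision)
              and same_date(ref.stop, got.stop, precision)):
            counts["correct"] += 1
        else:
            counts["incorrect_dates"] += 1
    # Items recorded by the collector that the standard does not contain.
    counts["surplus"] = sum(1 for key in collected if key not in standard)
    return counts


def percent_correct(counts: Counter, n_standard: int) -> float:
    # Assumption: denominator is the number of items in the standard.
    return 100.0 * counts["correct"] / n_standard


if __name__ == "__main__":
    # Toy data only; not taken from the study.
    standard = {
        "drug_a": Item("drug_a", date(1997, 3, 14), date(1999, 6, 2)),
        "cd4_count": Item("cd4_count"),
    }
    collectors = [
        {"drug_a": Item("drug_a", date(1997, 3, 1), date(1999, 6, 2)),
         "extra_note": Item("extra_note")},
        {"drug_a": Item("drug_a", date(1997, 3, 14), date(1999, 6, 2)),
         "cd4_count": Item("cd4_count")},
    ]
    for precision in ("day", "month"):
        pcts = [percent_correct(score(standard, c, precision), len(standard))
                for c in collectors]
        print(precision, f"mean {mean(pcts):.0f}%, SD {stdev(pcts):.0f}%")
```

Keeping the error classes in a `Counter` rather than a single score lets missing, surplus and date errors be reported alongside the percentage correct, and rerunning with `precision="month"` shows how much of the shortfall is due to date precision alone.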

Keywords: Database, manual data entry, quality of data collection, HIV/AIDS.