Article

MARK-AGE data management: Cleaning, exploration and visualization of data

Journal

MECHANISMS OF AGEING AND DEVELOPMENT
Volume 151, Pages 38-44

Publisher

ELSEVIER IRELAND LTD
DOI: 10.1016/j.mad.2015.05.007

Keywords

Data cleaning; Missing data; Batch effects; Outliers; Data visualization

Funding

  1. European Commission [200880]

Databases are organized collections of data and are necessary for investigating a wide spectrum of research questions. When evaluating data, analysts should be aware of possible data quality problems that can compromise the validity of results. Data cleaning, which deals with the identification and correction of errors in order to improve data quality, is therefore an essential part of the data management process. In our cross-sectional study, biomarkers of ageing as well as analytical, anthropometric and demographic data from about 3000 volunteers were collected in the MARK-AGE database. Although several preventive strategies were applied before data entry, errors such as miscoding, missing values and batch problems could not be avoided completely. Such errors can yield misleading information and affect the validity of the data analysis. Here we present an overview of the methods we applied for dealing with errors in the MARK-AGE database. In particular, we describe our strategies for detecting missing values, outliers and batch effects and explain how they can be handled to improve data quality. Finally, we report on the tools used for data exploration and data sharing between MARK-AGE collaborators. (C) 2015 Elsevier Ireland Ltd. All rights reserved.
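The abstract names three kinds of checks (missing values, outliers, batch effects) without stating which methods were used. The following is only an illustrative sketch of what such checks can look like, not the MARK-AGE procedure. The record layout, the column names `batch` and `marker`, the Tukey fence rule with k = 1.5, and the comparison of per-batch medians are all assumptions made for this example.

```python
from statistics import median, quantiles


def missing_fraction(rows, column):
    """Fraction of records whose value for `column` is None (treated as missing)."""
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


def tukey_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def batch_medians(rows, column, batch_key="batch"):
    """Median of `column` per batch; a large gap between batches hints at a batch effect."""
    groups = {}
    for r in rows:
        if r.get(column) is not None:
            groups.setdefault(r[batch_key], []).append(r[column])
    return {b: median(vs) for b, vs in groups.items()}


# Hypothetical toy records; values are invented for illustration only.
records = [
    {"batch": "A", "marker": 5.1},
    {"batch": "A", "marker": 4.9},
    {"batch": "A", "marker": 5.0},
    {"batch": "A", "marker": None},
    {"batch": "B", "marker": 6.2},
    {"batch": "B", "marker": 6.0},
    {"batch": "B", "marker": 6.1},
    {"batch": "B", "marker": 50.0},
]

observed = [r["marker"] for r in records if r["marker"] is not None]
print("missing fraction:", missing_fraction(records, "marker"))
print("flagged outliers:", tukey_outliers(observed))
print("batch medians:", batch_medians(records, "marker"))
```

Medians are used for the batch comparison because they are less sensitive to the very outliers the second check is meant to flag. Flagged values are candidates for review, not automatic deletions.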
