Initial data analysis (IDA)

Data analysis is the systematic process of examining datasets to draw valid conclusions about the information they contain, increasingly with the aid of specialized systems and software. The useful information it uncovers supports informed decisions and helps verify or disprove scientific or business models, theories or hypotheses.

As researchers or laboratory analysts, we must be driven to obtain quality data in our work. A careful plan for database design and statistical analysis has to be established before embarking on full data collection. It should cover variable definitions, plausibility checks, data quality checks, and procedures for identifying likely errors and resolving inconsistencies in the data.  More importantly, the plan should not be altered without the agreement of the project steering team, in order to limit data dredging or hypothesis fishing, which lead to false-positive findings.  Shortcomings in initial data analysis may result in adopting inappropriate statistical methods or drawing incorrect conclusions.

Our first step in initial data analysis is to check the consistency and accuracy of the data, for example by looking for outlying values. These can be made visible by plotting the data against the time of collection or against other independent parameters.  This should be done before embarking on more complex analyses.
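A numeric screen can complement the plot when looking for outlying values. The following is only a minimal sketch, assuming readings listed in order of collection and using Tukey's interquartile-range rule, which is one common choice and not a method prescribed here. The function name flag_outliers, the multiplier k and the sample readings are all hypothetical.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return the indices of values outside Tukey's fences.

    The fences are Q1 - k*IQR and Q3 + k*IQR, where IQR = Q3 - Q1.
    k = 1.5 is a conventional default, assumed here for illustration.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical readings, listed in order of collection time
readings = [10.1, 9.8, 10.3, 10.0, 25.4, 9.9, 10.2, 10.1]
suspect = flag_outliers(readings)
print(suspect)  # positions of readings that deserve a second look
```

A flagged value is not necessarily an error. It is a prompt to go back to the source record and check before deciding whether to correct, keep or exclude it.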

Once satisfied that the data are reasonably error-free, we should become familiar with the collected data and examine the consistency of data formats, the number and patterns of missing values, the probability distributions of the continuous variables, and so on.  For more advanced initial analysis, decisions have to be made about how variables will be used in further analyses, with the aid of data analytics technologies or statistical techniques.  Variables can be studied in their raw form, transformed to a standardized scale, or categorized or stratified into groups for modeling.
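The checks and variable treatments described above can be sketched in a few lines. This is an illustration under stated assumptions, not a complete workflow: the records, the variable names age and weight, the cutpoints and the helper names missing_counts, standardize and categorize are all hypothetical, and None is assumed to mark a missing value.

```python
import statistics

# Hypothetical records; None marks a missing value
records = [
    {"age": 34, "weight": 70.5},
    {"age": None, "weight": 82.0},
    {"age": 51, "weight": None},
    {"age": 45, "weight": 64.3},
]


def missing_counts(rows):
    """Count missing (None) entries for each variable."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (value is None)
    return counts


def standardize(values):
    """Convert values to z-scores using the sample mean and standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def categorize(value, cutpoints, labels):
    """Return the label of the first group whose upper cutpoint exceeds value.

    labels must have one more entry than cutpoints; the last label
    covers everything at or above the final cutpoint.
    """
    for cut, label in zip(cutpoints, labels):
        if value < cut:
            return label
    return labels[-1]


print(missing_counts(records))

# Standardize and group only the observed (non-missing) ages
ages = [r["age"] for r in records if r["age"] is not None]
print(standardize(ages))
print([categorize(a, [40, 60], ["under 40", "40 to 59", "60 and over"]) for a in ages])
```

Whether a variable is kept raw, standardized or grouped depends on the planned model. Grouping a continuous variable discards information, so the cutpoints are best fixed in the analysis plan and not chosen after looking at the results.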

