Data on health and demographics in India is plagued by incomplete information, overestimation, and under- and over-reporting, all of which hinder policy planning, the Indian Council of Medical Research (ICMR) has pointed out.
To obtain quality data in upcoming health studies and surveys such as the National Family Health Survey (NFHS), the National Data Quality Forum (NDQF), formed by ICMR’s National Institute for Medical Statistics (ICMR-NIMS) in partnership with the Population Council, has identified gaps in data compilation and offered data quality solutions.
The NDQF attempted to identify issues in data quality. It found a lack of comparability and poor usability of national-level data sources, discordance between system-level and survey-level estimates, and longer questionnaires and questions on socially restricted topics, all of which translate to poor data quality. The NDQF also identified age-reporting errors, non-response and intentional skipping of questions, underreporting due to subjective interpretation of questions, and incomplete and scarce data for generating reliable mortality estimates as major barriers to quality data.
According to ICMR documents on data quality, different public data sources report divergent numbers for the same indicator. For example, the Sample Registration System (SRS) 2016 and the NFHS conducted in 2015-16 report different sex ratios at birth and infant mortality rates, creating an interpretative dilemma. Incomplete information has also been a challenge. For example, NSSO (2014) says that “Data that is generated at state level lacks any information on private sector where about 70% of population seek treatment.”
“With NDQF, we are looking at improvement of data quality in general, and for health and medical research in particular,” said Balram Bhargava, director general at ICMR.
India has rich data on its population, health status, demographic behaviour and economic conditions, among many other aspects of life and environment. Data is translated into insights and, eventually, into policy through a layered process involving human and technological inputs at every stage. However, the data often suffers from common challenges, related to human and technological factors, that affect its quality.
The NDQF found data discordance at multiple levels. The first level arises when beneficiary information and the services provided are entered in registers at the sub-center level. The second is the over-reporting that potentially happens when figures are reported to higher-level facilities. The NDQF also found age-reporting errors, non-response and intentional skipping of questions during large-scale surveys. Similarly, incomplete and unavailable disaggregated data on the cause of mortality affects the generation of estimates, leading to a lack of evidence for setting priorities in the healthcare sector.
A study of vital registration data conducted in 2011 reported that the registration data were inadequate for a robust estimate of mortality at the national level. A medically certified cause of death was recorded for only 965,992 (16.8%) of the 5,735,082 deaths registered, the NDQF said.
The NDQF aims to establish protocols and good practices for data collection, storage, use and dissemination that can be applied to health and demographic data, as well as replicated across industries and sectors. It plans to brainstorm, run pilots and employ advanced modeling techniques in artificial intelligence (AI), machine learning and big data analytics, along with technology-based solutions, to improve data quality.
“Higher investment in regularly monitored technological support during data collection and more clarity during training of investigators might help solve some anthropometric data quality issues in the large scale surveys,” said Kajori Banerjee, senior research fellow, International Institute for Population Sciences (IIPS), Mumbai. – Live Mint