
A Costly Assumption: What Every BDHS Researcher Should Know Before Combining Survey Waves


Written by Salina Siddiqua, Dr. S M Abdullah, Dr. Rumana Huque

The Growing Importance of BDHS in Bangladesh Research

The Bangladesh Demographic and Health Survey (BDHS) has become one of the most influential sources of evidence for health and social science research in Bangladesh. Over the past two decades, BDHS data have been extensively used to study fertility, maternal and child health, nutrition, women’s empowerment, non-communicable diseases, healthcare utilisation, and socioeconomic inequalities. The survey’s nationally representative design, large sample size, rigorous sampling methodology, and public availability have made it an indispensable resource for researchers, policymakers, and development partners. Recent studies have increasingly relied on multiple BDHS waves to examine changes in health outcomes and population characteristics over time, contributing substantially to the evidence base underpinning health policy and programme development in Bangladesh (1–3).

However, despite its widespread use, an important methodological misunderstanding continues to appear in academic research and student dissertations. Many researchers implicitly assume that clusters appearing across different BDHS survey rounds represent the same geographic locations or populations over time. This assumption often arises when researchers observe identical cluster numbers in multiple survey waves and interpret them as repeated observations of the same communities. While understandable, this interpretation is generally incorrect and can have important implications for research design, statistical analysis, and the validity of policy conclusions.

This issue is not merely theoretical. During doctoral research at the University of York, UK, one of the authors encountered precisely this challenge while attempting to analyse geographic patterns across multiple BDHS waves. What initially appeared to be a straightforward exercise in linking clusters over time became a months-long investigation into the structure of the BDHS sampling design. Like many researchers, the author had assumed that clusters sharing the same numerical identifiers across survey rounds referred to the same locations. However, a detailed examination of survey documentation, sampling procedures, and geospatial information revealed a different reality. Although cluster numbers often reappear across survey waves, the underlying Enumeration Areas (EAs), households, and respondents are typically different. The experience ultimately required substantial revisions to the analytical approach and highlighted a methodological issue that deserves greater attention within Bangladesh’s research community.

Understanding What BDHS Was Designed to Measure

Understanding this issue requires first understanding what the BDHS was designed to do. The BDHS forms part of the global Demographic and Health Surveys Programme, which has collected standardised health and demographic data across more than 90 low- and middle-income countries. The primary objective of the programme is to generate nationally representative snapshots of population health at specific points in time using scientifically robust sampling procedures that allow comparisons across countries and survey rounds (4). In Bangladesh, each survey round is designed to provide an accurate representation of the country’s population at the time of data collection rather than to follow the same households or communities over time.

This distinction is critical because repeated surveys do not automatically constitute panel data. In a true panel dataset, the same individuals, households, firms, schools, or communities are observed repeatedly over multiple periods. Such datasets enable researchers to examine within-unit changes over time and are particularly valuable for identifying causal relationships and behavioural dynamics. As Wooldridge (2010) notes, the principal advantage of panel data lies in the ability to control for unobserved characteristics that remain constant over time, thereby improving causal inference (5). In contrast, repeated cross-sectional surveys collect data from different samples drawn from the same population at different points in time. Although repeated cross-sections can effectively capture population-level trends, they do not provide direct information about how specific individuals, households, or communities change over time.

The BDHS belongs firmly within the latter category. Each survey round draws a new sample from the national census sampling frame, selecting fresh Enumeration Areas and households through a multistage sampling design (6). Consequently, while the surveys remain nationally representative and broadly comparable across time, they are not designed to track the same respondents or communities longitudinally (7). Researchers can therefore examine whether national rates of stunting, contraceptive use, hypertension, or institutional delivery have changed over time, but they cannot generally determine whether the same households or communities experienced those changes.

When Cluster Numbers Create Confusion

Much of the confusion arises from the presence of cluster identifiers within the datasets. Researchers frequently observe that cluster numbers begin at one in each survey round and continue sequentially throughout the sample. It is therefore tempting to assume that Cluster 1 in the 2017–18 BDHS refers to the same geographic location as Cluster 1 in the 2022 BDHS. However, cluster numbers function primarily as administrative identifiers within individual survey rounds rather than as permanent geographic identifiers across surveys. Because each survey involves a fresh sampling exercise, similarly numbered clusters across waves do not necessarily correspond to the same Enumeration Areas or populations. In practice, the same cluster number may represent entirely different locations in different survey years.
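One practical safeguard when pooling waves is to scope every cluster identifier to its survey round. The following is a minimal sketch on toy rows, not BDHS data; the field names "wave", "cluster", and "household" are hypothetical stand-ins for whatever variables a given recode file uses.

```python
from collections import defaultdict

# Toy rows standing in for household records from two survey rounds.
# The field names "wave", "cluster", and "household" are hypothetical.
rows = [
    {"wave": "2017-18", "cluster": 1, "household": 10},
    {"wave": "2017-18", "cluster": 1, "household": 11},
    {"wave": "2017-18", "cluster": 2, "household": 10},
    {"wave": "2022", "cluster": 1, "household": 10},
    {"wave": "2022", "cluster": 2, "household": 12},
]


def group_rows(rows, key_func):
    """Group rows by whatever key the caller supplies."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_func(row)].append(row)
    return dict(groups)


# Grouping on the bare cluster number silently merges unrelated clusters.
by_number = group_rows(rows, lambda r: r["cluster"])

# Scoping the key to the survey round keeps each sampled cluster distinct.
by_wave_and_number = group_rows(rows, lambda r: (r["wave"], r["cluster"]))

print(len(by_number))           # number of groups when waves are conflated
print(len(by_wave_and_number))  # number of groups when waves are kept apart
```

Treating the pair (wave, cluster number) as the identifier means that any later clustering of standard errors or multilevel grouping treats Cluster 1 of one round and Cluster 1 of another as separate sampled units, which is what the survey design implies.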

The challenge becomes even more complicated when researchers attempt to use geospatial information to establish continuity across survey waves. The DHS Programme makes geographic coordinates available for research purposes, which has greatly expanded opportunities for spatial analysis. However, to protect respondent confidentiality, the programme intentionally displaces GPS coordinates before public release. Urban clusters are displaced by up to two kilometres and rural clusters by up to five kilometres, with a random 1% of rural clusters displaced by as much as ten kilometres (8). Although this practice is essential for protecting privacy, it means that two published points lying close together need not share a true location, and two points lying several kilometres apart may do so. Displacement therefore further complicates efforts to match clusters precisely across survey rounds and reinforces the importance of caution when interpreting geographic continuity.
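The effect of displacement on matching can be made concrete with a distance check. The sketch below assumes the worst-case displacement bounds quoted above (two kilometres for urban points and ten kilometres for rural points); the function names and the example coordinates are hypothetical. It only asks whether displacement alone could explain the gap between two published points, which is a necessary condition for a shared location and nowhere near a sufficient one.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

# Worst-case displacement per published point, in kilometres (assumed bounds).
MAX_DISPLACEMENT_KM = {"urban": 2.0, "rural": 10.0}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def displacement_could_explain_gap(point_a, point_b, type_a, type_b):
    """Return True if two published points could share one true location.

    Each point is a (latitude, longitude) tuple. Both points were displaced
    independently, so the tolerances add up.
    """
    tolerance = MAX_DISPLACEMENT_KM[type_a] + MAX_DISPLACEMENT_KM[type_b]
    return haversine_km(*point_a, *point_b) <= tolerance


# Hypothetical published coordinates for two urban clusters, roughly 3 km apart.
print(displacement_could_explain_gap((23.700, 90.400), (23.727, 90.400),
                                     "urban", "urban"))
```

Because the tolerances add, two rural points up to twenty kilometres apart cannot be ruled out as the same place, and many distinct Enumeration Areas fit inside a radius of that size. A passing check is therefore weak evidence of continuity at best.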

Why the Distinction Matters for Research

The implications of misclassifying repeated cross-sectional data as panel data are potentially substantial. At the most basic level, researchers may select statistical methods that are inconsistent with the underlying data structure. Fixed-effects models, random-effects models, and many longitudinal analytical approaches assume repeated observations of the same units over time. Applying such methods to independently sampled cross-sectional data may produce estimates that appear sophisticated but rest on invalid assumptions. More worryingly, researchers may overstate causal interpretations of observed changes, mistakenly attributing differences between survey rounds to changes within communities when the observed differences may instead reflect variation between independently selected samples.
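A stylised simulation illustrates how this can happen. In the sketch below, every Enumeration Area has a fixed prevalence that never changes, yet pairing independently sampled clusters by their sequence number produces large apparent "within-cluster" changes. All names and parameter values are hypothetical, and the setup is deliberately simplified: it ignores stratification and any geographic ordering of cluster numbers in the real survey.

```python
import random
import statistics


def simulate_spurious_change(n_areas=5000, n_clusters=200, seed=42):
    """Pair independently sampled clusters by number and measure 'change'.

    The true prevalence of every enumeration area is fixed, so any change
    measured within a cluster number is purely an artefact of resampling.
    """
    rng = random.Random(seed)

    # Fixed true prevalence for each enumeration area in the sampling frame.
    frame = [rng.uniform(0.1, 0.6) for _ in range(n_areas)]

    # Each wave draws its own sample; position in the list plays the role
    # of the cluster number within that wave.
    wave_one = rng.sample(frame, n_clusters)
    wave_two = rng.sample(frame, n_clusters)

    # Naive "panel" step: treat cluster k in wave one and cluster k in
    # wave two as the same community.
    changes = [later - earlier for earlier, later in zip(wave_one, wave_two)]

    return {
        "mean_change": statistics.fmean(changes),
        "mean_abs_change": statistics.fmean([abs(c) for c in changes]),
    }


result = simulate_spurious_change()
print(result)
```

The average change across clusters stays close to zero, which is why population-level trend estimates remain usable. The typical absolute change per cluster number is large, however, even though nothing changed in any real community, and a model that treats those differences as within-unit variation would be fitting sampling noise.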

These concerns are becoming increasingly relevant as advanced causal inference methods gain popularity within public health and social science research in Bangladesh. Approaches such as difference-in-differences, synthetic control methods, synthetic difference-in-differences, and staggered treatment effect estimators have transformed policy evaluation in recent years (9,10). However, the credibility of these methods depends fundamentally on whether their assumptions align with the underlying data structure. Sophisticated statistical techniques cannot compensate for incorrect assumptions about how the data were collected. As the old principle in epidemiology and econometrics reminds us, better methods cannot rescue fundamentally inappropriate data.

Importantly, recognising these limitations does not diminish the value of the BDHS. On the contrary, the BDHS remains one of the highest-quality datasets available in Bangladesh and continues to provide unparalleled opportunities for examining population-level trends and inequalities. Researchers can legitimately use multiple survey rounds to investigate changes in national indicators, analyse temporal trends, conduct decomposition analyses, estimate multilevel models, and explore associations between social determinants and health outcomes. The key issue is not whether BDHS data can be used across waves, but whether the analytical approach appropriately reflects the survey design.
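By contrast, a wave-by-wave comparison of a weighted indicator respects the repeated cross-sectional design. The following minimal sketch computes a weighted prevalence for each round from toy rows; the field names "wave", "weight", and "outcome" are hypothetical, and the weights are assumed to have been rescaled already (DHS recode files typically store the sample weight multiplied by 1,000,000). It yields point estimates only; valid standard errors would additionally need the strata and primary sampling units of each round.

```python
from collections import defaultdict


def weighted_prevalence_by_wave(rows):
    """Return the weighted mean of a 0/1 outcome separately for each wave."""
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for row in rows:
        weighted_sum[row["wave"]] += row["weight"] * row["outcome"]
        weight_total[row["wave"]] += row["weight"]
    return {wave: weighted_sum[wave] / weight_total[wave]
            for wave in weight_total}


# Toy respondent rows (not BDHS data); outcome is 1 if the indicator applies.
toy_rows = [
    {"wave": "A", "weight": 1.0, "outcome": 1},
    {"wave": "A", "weight": 3.0, "outcome": 0},
    {"wave": "B", "weight": 2.0, "outcome": 1},
    {"wave": "B", "weight": 2.0, "outcome": 0},
]

prevalence = weighted_prevalence_by_wave(toy_rows)
print(prevalence)
```

Each round is summarised on its own terms and the rounds are then compared as two snapshots of the population, with no claim that any cluster or household was observed twice.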

Methodological Rigour Before Methodological Sophistication

As access to large public datasets continues to expand, methodological literacy becomes increasingly important. Universities, research institutes, development organisations, and research supervisors all have a role to play in strengthening understanding of survey design, sampling theory, and causal inference. Statistical software has become remarkably powerful and accessible, but methodological understanding remains essential for producing credible evidence. Before selecting a statistical model, researchers must first understand how the data were generated. In many cases, this seemingly simple step determines the validity of the entire analysis.

The BDHS has transformed health research in Bangladesh and will continue to do so for years to come. However, maximising its value requires a clear understanding of both its strengths and its limitations. The survey was designed as a repeated cross-sectional study, not as a longitudinal panel. Cluster numbers that appear identical across survey rounds do not necessarily represent the same communities, households, or populations. Failing to recognise this distinction can lead to inappropriate analytical choices and potentially misleading conclusions. As Bangladesh’s research community increasingly embraces advanced quantitative methods, methodological rigour must keep pace with analytical ambition. Ultimately, high-quality evidence begins not with sophisticated statistical techniques, but with a correct understanding of the data itself.

Authors:

1. Salina Siddiqua, Postgraduate Researcher, University of York, UK, and Associate Professor (on Study Leave), Department of Development Studies, University of Dhaka

2. Dr. S M Abdullah, Honorary Deputy Director (Research), ARK Foundation, Dhaka, Bangladesh, and Associate Professor, Department of Economics, University of Dhaka, Bangladesh

3. Dr. Rumana Huque, Executive Director, ARK Foundation, Dhaka, Bangladesh, and Professor, Department of Economics, University of Dhaka, Bangladesh

References

1. Akter T, Hassan R, Hossain MS, Saha S. Women’s empowerment and health: nationwide insights on selected non-communicable conditions in Bangladesh. BMC Public Health [Internet]. 2026 Mar 24;26(1). Available from: http://dx.doi.org/10.1186/s12889-026-27084-y
2. Miah MM, Aktar F, Hossain MS, Hossain K, Begum N. Influence of socioeconomic factors on maternal and child health outcomes in Bangladesh: evidence from the 2022 demographic and health survey. BMC Pediatr [Internet]. 2026 Feb 10;26(1). Available from: http://dx.doi.org/10.1186/s12887-026-06561-8
3. Prantik KH, Nabi AT, Suhel GM, Afroz M, Hasan MN. Maternal healthcare service utilization in Bangladesh: A cross‐sectional study of determinants and temporal trends using BDHS 2011–2022. Health Sci Rep [Internet]. 2026 Apr;9(4). Available from: http://dx.doi.org/10.1002/hsr2.72171
4. The DHS Program [Internet]. USAID; 2024 [cited 2025 May]. Available from: https://dhsprogram.com/data/available-datasets.cfm
5. Wooldridge JM. Econometric Analysis of Cross Section and Panel Data. 2nd ed. London, England: MIT Press; 2010.
6. BDHS final report, 2022 [Internet]. The DHS Program; 2024. Available from: https://dhsprogram.com/pubs/pdf/FR386/FR386.pdf
7. Croft TN, Allen CK, Zachary BW. Guide to DHS Statistics [Internet]. Rockville, Maryland, USA: ICF; 2023. Available from: https://www.dhsprogram.com/pubs/pdf/DHSG1/Guide_to_DHS_Statistics_DHS-8.pdf
8. Burgert CR, Colston J, Roy T, Zachary B. Geographic displacement procedure and georeferenced data release policy for the Demographic and Health Surveys. DHS Spatial Analysis Reports No. 7 [Internet]. Calverton, Maryland, USA: ICF International; 2013. Available from: https://dhsprogram.com/pubs/pdf/SAR7/SAR7.pdf
9. Angrist JD, Pischke JS. Mostly Harmless Econometrics: An Empiricist’s Companion [Internet]. Princeton, NJ: Princeton University Press; 2009. Available from: https://www.jstor.org/stable/j.ctvcm4j72
10. Callaway B, Sant’Anna PHC. Difference-in-Differences with multiple time periods. J Econom [Internet]. 2021 Dec;225(2):200–30. Available from: http://dx.doi.org/10.1016/j.jeconom.2020.12.001
