A large number of observational epidemiologic studies have reported consistent associations between short sleep duration and increased body weight, particularly in children and adolescents.1–12 Based on the robust associations in epidemiologic studies in combination with research findings on potential biologic and behavioral mechanisms in experimental studies,11,13–22 promoting adequate sleep hours has been proposed for obesity prevention and weight reduction.7,23–25 Causal evidence on the effect of sleep duration on body weight is still limited, however, posing a barrier to the potential translation of such research findings into effective public health interventions.5,6,26–28

Observational studies are prone to confounding and reverse causality in making causal inferences about the effect of sleep duration on body weight.1,5,6,26–28 Both sleep duration and body weight may be correlated with a host of unmeasured factors.4,29,30 For example, students who are strongly future-oriented may voluntarily choose to reduce sleep hours to increase their study time for better academic performance and future life chances. These same individuals are also more likely to watch their food intake and engage in physical activity and are therefore less likely to be obese.31,32 Students who are less future-oriented may also choose to shorten sleep duration, albeit not for study, but for Internet games; these same students are also more likely to be obese.32 Future orientation and other unobservable factors may be unmeasured confounders when estimating the effect of sleep duration on body weight in adolescents. Inadequate sleep can also be a symptom of weight problems, leading to reverse causation.12,28,33,34 Previous studies have attempted to address these issues with limited success.6,28,34 An ideal study design would be a population-based, randomized prospective experiment,5–7,12,21,35 in which a large number of participants are randomly assigned into two or more groups according to different manipulated sleep duration and subsequently followed up in free-living conditions for a sufficiently long period of time. It is hard to conduct such studies, for both ethical and practical reasons.26,35,36

In the absence of well-conducted population-based intervention studies, one alternative approach is to use a natural experiment.37–40 This study exploits a unique natural experiment that can be argued to have increased sleep duration in an adolescent population in South Korea. In March 2011, amid growing concerns over the negative consequences of late-night classes at private tutoring institutes or cram schools (hagwon), authorities in three of the 16 administrative regions restricted the closing hours of hagwon to 10 pm. Assuming this policy change is a valid instrument for sleep duration, it allows for investigating the causal effect of sleep duration on body weight in a difference-in-differences and instrumental variable (IV) framework. In doing so, this article uses subject-matter knowledge and empirical evidence to make the case that the policy change meets the instrumental conditions, though not all the conditions are verifiable.41

METHODS

Setting and the Natural Experiment

Secondary education in South Korea involves a structured progression from middle school (7th–9th grades) to either a general high school or a vocational high school (10th–12th grades). Just before completing the 12th grade (November), most general high-school students take the nationwide standardized College Scholastic Ability Test (CSAT). Challenged by fierce competition for better college and future life opportunities, approximately 75% of secondary-school students take extra after-school classes at hagwon, which operate until late at night, often even past midnight. The negative impact of these practices on students prompted policymakers to propose a curfew on hagwon operating hours, to protect adolescents’ rights to health and happiness, including having enough time for leisure and sleep. After debates in the political sphere, an agreement was reached about the curfew, which would allow the education authority in each administrative region to autonomously set hagwon operating hours. In 2010, the central government proposed a 10 pm curfew in an effort to protect adolescents’ right to sleep in particular.42 On 1 March 2011, three regions implemented the 10 pm curfew for high-school students (two other regions adopted an 11 pm curfew). Seoul had already adopted the 10 pm rule for high-school students.

This policy change was not part of efforts to address adolescents’ weight problems. Although promoting adolescent health and sleep was included as one of the policy goals, weight problems were not specifically considered in the policy debates. Rather, the policy was primarily a product of complex regional politics involving the interests of the private tutoring industry advocating freedom of business on the one hand, and the interests of civic groups advocating adolescents’ right to sleep and health on the other hand. Although concern about students’ sleep deprivation may have been stronger in the regions that changed the curfew, no other relevant regional policies that might have affected adolescent energy balance (such as policies aimed at promoting physical activity and reducing fast-food intake) are known to have been implemented at the same time.

The effect of this policy change on adolescents’ sleep duration is likely to have varied by school type and grade. Many general high-school students in the 10th and 11th grades use hagwon to prepare for the CSAT and regular school exams, and tend to reduce their sleep hours to have more time for taking extra courses at hagwon. However, these students still have at least 1 year left before the CSAT, which is taken toward the end of 12th grade; hence, general high school 10th−11th graders may not be as desperate as 12th graders to sacrifice sleep hours for more study time. Therefore, many 10th−11th graders are likely to have slept more in response to the policy change.41,43,44 This group is used as the main sample in the current study.

In contrast, three other groups of secondary-school students are likely to have been largely unaffected by the policy change. General high school 12th graders preparing for the upcoming CSAT may still want to spend a good amount of time on self-study even after returning home from hagwon. Vocational high-school and middle-school students are also unlikely to respond to the policy change, although for reasons different from those for general high-school 12th graders. Vocational students typically do not take late-night classes at hagwon. Middle-school students have at least 3 more years before taking the CSAT and therefore are far less likely to take late-night classes, even without the 10 pm rule. These three samples are hereafter called “placebo” samples. For the reasons given above, the policy change should be irrelevant to changes in sleep duration and body weight in these groups. By showing no effect where no effect should be found, these placebo groups help remove possible alternative explanations for the causal inference being made.45

Statistical Analysis

This study implements the natural experiment design in a difference-in-differences and instrumental variable (IV) framework. As an initial step, we performed difference-in-differences analysis to examine whether the policy change affected average sleep duration and body weight, respectively, after controlling for common trends and baseline differences between the treatment regions and control regions. The validity of the difference-in-differences estimation critically hinges on the assumption that, without the policy change, the treatment region would have experienced the same trend in sleep duration as the control region did.46 Although this common trends assumption cannot be tested directly in the main sample, the three placebo samples allow for conducting a useful test for the assumption.45,46
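The logic of the unadjusted difference-in-differences comparison can be illustrated with a minimal sketch. The data below are simulated, and the variable names and the assumed effect size are hypothetical; this is not the KYRBS analysis itself, which also includes covariates and accounts for clustering. The sketch shows only that, in a model with treatment, period, and their interaction, the interaction coefficient equals the difference-in-differences of the four group means.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# Simulated indicators (hypothetical data, not survey records).
treatment = rng.integers(0, 2, n)  # 1 = region that changed the curfew
post = rng.integers(0, 2, n)       # 1 = surveyed after the change

# Simulated sleep hours with an assumed policy effect of 0.28 hours.
sleep = (5.6 + 0.10 * treatment - 0.05 * post
         + 0.28 * treatment * post + rng.normal(0.0, 1.0, n))

# Regression with intercept, treatment, post, and their interaction.
X = np.column_stack([np.ones(n), treatment, post, treatment * post])
beta, *_ = np.linalg.lstsq(X, sleep, rcond=None)
did_regression = beta[3]  # coefficient on treatment x post


def cell_mean(t, p):
    """Mean sleep hours in one treatment-by-period cell."""
    return sleep[(treatment == t) & (post == p)].mean()


# The same quantity computed directly from the four group means.
did_means = ((cell_mean(1, 1) - cell_mean(1, 0))
             - (cell_mean(0, 1) - cell_mean(0, 0)))

print(did_regression, did_means)
```

The two printed values agree up to floating-point error because the regression is saturated in the two indicators; adding covariates, as in the adjusted analyses, would generally change the estimate.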

Each of the difference-in-differences analyses can provide information on whether the policy change had effects on sleep duration and on body weight but does not answer the research question of the study: whether sleep duration affects body weight. The ratio of the two difference-in-differences estimates (i.e., for sleep duration and for body weight in each sample), the Wald/IV estimator,46 can be thought of as an intuitive approximation of the effect to be estimated in the study. This Wald/IV estimator captures the shift in average body weight per 1-hour increase in sleep duration induced by the policy change. The IV estimation that follows, in essence, extends the intuition behind the Wald/IV estimator to the IV regression framework with a full set of individual-level covariates.
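In notation, the ratio can be sketched as follows. The symbols are introduced here for illustration only and are not taken from the original analysis: Y denotes the body weight outcome, S sleep duration, T the treatment-region indicator, and P the postperiod indicator, with bars denoting group means.

```latex
\hat{\beta}_{\text{Wald}}
  = \frac{\hat{\delta}_{Y}}{\hat{\delta}_{S}}
  = \frac{\left(\bar{Y}_{T=1,P=1} - \bar{Y}_{T=1,P=0}\right)
          - \left(\bar{Y}_{T=0,P=1} - \bar{Y}_{T=0,P=0}\right)}
         {\left(\bar{S}_{T=1,P=1} - \bar{S}_{T=1,P=0}\right)
          - \left(\bar{S}_{T=0,P=1} - \bar{S}_{T=0,P=0}\right)}
```

The numerator is the difference-in-differences estimate for the outcome and the denominator is the corresponding estimate for sleep duration, so the ratio is undefined or unstable when the policy change has little effect on sleep.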

Next, we conducted IV two-stage least squares analysis. The goal of this IV analysis is to estimate the causal effect of sleep duration (exposure) on body weight (outcome), exploiting the variation in sleep duration due to the policy change (IV). To be able to make causal inference, the IV estimation requires three important conditions:41 (1) the IV is associated with the exposure variable, (2) the IV does not affect the outcome other than through the exposure variable, and (3) the IV does not share any causes with the outcome. Each of the three conditions is described more fully in the following paragraphs.
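The two-stage procedure can be sketched on simulated data with a single binary instrument and an unmeasured confounder. All numbers and names below are assumptions chosen for illustration, and the sketch omits the covariates and clustering used in the actual models. It also shows that, with one instrument, the two-stage point estimate coincides with the ratio of the reduced-form coefficient to the first-stage coefficient.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


rng = np.random.default_rng(1)
n = 20000

z = rng.integers(0, 2, n).astype(float)  # instrument (policy exposure)
u = rng.normal(size=n)                   # unmeasured confounder
sleep = 5.5 + 0.3 * z + 0.5 * u + rng.normal(size=n)
# Assumed true effect of sleep on BMI: -0.5 per hour (hypothetical).
bmi = 24.0 - 0.5 * sleep + 0.8 * u + rng.normal(size=n)

const = np.ones(n)
Z = np.column_stack([const, z])

# Stage 1: predict sleep from the instrument.
sleep_hat = Z @ ols(Z, sleep)

# Stage 2: regress the outcome on predicted sleep.
# (Standard errors from this manual second stage would be wrong;
# only the point estimate is used here.)
beta_2sls = ols(np.column_stack([const, sleep_hat]), bmi)[1]

# Equivalent ratio form: reduced form divided by first stage.
first_stage = ols(Z, sleep)[1]
reduced_form = ols(Z, bmi)[1]
beta_wald = reduced_form / first_stage

# Naive OLS for comparison; it is confounded by u in this simulation.
beta_ols = ols(np.column_stack([const, sleep]), bmi)[1]

print(beta_2sls, beta_wald, beta_ols)
```

In this simulation the confounder raises both sleep and BMI, so the naive estimate is pulled toward zero relative to the IV estimate; the direction of such bias in real data depends on the confounding structure.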

First, the policy change is associated with change in sleep duration. Had the policy change been associated with no or negligible increase in sleep duration, the IV estimation would fail. This condition can be tested using the partial F statistic in the first-stage regression (results can be found in eTable 1; http://links.lww.com/EDE/B577).
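As an illustration of how a partial F statistic for a single instrument can be computed, the sketch below compares the residual sum of squares of a first-stage regression with and without the instrument. The data are simulated, the covariate is a hypothetical stand-in, and the calculation assumes homoskedastic, independent errors; it does not reproduce the clustered first-stage results reported in eTable 1.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)


rng = np.random.default_rng(2)
n = 5000

z = rng.integers(0, 2, n).astype(float)       # instrument (simulated)
female = rng.integers(0, 2, n).astype(float)  # example covariate (simulated)
sleep = 5.5 + 0.3 * z + 0.2 * female + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), female, z])  # with the instrument
X_restricted = X_full[:, :2]                       # instrument excluded

q = 1                             # number of excluded instruments
df_resid = n - X_full.shape[1]    # residual degrees of freedom
rss_full = rss(X_full, sleep)
rss_restricted = rss(X_restricted, sleep)

partial_f = ((rss_restricted - rss_full) / q) / (rss_full / df_resid)
print(partial_f)
```

A commonly cited rule of thumb treats a first-stage F statistic below about 10 as a sign of a weak instrument.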

Second, the policy change should affect body weight only through sleep duration, which is the exclusion restriction condition for IV estimation.41,46 If the policy change had affected, for example, eating and exercise behaviors directly (i.e., except through changes in sleep duration), our IV strategy would be problematic. Note, however, that it is not a concern here that changes in eating and exercise behaviors are due to changes in sleep duration, as the interest lies in the total effect of sleep duration, not the direct effect (i.e., after controlling for all downstream covariates). Stress could be another factor that directly mediates the effect of the policy change on body weight: Banning late night after-school classes itself could have lowered students’ stress levels in the treatment region and, consequently, their average body weight could have decreased (with no relation to sleep duration). To probe the importance of stress in mediating the effect of the policy change on body weight, we conducted an additional analysis (results are available in eTable 2; http://links.lww.com/EDE/B577). Many more possibilities are conceivable. Within the bounded time of 24 hours, any change in sleep duration must involve changes in time use on other activities, which might have consequences for body weight. For these reasons, the exclusion restriction condition is hard to justify conceptually. Unfortunately, this condition cannot be tested empirically.

Third, the policy change should not share any causes with body weight.41,47 In the current study, this condition can be considered at two levels. At the regional level, there should be no relevant concomitant changes that differ between the treatment and control regions and affect body weight (e.g., coincidental nutritional or physical activity campaigns). The institutional background of the policy change described above provides a justification based on subject-matter knowledge. Although this condition cannot be tested empirically for the main sample, the difference-in-differences analysis of the three placebo samples offers an opportunity to examine the validity of the condition: Had there been such coincidental changes, body weight in these placebo samples would show differential trends between the treatment and control regions because such “environmental” factors would have affected all the samples indiscriminately. Ultimately, the condition requires that at the individual level, the policy change be uncorrelated with unmeasured covariates in the IV regression model. This requirement cannot be tested directly, although previous studies have compared differences in measured covariates by the proposed IV versus by the treatment, as a way to explore potential confounding by unmeasured covariates and often to justify the IV strategy.47 This article follows the method proposed in Jackson and Swanson47 and presents the results (results are available in eFigure 3; http://links.lww.com/EDE/B577).

An additional requirement for IV estimation is the monotonicity assumption,48,49 which means that, in this study, the policy change on hagwon curfew never decreases sleep duration. Several scenarios for violation of this assumption are presented (eContent 3-3; http://links.lww.com/EDE/B577), with empirical evidence in favor of the assumption (eFigure 4; http://links.lww.com/EDE/B577). This consideration also helps clarify what causal effect is being estimated among which subgroup of the study population. The causal effect estimated in this study is a weighted average per-unit treatment effect among those whose sleep duration increased because of the policy change.49

For regression analysis, we estimated three IV two-stage least squares models using BMI and two other weight outcomes. Next, we estimated four models of BMI, using each of four alternative sleep variables as the main explanatory variable. Finally, we performed a number of sensitivity analyses to examine how robust the main IV two-stage least squares results are to alternative sample definitions and the IV used. All analyses account for clustering of adolescents within the sampling clusters in the data.

Data

This study used data from the Korea Youth Risk Behavior Web-based Survey (KYRBS), an annually repeated cross-sectional, school-based, online survey of a large, nationally representative sample of middle and high-school students (age 13–18 years) in South Korea. The main objective of this survey, administered in September and October by the Korea Centers for Disease Control and Prevention, is to monitor health and behavioral risk factors of South Korean adolescents. The KYRBS employs two-stage cluster sampling: The first stage consists of 400 middle schools and 400 high schools randomly selected within 131 districts, and the second sampling stage selects one class from each grade within each chosen school. All students in the selected classes, except for school dropouts, students with special needs, and those who have difficulty in reading comprehension, are surveyed. The KYRBS respondents complete an online self-administered questionnaire in a computer room in their school. The participation rates were above 95% between 2009 and 2012.

The current study used data from four annual surveys of KYRBS (2009−2012), which spanned two surveys for each of the pre- and postperiods (i.e., before and after the curfew change in March 2011) and used nearly identical survey questions for the study variables. All administrative regions were included initially. We omitted observations from two regions that had not had a hagwon curfew before but introduced an 11 pm curfew, to ensure the homogeneity of the natural experiment and sufficiently large variations in sleep duration induced by the policy change. After we excluded observations with missing values for the study variables included in the full IV regression model, the final main sample consisted of 52,585 general high-school 10th−11th graders. Three “placebo” samples included 25,892 general high-school 12th graders, 15,206 vocational high-school 10th−11th graders, and 98,116 middle-school students (a flow chart is available in eFigure 1; http://links.lww.com/EDE/B577).

Ethics approval to conduct this study was obtained from Seoul National University College of Medicine and Seoul National University Hospital Institutional Review Board (E-1501-048-639).

Variables

The main outcome variable, body mass index (BMI), was calculated with the use of self-reported body weight (in kilograms) and height (in centimeters), with both values reported up to one decimal place. In addition, a binary indicator variable of overweight/obesity was generated based on gender- and age-specific BMI cutoffs (≥85th percentile) in the 2007 Korean Growth Standards of Children and Adolescents.50
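For reference, the standard BMI calculation from weight in kilograms and height in centimeters can be sketched as follows. The function name and the example values are hypothetical.

```python
def body_mass_index(weight_kg: float, height_cm: float) -> float:
    """BMI in kg/m^2 from weight in kilograms and height in centimeters."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0  # convert centimeters to meters
    return weight_kg / height_m ** 2


# Hypothetical respondent: 60.0 kg and 170.0 cm.
example_bmi = body_mass_index(60.0, 170.0)
print(round(example_bmi, 1))
```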

We defined sleep duration, the main explanatory variable of interest, as average weeknight sleep hours in the past week, calculated as the duration from bedtime to wake-up time; the KYRBS asked about each of these times in separate units of hours and minutes. In addition, we examined four alternative variables of sleep duration. Based on the distribution of sleep hours, we created two binary indicator variables using the 5- and 6-hour cut-off points. We also examined bedtime as an alternative explanatory variable in this study, because late bedtime is found to be associated with weight problems.51,52 We recoded the corresponding variable so that it ranges from 20 (8 pm) to 28 (4 am). Finally, we examined perceived sleep adequacy, because this may be of interest for its direct behavioral implications. Despite its ordinal nature, this variable was treated as a continuous variable (with values of 1 = “very inadequate” to 5 = “very adequate”).
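A minimal sketch of how these sleep variables could be derived is given below. It assumes that bedtime and wake-up time are available as 24-hour clock values in hours and minutes and that wake-up occurs the morning after bedtime; the function names and the example times are hypothetical, and the survey's actual response format may differ.

```python
def clock_to_hours(hour: int, minute: int) -> float:
    """Convert a clock time to decimal hours."""
    return hour + minute / 60.0


def bedtime_extended(hour: int, minute: int) -> float:
    """Bedtime on the 20 (8 pm) to 28 (4 am) scale.

    Times after midnight are shifted by 24 so that later bedtimes
    always have larger values.
    """
    t = clock_to_hours(hour, minute)
    return t + 24.0 if t < 12.0 else t


def sleep_hours(bed_hour: int, bed_minute: int,
                wake_hour: int, wake_minute: int) -> float:
    """Hours from bedtime to wake-up time on the following morning."""
    wake = clock_to_hours(wake_hour, wake_minute) + 24.0
    return wake - bedtime_extended(bed_hour, bed_minute)


# Hypothetical respondent: bedtime 00:54, wake-up 06:36.
duration = sleep_hours(0, 54, 6, 36)
sleeps_6h_or_more = int(duration >= 6.0)  # 6-hour cut-off indicator
sleeps_5h_or_more = int(duration >= 5.0)  # 5-hour cut-off indicator
print(round(duration, 2), sleeps_6h_or_more, sleeps_5h_or_more)
```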

We generated three indicator variables involving treatment and period status: Treatment for the three regions where the policy change in hagwon curfew occurred on 1 March 2011 (=1 vs. 0 for other regions); Post for the observations surveyed in 2011 and 2012 (=1 vs. 0 for 2009 and 2010); and the interaction term Treatment × Post (=1 if Treatment = 1 and Post = 1 vs. 0 otherwise). Other covariates included region and year dummies, gender, education levels of both parents, self-reported household economic status, and school grade. Table 1 presents how study variables were coded in this study.
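The coding of the three policy indicators can be sketched as below. The region labels are placeholders, because the treatment regions are not named in this section; only the survey years follow the definition given above.

```python
# Placeholder labels for the three treatment regions (hypothetical names).
TREATMENT_REGIONS = {"region_A", "region_B", "region_C"}
POST_YEARS = {2011, 2012}  # surveys conducted after the 1 March 2011 change


def policy_indicators(region: str, survey_year: int) -> dict:
    """Return Treatment, Post, and their interaction for one observation."""
    treatment = int(region in TREATMENT_REGIONS)
    post = int(survey_year in POST_YEARS)
    return {
        "Treatment": treatment,
        "Post": post,
        "Treatment_x_Post": treatment * post,
    }


print(policy_indicators("region_A", 2012))
print(policy_indicators("region_A", 2010))
print(policy_indicators("region_Z", 2011))
```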

TABLE 1. Summary Statistics of Variables in the Sample of General High-school 10th−11th Graders

RESULTS

Summary statistics of the study variables in the main sample of general high-school 10th−11th graders are presented in Table 1. The average BMI is 20.9. The proportion of overweight/obesity in the sample is 11%. The average sleep duration is 5.7 hours, and only 41% sleep 6 hours or more. The average bedtime is 12:54 am. Observations from the treatment group account for 31% of the sample, while observations are nearly equally divided between pre- and postperiods. Observations are also balanced in terms of gender and school grade.

Table 2 shows group means of sleep hours and BMI by period and treatment status for each of the four samples. The (unadjusted) difference-in-differences estimate for the policy effect on sleep duration in the main sample of general high-school 10th−11th graders is 0.28 (A, column 3), indicating that the 10 pm curfew resulted in a relative increase in sleep duration of 0.28 hours, or approximately 17 minutes, on average. This result suggests that the policy change is strongly associated with sleep duration, satisfying the necessary condition of IV strength; it is supported by the large partial F statistic (33) in the first-stage regression of sleep hours53 (eTable 1; http://links.lww.com/EDE/B577). The difference-in-differences estimate for BMI suggests that the policy change decreased the average BMI by 0.11 (A, column 6). The ratio of these two difference-in-differences estimates (−0.39 in A, column 7) gives the Wald/IV estimator, an (unadjusted) IV estimate of the effect of a 1-hour increase in sleep duration on BMI. No similar pattern is observed for the three placebo samples (B−D). The overall estimates suggest that the policy change had effects on sleep duration and BMI in the main sample but virtually no effects in the three placebo samples (a more intuitive graphical representation is given in eFigure 2; http://links.lww.com/EDE/B577).
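The arithmetic behind the reported figures for the main sample can be checked directly from the two unadjusted difference-in-differences estimates quoted above. The variable names are illustrative.

```python
# Unadjusted difference-in-differences estimates for the main sample,
# as reported in the text (Table 2, panel A).
did_sleep_hours = 0.28  # policy effect on sleep duration, in hours
did_bmi = -0.11         # policy effect on BMI

# Sleep gain expressed in minutes.
sleep_gain_minutes = did_sleep_hours * 60

# Wald/IV estimator: change in BMI per 1-hour increase in sleep.
wald_estimate = did_bmi / did_sleep_hours

print(round(sleep_gain_minutes), round(wald_estimate, 2))
```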

TABLE 2. DD Estimates on Sleep Hours and BMI and Wald/IV Estimators

The IV two-stage least squares estimates for the main sample suggest that a 1-hour increase in sleep duration decreases, on average, BMI by 0.56 kg/m2, body weight by 1.63 kg, and the probability of being overweight or obese by 4.2 percentage points (Table 3), in the subset of general high-school 10th−11th graders whose sleep duration would have increased because of the policy change. These estimates are larger in absolute magnitude than their respective ordinary least squares estimates (Table 3).

TABLE 3. Effect of Sleep Duration (in Hours) on BMI and Other Outcomes

In the four models using the alternative sleep variables (Table 4), the IV two-stage least squares estimates are qualitatively consistent with the main IV estimate. For example, the adolescents who sleep 6 hours or more because of the policy change would have a BMI reduction of 1.19, compared with those who sleep less than 6 hours. Because two of the sleep variables are dichotomous, it is possible to estimate the size of the subgroup (“compliers”) to which the IV estimates of this study pertain.48 The proportion of “compliers” was 12.6% for the 6-hour cut-off and 8.7% for the 5-hour cut-off (Supplemental Digital Content 3-3; http://links.lww.com/EDE/B577).
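One standard way to express the size of the complier subgroup for a binary exposure is sketched below. The notation is introduced here for illustration (D is the binary sleep indicator, such as sleeping 6 hours or more, and Z is exposure to the policy change), and the exact procedure used in the supplemental content may differ.

```latex
\Pr(\text{complier})
  = \Pr(D = 1 \mid Z = 1) - \Pr(D = 1 \mid Z = 0)
```

This identity relies on the monotonicity assumption described in the Methods; in a difference-in-differences setting, the analogous quantity would be the first-stage policy effect on the binary sleep indicator.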

TABLE 4. Effect of Alternative Sleep Variable on Body Mass Index

Results from sensitivity analyses are broadly consistent with the main IV estimate of −0.56, albeit with some variation in magnitude and a stronger effect in males (Table 5). Specification tests generally suggest that IV estimation is preferable to ordinary least squares in this study (eTables 3−5; http://links.lww.com/EDE/B577).

TABLE 5. Sensitivity Analysis: Effects of Sleep Duration (in Hours) on Body Mass Index

DISCUSSION

Using a natural experiment that affected adolescents with short sleep duration in South Korea, this study has provided new, causal evidence that sleep gain is associated with improvement of weight problems. The overall results are in line with previous epidemiologic studies. The stronger estimated effect found in males is also consistent with the gender difference reported in the literature.2,10

This contribution is meaningful in the context of the existing epidemiologic literature on the sleep–obesity link. There has been skepticism that further epidemiologic studies employing traditional research methods would be “superfluous” with no signs of improving causal inference in the current research settings.6 In the meantime, recent years have seen in-laboratory experimental studies on the sleep–obesity link.13,17–19,22,54 Although these studies collected detailed information on sleep patterns, they involve a small number of participants and are very expensive. The current article studied a large population group in a real-world setting, while enhancing the internal validity of causal inference using a plausible natural experiment and data spanning both pre- and postperiod. Furthermore, the natural experiment allowed for studying a population-wide sleep gain (away from the conventional investigations on individual sleep loss), thus providing more pertinent evidence for a potential effective public health intervention.

This study also sheds light on another key issue debated in the literature: whether the beneficial effect of increasing sleep duration on weight is large enough to justify promoting modification of sleep duration.55 The relatively small magnitude of the association between sleep duration and body weight reported in previous epidemiologic studies has been challenged as having a negligible benefit to an individual obese patient.27,28 The magnitude of the main IV estimate in the current study is a 0.56 BMI change per hour of sleep, which would hardly seem large enough for an individual adolescent to consider increasing sleep duration for a sustained period of time. Young55 highlights the importance of putting such estimates in the perspective of population health. In fact, the BMI reduction of 0.56 per hour of sleep translates to a 4.2 percentage point decrease in the risk of overweight/obesity from 11.4% in this study. As shown in this particular case, sleep extension may confer a substantial reduction in weight problems at a population level but only a small reduction in BMI for each individual, a phenomenon known as the prevention paradox.55,56 The population strategy of shifting the overall distribution of sleep duration to longer hours has greater relevance,57 because short sleep duration is widespread and has pervasive negative health consequences.58–60

The results of this study should be interpreted with caution because of several limitations. First, as described in detail in the “Methods” section, the exclusion restriction condition for IV estimation is hard to satisfy conceptually. The policy change could have affected body weight via various mechanisms other than increased sleep duration. To the extent that such other mechanisms are at work, the IV estimates of the effect of sleep extension on BMI reduction could be biased. Given the results of IV versus ordinary least squares analysis, overestimation is the more relevant concern. This concern, however, is partially alleviated by contextual knowledge and some evidence. The policy change is unlikely to have substantially promoted physical activity among general high-school 10th−11th graders just because more time became available between 10 pm and midnight. Eating more certainly could have occurred because of the policy change, but this possibility would have diminished the estimated effect on BMI reduction. Analysis on stress, though improvised, showed that it may not be important as a possible mechanism for the effect of the policy change on body weight in the current data. These arguments strengthen the case that the main effect of the policy change on BMI reduction is through sleep extension, rather than through other behavioral mechanisms independent of sleep, although the exclusion restriction still cannot be verified.

A second limitation is that the variable of sleep duration and all other variables were derived from self-reported responses, which are prone to nonignorable missingness and a variety of measurement errors. For a population-based survey, the data on sleep duration were based on relatively specific questions on sleep, which include several key elements to improve the accuracy of self-reports on sleep duration.61 The issue of systematic measurement error for sleep duration is partially addressed by the IV method used in this study, where the variation in sleep duration is induced by the exogenous policy change. Third, the IV results should be interpreted in the specific population group and context where the natural experiment existed and cannot be generalized to other settings. In the current data, the proportion of “compliers,” whose sleep duration crossed the cut-off of 5 or 6 hours because of the policy change, was around 10%. It should be noted that the IV estimates of this study are derived from this subset of the adolescent population. The apparent effect on BMI should also be interpreted as the effect of replacing late studying in hagwon by sleep in the specific social context of South Korea, which may not be the same effect as replacing other activities by sleep in other settings. Fourth, the study was not able to account for the duration of exposure under the reported sleep hours and any changes in sleep behavior. However, students in the sample are likely to maintain a relatively regular schedule largely shaped by school times and hagwon policies in South Korea. Finally, analysis of this study relies on the validity of various assumptions for statistical modeling (e.g., functional forms of variables).

Despite these limitations, this study provides population-level, causal evidence that corroborates the consistent findings in the extant epidemiologic literature on the sleep–obesity link. The growing literature in sleep epidemiology will need to better address the causality issue among diverse population groups.12,59,60,62

ACKNOWLEDGMENTS

This study used data from the Korea Youth Risk Behavior Web-based Survey, administered by the Ministry of Education, Science and Technology, Ministry of Health and Welfare, and Korea Centers for Disease Control and Prevention. An earlier version of this paper was published as a Stanford Asia Health Policy Working Paper. I thank Mary Ann Bautista, Ada Batcagan-Abueg, Eunhae Shin, and Sungchan Kang for helpful assistance. I also thank Marcel Bilger, Soo-yong Byun, Sukyung Chung, Eric Finkelstein, Hai V. Nguyen, Bisakha Sen, and Justin White for useful comments on earlier versions of this paper. I appreciate discussions with Young-Ho Khang. This paper has benefited from comments of conference and seminar participants at the American Society of Health Economists Biennial Conference, International Association for Time Use Research Conference, World Congress of Sleep Medicine, Duke-NUS Medical School Singapore, Hallym University Graduate School of Public Health, Korean Institute of Health and Social Affairs, Max Planck Institute for Demographic Research, Max Planck Institute for Human Development, Seoul National University Graduate School of Public Health, Seoul National University Program in History and Philosophy of Science, Sogang University School of Economics, and Yonsei University. I am grateful to two anonymous reviewers for helpful suggestions. All errors are my own.