Statistics from Altmetric.com
Self-reported body mass index (BMI) can substantially understate true BMI.
Conventional BMI thresholds for obesity may need to be adjusted downwards in order to obtain a more accurate profile of obesity from self-reported BMI data.
This article demonstrates different options for this adjustment.
Use of the Youden's J for downward adjustment involves an implicit weighting of the relative costs of false-positive and false-negative misclassification.
It is preferable for analysts to explicitly incorporate weighting of different costs. Depending on the approach, the downward adjustment required may be up to 4 units.
Strengths and limitations of this study
This article provides a clear analysis of criteria which can be used to adjust BMI thresholds when using self-reported data.
The approach adopted here can be applied to other datasets.
The article also shows how these adjustments may differ by age and gender.
The results may be specific to the time and place, Ireland in 2006.
Thus, while the approach is general, the specific adjustments adopted here may not be.
When analysing large scale, nationally representative datasets, obesity is typically measured via body mass index (BMI), weight (in kilos) divided by height (in metres) squared. The WHO suggests a threshold BMI of 25 for ‘overweight’, a threshold of 30 for ‘obesity’ and a threshold of 40 for ‘severely obese’.
BMI has been criticised as a measure of obesity with some authors suggesting that other measures such as total body fat, percentage body fat and waist circumference are superior measures of fatness.1 However, while these measures may provide a more accurate indicator of obesity, they are expensive to produce, and in terms of large-scale nationally representative datasets, the likelihood is that BMI will remain the most commonly used indicator of obesity for the foreseeable future.
However, there is a further measurement issue with BMI as it is frequently reported in large-scale nationally representative datasets. Once again, for reasons of economy, it is typically the case that BMI is calculated from self-reported height and weight. This clearly gives rise to a scope for misreporting (compared with true measured height and weight). If misreporting was random (people being as likely to over and under-report their height/weight) then reported mean BMI would still be unbiased, but reported variance would be higher than ‘true’ variance. However, if misreporting is systematic, then this represents a more serious problem, since it suggests that mean BMI as calculated from national samples may be biased, and further problems emerge if the degree of bias differs across categories such as age, gender and socioeconomic background.
Evidence worldwide suggests that misreporting in self-reported BMI is not random and that through a combination of over-statement of height and under-statement of weight, self-reported BMI will typically underestimate ‘true’ (or measured) BMI.2 The issue has also been analysed for Ireland showing that the degree of misreporting appears to be increasing over time.3
Analysis of Swiss data with self-reported and clinically measured BMI finds evidence that self-reported BMI understates obesity levels.4 Using receiver operating characteristic (ROC) curves, the threshold level of self-reported BMI is adjusted for it to provide the ‘optimal’ signal of true underlying BMI. However, the revised thresholds have been criticised on the basis that they are relevant only for this specific dataset, and for other datasets, different thresholds may be optimal.5 ,6
This paper further examines the relationship between self-reported and measured BMI and the nature of the optimal threshold adjustment. However, the paper employs a wider range of approaches to calculate the ‘optimal’ threshold and shows how calculated thresholds can vary quite substantially depending on the approach adopted. In particular, some of the more popular approaches may lead to analysts unconsciously making value judgements regarding the relative costs of different types of misclassification. The paper also examines whether the optimally calculated threshold differs according to characteristics such as age and gender.
Our data come from the Survey of Lifestyle, Attitudes and Nutrition in Ireland, usually known as the Slan survey. The Slan surveys were carried out in 1998, 2002 and 2007. For this paper, we use the 2007 data, since as well as providing information on self-reported BMI it also provides information on clinically measured BMI for a reasonable-sized subset of the sample (data on measured BMI were also provided for 1998 and 2002 Slan, but proportionally these subsamples were only half as large as that for 2007). The Slan 2007 survey is a comprehensive, nationally representative survey carried out by face-to-face interview in the respondent's house with a sample size of 10 364. The 2007 sample was provided by the Irish Social Science Data Archive (ISSDA) with the Geodirectory (a listing of all residential addresses in Ireland compiled by the postal service) used as the sampling frame and weights supplied with the data (in all subsequent analysis sampling weights are applied). In addition to the main study, there were two substudies conducted. The first was a subsample of 967 individuals aged 18–44 who had their body height and weight measured. In addition, there was subsample of 1207 respondents aged 45 and over who took part in a more complete physical examination. The subsamples were representative of the main sample for their age groups and it is this combined group of 2174 individuals which forms the group from which our eventual sample is obtained as explained below (figures 1⇓⇓⇓⇓⇓⇓⇓–9).7
In the application here, measured BMI is taken as the ‘true’ or gold standard measure of obesity and a threshold of 30 for this measure partitions the population into the binary categories of obese and non-obese. We then assess the degree to which self-reported BMI (the ‘marker’) produces the ‘same’ partition. If self-reported BMI assigns someone as obese who is also obese under the measured BMI definition, then this is a true positive (TP). If it signals someone as obese who is not obese under the measured definition it is a false positive (FP). If it signals someone as non-obese even though they are obese under the measured definition it is a false negative (FN). Finally true negatives (TN) are those who are classified as non-obese under both definitions.
The TP rate is sometimes called the sensitivity (Se) of the signal and is TP/(TP+FN), while the corresponding concept for the TN rate is known as specificity (Sp) and is TN/(FP+TN), which in turn is equal to one minus the FP rate.
Where we have one marker which is continuous, but where we wish to choose the optimal threshold for that marker, so that the partitioning of the population into obese and non-obese by the marker (self-reported BMI) is in some sense ‘closest’ to the partitioning by the true measure (clinically measured BMI), the ROC curve can be made use of.5 In particular, the threshold which maximises the Youden's J index may be chosen, that is, the point which is most ‘north-west’ on the ROC curve. Intuitively, the J index is Se+Sp−1, that is, the sum of the Se and Sp rates (−1).
However, there are other possible and arguably equally plausible criteria for choosing the optimal threshold. For example, we could choose the threshold which maximises the percentage of cases which are correctly classified (or minimises those misclassified). This may be labelled efficiency, and it is that value of the threshold, t*, which maximises P×Se(t)+(1−P)×Sp(t), where P represents the prevalence of obesity (in proportional terms).8
Another approach is to choose that threshold which maximises the OR, that is, (TP×TN/FP×FN) which is effectively the ratio of correct to incorrect classifications.
Note that the efficiency and Youden's J approach are both specific cases of a more generalised approach. The rate of FNs for any given threshold, t, will be P×(1−Se(t)), while that of FPs is (1−P)×(1−Sp(t)). Note that in this case, we are referring to the rate of FN relative to the total population (hence we multiply by P) as opposed to the rate relative to those who are truly obese. However, the analyst may associate different costs with different types of misclassification. For example, it seems reasonable in the case of obesity that analysts would assign a higher weight to FN rather than FP, since if someone is diagnosed FN they may not take precautions in terms of diet and lifestyle which they probably should. A diagnosis of FP on the other hand may lead them to consult their physician where their ‘true’ BMI will presumably become known.
If the cost of a FN is given by CFN and that of a FP by CFP, then the total cost associated with any given threshold is CFN×P×(1−Se(t))+CFP×(1−P)×(1−Sp(t)). A decision rule could then be adopted to choose that threshold, t*, which minimises the above expression or equivalently which minimises r×P×(1−Se(t))+(1−P)×(1−Sp(t)), where r=(CFN/CFP) is the relative cost of FN compared with FP.
It has been pointed out that the choice of a threshold based on the maximisation of Youden's J is equivalent to a choice based on the minimisation of cost where r, the ratio of the cost of FN to that of FP, is set equal to (1−P)/P. Thus, Youden's J is a specific case of a more general decision-based approach.9 Another way of looking at this is that should an analyst choose that threshold which maximises the value of Youden's J, they are implicitly (and perhaps unknowingly) imposing a relative cost of FN to FP equal to (1−P)/P, a ratio which may or may not conform to the actual values or beliefs of the analyst.
It is also clear that the value of t which maximises efficiency is also that which minimises r×P(1−Se(t))+(1−P)×(1−Sp(t)) where r=1. Thus, both efficiency and Youden's J can be regarded as special cases of a more general decision-based approach.
The approaches we have described above essentially involve choosing that threshold which minimises a weighted average of the cost of FP and FN, where the weights can either be chosen explicitly by the analyst or may be implicitly chosen by the choice of an index such as the Youden's J index. However, it is also possible that the analyst may take what we can call a constrained optimisation approach. Suppose, as would seem natural in the application here, the analyst regards FN as more costly than FP. The analyst could then choose a benchmark level of FN above which he is not prepared to go. The threshold is then that level which minimises the FP rate subject to attaining the given level of FN. It can be regarded as a constrained optimisation approach in that FP is minimised subject to attaining a given level of FN.
We now examine how t* varies according to the following criteria: efficiency, Youden's J, maximum value of the OR and the minimum cost basis where we choose a range of r (some values of r, of course, having already been included in efficiency and the J index) and a constrained optimisation approach where we choose three values of FN (1%, 5% and 10%). The latter is equivalent to choosing Se levels of 99%, 95% and 90%. We also examine how t* varies according to age and gender.
Self-reported BMI was collected as all respondents were asked to self-report their weight without clothes and their height without shoes. In addition, about 20% of the sample (2174) also underwent a medical examination, which included height and weight measurement. Respondents provided the self-reported data before their examination, and weight and height were measured in light clothing without shoes. Weight was measured to the nearest 0.1 kg using electronic platform scales and height was measured to the nearest 0.1 cm using measuring rods.
Since the purpose of this paper was to examine the relationship between self-reported and measured BMI, we are forced to restrict our sample to those who provided data on both. In the version of Slan provided to us by ISSDA, there were initially 2171 observations where measured BMI was available. We then discarded those observations where self-reported BMI was not available bringing the sample size to 1976. When examining summary BMI statistics for this group, it became clear that there were a small number of cases which appeared to suffer from measurement error (eg, recorded self-reported BMI of zero), and so it was decided to trim the data by removing all observations with BMI (either self-reported or measured) less than 15 or greater than 50. This brought the sample size to 1874.
Table 1 gives summary statistics for our sample and for the complete Slan 2007 sample (the latter figures were obtained from the Slan 2007 report). The discrepancy between self-reported and measured BMI is clear. There is a gap of over 9% between measured obesity and self-reported obesity, that is, measured obesity is higher than self-reported by almost two-thirds and the t statistic for the paired t test is 12.9 (p=0.000). In terms of the actual BMI (as opposed to BMI categories), self-reported BMI is about 1.4 below the measured BMI and a paired t test of the null hypothesis of equality of measured and self-reported BMI gives a t statistic of 26.3 (p=0.000). Table 1 also shows that the data used in our analysis have a slightly younger age profile and correspondingly a slightly higher education profile than the complete sample.
Table 2 provides more detail on the discrepancy between self-reported and measured height and weight by age and gender and also includes the t statistic for the paired t test of equality between self-reported and measured. In all cases, the difference was statistically significant at conventional levels. The pattern of discrepancy is similar to that found in studies for the USA.10 While the US studies did not find that all discrepancies were statistically significant, it is noticeable that those categories where significance was found for the USA (eg, reporting of weight for women) were those categories where the t statistic was highest in this study.
Table 3 shows the cross-tabulation between self-reported and measured BMI on the basis of a threshold of 30 for both measures. This table shows that if we use a threshold of 30 for both measures, then self-reported BMI will correctly classify about 87% of observations, that is, (1386+250)/1874. This corresponds to a Se rate of about 55% and a Sp rate of about 98%.
Table 4 shows the value of t* for different criteria and for the whole of our sample as well as specific subgroups and it also provides rates of Se and Sp. By reading down the column, we can see how t* varies according to the different criteria. Taking the column for the total sample initially, we first of all see that the values of t* essentially fall into three bands. First of all, if we employ the efficiency criterion, we obtain a t* of 29.1, quite close to the typically adopted threshold of 30. Thus, 30 is only likely to be close to the optimal value of the threshold if the ‘efficiency’ criterion is used, that is, equal costs are assigned to FN as to FP.
The values of t* for the other criteria can be assigned into two bands, both of which differ quite substantially from 30. Using the criteria of Youden's J, maximising the OR or minimising the MCF for ‘low’ values of r (ie, 2–5), we obtain a range of t* from 27.1 to 27.5. It is worth noting that t* as chosen by the Youden's J index is the same as t* for r=3. This is to be expected since with P=0.24, (1−P)/P=3.17. Clearly, the higher the value of r, the higher the relative cost of FN, and the lower the t*, since in the limit a very low value of t* would ensure no FN, though at the expense of a very high rate of FP. This is essentially what is happening with respect to the third band of values of t*, those chosen using r=10 and the constrained optimisation criterion whereby we choose ‘standard’ Se values of 99%, 95% and 90% (corresponding to FN rates of 1%, 5% and 10%, respectively). This gives a range of t* of 22.4–27.1, considerably lower than the other ranges. However, this high rate of Se comes at the expense of low rates of Sp, in the region of only 30%.
It is also clear that choosing ‘high’ values of r, that is, 10 or above, provide values of t* which are very similar to those when we choose ‘conventional’ levels of significance of, say, 5%.
The pattern of three ‘bands’ of t* persists when we look at t* by age and gender, and as before the values of t* for the efficiency criterion are the highest, while those using the constrained optimisation criteria are the lowest. In general, the recommended t* for women is lower than for men. The pattern with respect to age is not so clear-cut. For the constrained optimisation approach with a FN rate of 1%, the recommended t* for young is over 2 units lower than for old, but for other criteria there is not so much difference.
The results indicate that in the case of self-reported and measured BMI, it appears likely that for any population, or for any approach to calculating t*, with the exception of the efficiency criterion where the cost of FN and FP are equivalent, the optimal threshold will differ from 30. Quite how far from 30, however, depends on what optimisation criterion is chosen. For relatively low values of r, the relative costs of FN to FP, a threshold self-reported BMI of around 27–27.5 seem appropriate, indicating a downward adjustment of the current threshold for self-reported BMI of up to 2.5 units. Given the implicit weighting of FN and FP in Youden’s J index, then with prevalence rates in the region of 24%, the downward adjustment as chosen by this criterion will be of the same magnitude. However, if the analyst wishes to be guaranteed a Se rate of 95% (or higher), then a downward adjustment of 4 or maybe more units would seem to be required.
The results from this study are primarily of interest to public health authorities. Given that for many such authorities the principal, nationally representative, source of information on BMI comes from self-reported measures, it is important to be clear about what adjustments should be made to these measures to obtain a more accurate signal of true BMI, and whether the adjustment should be uniform. This study has shown that there is a range of choice in terms of these adjustments.
The particular adjustment which is to be made depends on a number of factors. The desired Se of the test (and also the ratio of costs of FN to FP) will depend on the nature of treatment. In the case of obesity, a choice of a low threshold will ensure a low rate of FN, but perhaps a relatively high rate of FP. However, since the treatment for obesity (in terms of changed lifestyle, etc) is relatively non-intrusive and easily reversible, once the ‘true’ diagnosis becomes known, then for self-reported BMI there does seem to be a case for a low threshold. This might not be the case if treatment was invasive and with potentially harmful side effects.
The underlying seriousness of the condition in terms of increased morbidity and mortality will also be relevant. There is some recent evidence suggesting that the relationship between BMI and mortality may not be monotonic, with higher BMI over some ranges (in particular 25–30) appearing to have a protective effect in terms of mortality and BMI for grade 1 levels of obesity (ie, BMI from 30 to 35) having no significant impact on mortality.11 In that case, the relative cost of FN would presumably become lower.
The results of the study are probably of less immediate concern to individuals or clinicians. For individuals, the study serves as an additional reminder that self-reported measures can be prone to bias and that great care should be taken when self-measuring height and weight. Clinicians will naturally be in a position to obtain accurate height and weight measures for their patients, but once again the study may serve to remind them of the degree to which the patients may erroneously provide information.
It could be suggested that given that BMI is a function of height and weight, the adjustments should be made directly to these variables, rather than to BMI itself. However, a drawback to this approach is that we are interested in the adjustment which should be made to the self-reported BMI for it to provide a more accurate signal for the underlying obesity (where BMI exceeds 30). It is the underlying obesity which is ultimately of concern to us and so the adjustment needs to be made to the variable which directly measures obesity. While adjustments could be made to height (or weight), any particular value for height (weight) is consistent with many different values for BMI. Thus, there is no threshold value for either height or weight which determines obesity; rather, it is their interaction as defined by BMI, in which case it seems preferable to make the adjustment directly to BMI.
There are a number of final points which should be borne in mind. First of all, as mentioned in the introduction, there are criticisms of BMI as an indicator of obesity and this should always be remembered when analysing the population-level BMI data. The particular BMI thresholds chosen for obesity and overweight could also be criticised as arbitrary and perhaps should differ for different subpopulations. However, even if different thresholds are chosen, the issue of the discrepancy between self-reported and measured BMI is likely to remain.
Second, the optimal thresholds calculated in this paper may be specific to the sample analysed and these thresholds may differ for different samples, for example, for different countries or time periods. However, this study gives some idea of the range of possible adjustments which could be made and some of the adjustments which are suggested here are similar to those found elsewhere.4
Finally, while this study has shown that, depending on the objectives of the analyst, there are a number of different adjustments to self-reported BMI which could be made, it should also be borne in mind that there may be a virtue to keeping public health messages simple. So, while there is a range of adjustments to choose from, it may be best to choose just one in terms of a public health message. Hopefully, this study will prove useful in terms of making that choice.
The author is grateful to Donal O Neill, Olive Sweetman and two journal referees for helpful comments and to Karen Morgan for assistance with the data. The usual disclaimer applies.
Funding This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Data used in this study are available from the Irish Social Science Data Archive (email@example.com). Alternatively, statistical code and dataset available from the corresponding author.
If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.