

Association between an individual housing-based socioeconomic index and inconsistent self-reporting of health conditions: a prospective cohort study in the Mayo Clinic Biobank
  1. Euijung Ryu1,
  2. Janet E Olson1,
  3. Young J Juhn2,
  4. Matthew A Hathcock1,
  5. Chung-Il Wi2,
  6. James R Cerhan1,
  7. Kathleen J Yost1,
  8. Paul Y Takahashi3
  1. 1 Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
  2. 2 Department of Pediatric and Adolescent Medicine, Mayo Clinic, Rochester, Minnesota, USA
  3. 3 Department of Internal Medicine, Mayo Clinic, Rochester, Minnesota, USA
  1. Correspondence to Dr Paul Y Takahashi; takahashi.paul{at}


Objective Surveys are commonly used in clinical practice and epidemiological research to collect self-reported information on health and disease. However, the inconsistency of self-reported information collected longitudinally in repeated surveys has not been well investigated. We aimed to investigate whether a measure of socioeconomic status based on current housing characteristics, the HOUsing-based SocioEconomic Status (HOUSES) index, which links current address information to real estate property data, is associated with inconsistent self-reporting.

Study setting and participants We performed a prospective cohort study using the Mayo Clinic Biobank (MCB) participants who resided in Olmsted County, Minnesota, USA, at the time of enrolment between 2009 and 2013, and were invited for a 4-year follow-up survey (n=11 717).

Primary and secondary outcome measures Using repeated survey data collected at baseline and 4 years later, the primary outcome was inconsistency in reporting prevalent diseases, defined as reporting having ‘ever’ been diagnosed with a given disease in the baseline survey but reporting ‘never’ in the follow-up survey. The secondary outcome was the response rate for the 4-year follow-up survey.

Results Among the MCB participants invited for the 4-year follow-up survey, 8508/11 717 (73%) responded to the survey. Forty-three per cent had at least one inconsistent self-reported disease. Lower HOUSES was associated with higher inconsistency rates, and the association remained significant after adjusting for pertinent characteristics such as age and perceived general health (OR=1.46; 95% CI 1.17 to 1.84 for the lowest compared with the highest HOUSES decile). Lower HOUSES was also associated with a lower response rate for the follow-up survey (56% vs 77% for the lowest vs the highest HOUSES decile).

Conclusion This study demonstrates the importance of using the HOUSES index, which reflects current SES, when using self-reporting through repeated surveys, as a lower HOUSES index at the baseline survey was associated with more inconsistent self-reporting and a lower response rate for the follow-up survey.

  • geriatric medicine
  • public health
  • social medicine

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:


Strengths and limitations of this study

  • The HOUsing-based SocioEconomic Status index is a well-established socioeconomic status measure reflecting the socioeconomic situation around the time of surveys, which makes it an excellent measure for studying the association between current socioeconomic status and inconsistent self-reporting in repeated surveys.

  • Loss to follow-up, an important aspect when repeating surveys, in the cohort (the Mayo Clinic Biobank participants residing in a local community) was negligible.

  • This study is limited by characterising the likelihood of inconsistent self-reports, without knowing the true medical status.


Introduction

In many epidemiological studies, self-report is widely used to collect health information, including subjects’ disease status, typically through surveys.1 These data are used to calculate the incidence and prevalence of various diseases. Self-administered surveys are popular and have several advantages compared with other approaches such as manual medical chart reviews or computational phenotyping approaches using electronic health records (eg, using natural language processing for case identification2–4) because they are easy to implement, less labour intensive, less expensive and may produce more complete data collection.5 6 In addition, surveys may be the only approach to obtain health information if the research team lacks access to the medical records. Despite the popularity of surveys as data collection tools, their limitations are also well documented, including selection bias due to non-participation in the surveys and high misclassification (either false-positives or false-negatives) driven by many factors such as recall bias and health literacy.7–9 The accuracy of self-reports varies by disease type (ie, conditions with easy diagnosis compared with those presenting non-specific symptoms), and may also reflect the challenges that many patients face when trying to understand whether or not they suffer from an illness.10–13 While extensive investigations have been performed to identify risk factors associated with the accuracy of self-reports for a single survey, studies assessing inconsistency of self-reports from repeated surveys are less common.

Because of its significant impact on health, socioeconomic status (SES) is often investigated in studies assessing inconsistency of self-reported diseases from repeated surveys.14–18 For example, a study of mammography history from a repeated survey done over a 2-year time span reported that women from low-income households had more inconsistent responses (ie, ever user at baseline but never user at follow-up).19 However, the inconsistency was not significantly associated with education level, a commonly used SES measure.19 Another study using seven chronic conditions showed that low wealth (net worth) was one of the key predictors of inconsistent self-reports, whereas education level and income were not associated with inconsistency.20 These inconclusive findings are partially due to the type of SES measure (ie, education level, income, wealth or other SES measures) used in studies. Specifically, some SES measures (eg, education level) remain largely static over time and may not capture important changes over the life course, especially those due to ageing or disability. Although other SES measures such as wealth and income can be collected close to the time of the baseline surveys, this information is mostly unavailable in patients’ medical records, and the self-reported data suffer from the same biases as most survey-based data, including response bias and inaccurate reporting.21 22

To overcome these problems, we developed and validated an objective (not relying on self-reports) SES index, termed the HOUSES (HOUsing-based SocioEconomic Status) index, by linking current address information to publicly available real estate property data.23 The validity of the HOUSES index as an SES measure and its utility in studying health outcomes have been well established.23–26 In addition, it has been suggested that the HOUSES index may integrate multidimensional factors beyond what commonly used SES measures capture, including ageing, social isolation and functional decline.27

In this study, we sought to characterise the degree of inconsistent self-reports among Mayo Clinic Biobank (MCB) participants having a wide range of health conditions, by using baseline and follow-up questionnaires administered 4 years apart. Our primary aim was to test whether SES measured by the HOUSES index is associated with inconsistency in self-reported illness. Since the HOUSES index is based on housing characteristics close to the time of enrolment into the MCB, it will likely reflect SES around the time of surveys that may capture individual characteristics such as health literacy or cognitive ability.28 Health literacy is often defined as the degree to which individuals have the capacity to obtain, process and understand basic health information, and the services needed to make appropriate health decisions.29 A commonly used screening tool for assessing health literacy measures the extent to which the person is confident in his/her ability to complete medical forms,30 a skill very similar to filling out a research survey about their medical history. Furthermore, poor health literacy has been associated with poor illness understanding in diverse chronic conditions.31 32 Therefore, we hypothesise that subjects with low HOUSES index will have less reliable self-reported information (ie, higher inconsistency when asked repeatedly), potentially due to difficulty completing the survey and poor illness understanding driven by low health literacy. As a secondary aim, we sought to assess the response rate of the follow-up survey in relation to the HOUSES index.

Materials and methods

Study design and subjects

This cohort study used data from MCB participants who were residents of Olmsted County, Minnesota, at biobank enrolment and had been invited to complete a 4-year follow-up questionnaire as of December 2016. The MCB started enrolment in April 2009, recruiting mostly adult Mayo Clinic primary care patients via mailed invitation while also allowing volunteers. MCB participants are predominantly white (95%), more often female (58%) and relatively old (median age at enrolment was 62 years).33 The MCB participants completed self-reported health-related questionnaires at enrolment, provided blood samples and consented to linkage of their electronic medical records.33–35 Approximately 4 years after the initial enrolment, a follow-up questionnaire was mailed to the participants to update their health information. To investigate whether socioeconomic disparity exists in reporting health conditions inconsistently between repeated questionnaires, we included participants with a HOUSES index score.

Primary outcomes

The status of the follow-up survey response was recorded among the MCB participants invited as of December 2016.33 Questions on a total of 63 health conditions from 11 disease categories were included in both the baseline and the 4-year follow-up surveys (online supplementary table 1). In both surveys, the participants were asked to indicate the age when they were first diagnosed with each health condition (≤19, 20–49, 50–64, 65–79 and ≥80 years), or to mark ‘None’ if they had not been diagnosed with the condition. Participants who indicated any of the age groups described above were classified as having ever been diagnosed with a given disease. Those who indicated ‘None’ were classified as never having been diagnosed. For each disease category, a subject was defined as having an inconsistent self-report if he or she reported having ever been diagnosed with a disease(s) within the category in the baseline survey but reported never having been diagnosed with the same disease(s) in the follow-up survey. A subject was defined as having overall inconsistent self-reporting if he or she had at least one inconsistent disease category. The primary outcome was the overall inconsistency (‘1’ if reporting having ‘ever’ been diagnosed in the baseline survey but reporting ‘never’ in the follow-up survey for at least one disease category; ‘0’ if self-reports were consistent for all disease categories).
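
The outcome definition above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the data layout (one answer string per condition), the function names, the example category labels and the example answers are all hypothetical, and both surveys are assumed to contain an answer for every condition.

```python
# Hypothetical sketch of the outcome definition; names and data layout are assumptions.
AGE_GROUPS = {"<=19", "20-49", "50-64", "65-79", ">=80"}


def ever_diagnosed(answer):
    """True if the answer is an age-at-diagnosis group, False if it is 'None'."""
    return answer in AGE_GROUPS


def inconsistent_categories(baseline, follow_up, category_of):
    """Categories with a condition reported 'ever' at baseline but 'never' at follow-up."""
    flagged = set()
    for condition, answer in baseline.items():
        # Assumes every baseline condition was also answered at follow-up.
        if ever_diagnosed(answer) and not ever_diagnosed(follow_up[condition]):
            flagged.add(category_of[condition])
    return flagged


def overall_inconsistency(baseline, follow_up, category_of):
    """Primary outcome: 1 if at least one disease category is inconsistent, else 0."""
    return 1 if inconsistent_categories(baseline, follow_up, category_of) else 0


# Made-up example: three conditions in three illustrative categories.
category_of = {
    "asthma": "respiratory",
    "rheumatoid arthritis": "rheumatological",
    "glaucoma": "eye",
}
baseline = {"asthma": "20-49", "rheumatoid arthritis": "50-64", "glaucoma": "None"}
follow_up = {"asthma": "20-49", "rheumatoid arthritis": "None", "glaucoma": "65-79"}

flagged = inconsistent_categories(baseline, follow_up, category_of)
outcome = overall_inconsistency(baseline, follow_up, category_of)
```

In this example only the rheumatological category is flagged. The glaucoma answers change from ‘None’ to an age group, which is a newly reported diagnosis rather than an inconsistency under the definition used here.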

Supplementary file 1


The main predictor of interest was the HOUSES index, an individual-level housing-based SES measure. Briefly, the HOUSES is a composite index formulated from objective (not relying on self-reporting) housing characteristics (assessed housing value, square footage of the unit, number of bedrooms and number of bathrooms) associated with an individual’s address information provided at the MCB enrolment.23 24 27 While the HOUSES index is commonly analysed as quartiles, this study used deciles (from the lowest decile (HOUSES≤10%) to the highest decile (HOUSES>90%)) to detect any granular effect of the HOUSES index. The higher the decile, the higher the SES.23
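
The text does not spell out how the four housing characteristics are combined. The sketch below assumes a sum of per-feature z-scores followed by rank-based decile assignment; the formula, the function names and the housing records are assumptions made for illustration only and should not be read as the published HOUSES algorithm.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise numbers to mean 0 and population SD 1 (fails if all values are equal)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def houses_scores(records, features):
    """Assumed composite: sum of per-feature z-scores for each housing record."""
    columns = [z_scores([r[f] for r in records]) for f in features]
    return [sum(col[i] for col in columns) for i in range(len(records))]


def decile_groups(scores):
    """Assign decile 1 (lowest scores) to 10 (highest) by rank; ties keep input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    deciles = [0] * len(scores)
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // len(scores) + 1
    return deciles


# Made-up housing records, ordered from smallest to largest property.
FEATURES = ("value", "sqft", "bedrooms", "bathrooms")
records = [
    {"value": 100000 + 20000 * k, "sqft": 800 + 150 * k,
     "bedrooms": 1 + k // 3, "bathrooms": 1 + k // 4}
    for k in range(10)
]
scores = houses_scores(records, FEATURES)
deciles = decile_groups(scores)
```

With ten strictly ordered records each one falls into its own decile, so the lowest decile corresponds to HOUSES≤10% and the highest to HOUSES>90%, matching the grouping described in the text.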

Other factors considered for an association with inconsistency were age at enrolment by category (18–44, 45–54, 55–64 and 65 years or older), sex, race (whites, Asians, blacks and others), education attainment (high school or less, some college, college graduate or higher) and perceived health in two categories (poor/fair or at least good (excellent/very good/good)) at each questionnaire. Changes in perceived health between the two surveys were also considered and defined as worse (‘at least good’ at baseline and ‘poor/fair’ at follow-up survey), consistently poor (‘poor/fair’ at both surveys), improved (‘poor/fair’ at baseline and ‘at least good’ at follow-up survey) and consistently good (‘at least good’ at both surveys).
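
The four change categories for perceived health can be written as a small lookup. This is an illustrative sketch only; the function names are hypothetical, and the five response labels are taken from the categories listed above.

```python
# Hypothetical helper names; category labels follow the definitions in the text.
AT_LEAST_GOOD = {"excellent", "very good", "good"}
POOR_FAIR = {"poor", "fair"}

CHANGE = {
    ("at least good", "poor/fair"): "worse",
    ("poor/fair", "poor/fair"): "consistently poor",
    ("poor/fair", "at least good"): "improved",
    ("at least good", "at least good"): "consistently good",
}


def collapse(rating):
    """Collapse a five-level perceived-health rating into the two analysis categories."""
    if rating in AT_LEAST_GOOD:
        return "at least good"
    if rating in POOR_FAIR:
        return "poor/fair"
    raise ValueError(f"unknown rating: {rating!r}")


def health_change(baseline_rating, follow_up_rating):
    """Classify the change in perceived health between the two surveys."""
    return CHANGE[(collapse(baseline_rating), collapse(follow_up_rating))]
```

For example, `health_change("good", "fair")` falls in the ‘worse’ group, while `health_change("excellent", "good")` is ‘consistently good’ because both ratings collapse to ‘at least good’.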

Statistical analysis

Basic sociodemographic characteristics of the study cohort were summarised using percentages for categorical variables (age category, sex, race, education, HOUSES deciles and perceived health) and compared by response status of the follow-up survey using the Kruskal-Wallis test. The degree of inconsistent self-reporting was summarised descriptively, both overall and by disease category. For the association between each sociodemographic characteristic, including the HOUSES decile, and the risk of overall inconsistent self-reporting, logistic regression models were fitted for one characteristic at a time, adjusting only for age. Multivariable models were also used to assess whether HOUSES scores were independently associated with the inconsistency rates, adjusting for age, sex, race, education level and changes in perceived general health between the two surveys.
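
To make the effect measure concrete, the sketch below computes an unadjusted odds ratio with a Wald 95% CI from a 2×2 table. It is not the authors' analysis, which used logistic regression with covariate adjustment; for a single binary predictor and no covariates, the exponentiated logistic regression coefficient equals this odds ratio. The function name and the counts are invented for illustration.

```python
import math


def odds_ratio_wald(exposed_events, exposed_nonevents, ref_events, ref_nonevents, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from 2x2 cell counts."""
    odds_ratio = (exposed_events * ref_nonevents) / (exposed_nonevents * ref_events)
    # Standard error of the log odds ratio: square root of the summed reciprocal counts.
    se_log = math.sqrt(
        1 / exposed_events + 1 / exposed_nonevents + 1 / ref_events + 1 / ref_nonevents
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Invented counts: inconsistent vs consistent reporters in the lowest HOUSES decile
# (exposed group) and in the highest decile (reference group).
or_hat, ci_low, ci_high = odds_ratio_wald(300, 300, 350, 550)
```

An interval that excludes 1 corresponds to a statistically significant association at the 5% level, which is how the ORs and CIs reported in the results can be read.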


Results

Study cohort and factors associated with survey response status

We observed that subjects with low SES measured by the HOUSES index were less likely to respond to the follow-up survey when invited. Among 11 717 MCB participants invited for a 4-year follow-up survey, 73% (n=8508) responded to the survey; the responders had a median age at the MCB enrolment of 55 years (25th–75th percentiles: 43–67 years), 63% were females and 92% were whites. Participants in the lowest HOUSES deciles were less likely to respond to the survey when invited (56% vs 77% responded among the lowest vs the highest HOUSES deciles; table 1 and figure 1A: p<0.001). Subjects who did not report education attainment (n=264; 2.3% of the total invited) were less likely to respond to the follow-up survey (35% response rate; figure 1B). The response rate tended to be higher among those with older age (36% vs 20% for 65 years or older, among responders vs non-responders), white race (94% vs 85%), better perceived general health at baseline (96% vs 92%) and higher education level (54% vs 45% for college graduate or higher). Overall, the survey response rate was similar between males and females (64% vs 61%). However, the rate was higher in females in younger age groups (eg, 61% vs 46% in those aged between 18 and 44 years), while similar in the oldest age group (83% each; figure 2).

Table 1

Baseline sociodemographic characteristics of the study cohort, stratified by the response status of the follow-up survey

Figure 1

The proportion of the study subjects who answered the 4-year follow-up survey by HOUSES deciles (A) and education level (B). HOUSES, HOUsing-based SocioEconomic Status.  

Figure 2

The proportion of the study subjects who answered the 4-year follow-up survey for each age group (18–44, 45–54, 55–64 and 65+ years), stratified by sex. 

Degree of inconsistency in self-reporting of health conditions

Inconsistent reporting of health conditions across repeated surveys was fairly common in our study cohort. A total of 63 illnesses were checked for inconsistent self-reporting between the baseline and the 4-year follow-up survey. About 90% of the participants reported having ever been diagnosed with at least one disease at the baseline (table 2). Among the participants who reported having ever been diagnosed with a disease, 43% reported at least one condition inconsistently (ie, reported ‘ever’ been diagnosed at the baseline, but reported ‘never’ at the follow-up survey; table 2). Twenty-eight per cent reported one condition inconsistently, while 5% of the participants had three or more inconsistent self-reported conditions (table 2). Among 10 disease categories, excluding haematological conditions that were rare in the study cohort, rheumatological conditions had the highest likelihood of inconsistency (29% of the subjects reported inconsistently between two surveys), and the lowest likelihood of inconsistency was for eye diseases (10%; table 2). Among individual conditions self-reported by at least 5% of participants at the baseline survey, rheumatoid arthritis had the highest inconsistency rate (40%; online supplementary table 1).

Table 2

Degree of inconsistent self-reports in a repeated survey based on two time periods (baseline and the 4-year follow-up), sorted by disease prevalence at the baseline survey

Demographic characteristics for inconsistent self-reporting

We observed that some demographic characteristics such as older age and poorer perceived general health were associated with a higher proportion of subjects with inconsistent self-reporting. Univariately, a higher inconsistency rate was associated with older age (OR=2.7 (95% CI 2.4 to 3.1) for 65 years or older vs the youngest group (18–44 years)) and lower education (OR=1.2 (95% CI 1.0 to 1.4)). In terms of perceived general health, participants who reported poor/fair health had a higher inconsistency rate in each survey (OR=1.8 (95% CI 1.5 to 2.1) at baseline and OR=2.0 (95% CI 1.6 to 2.5) at the follow-up survey, compared with at least good health). In addition, compared with those with consistently good general health, the participants reporting poor/fair general health in at least one survey had a higher likelihood of inconsistent self-reporting (eg, OR=2.3 (95% CI 1.6 to 3.3) for those whose health was worse at the follow-up survey vs those with consistently good general health; table 3).

Table 3

Association between sociodemographic characteristics and inconsistent self-reports

HOUSES index for inconsistent self-reporting

Low SES measured by the HOUSES index was significantly associated with a higher likelihood of inconsistent self-reports for prevalent health conditions. Univariately, lower HOUSES deciles (lower SES) were associated with a higher likelihood of inconsistency (OR=1.6 (95% CI 1.3 to 2.0) for the lowest vs the highest decile; table 3) in a dose–response manner. After adjusting for pertinent variables such as age, perceived general health and education level, the lowest HOUSES decile remained independently associated with a higher likelihood of inconsistency (OR=1.5 (95% CI 1.2 to 1.8) for the lowest vs the highest HOUSES decile; table 3). In addition to the HOUSES index, age and perceived general health had independent residual effects on inconsistent self-reporting, while education level showed no association.


Discussion

In this cohort study, we found that socioeconomic disparities, as measured by the HOUSES index, exist both in reporting health conditions consistently when asked repeatedly and in the response rate for repeated surveys. A low HOUSES index (ie, low SES at the time of surveys) based on current housing characteristics was associated with a higher likelihood of inconsistent reporting of prevalent diseases. The association remained significant even after accounting for education attainment level, a commonly used SES measure, which implies that HOUSES may measure additional socioeconomic characteristics that education level cannot capture, such as functional declines, social isolation and health literacy.36 In addition, as the HOUSES index uses housing information close to the time of surveys, it will likely reflect the current socioeconomic situation that may affect the quality of surveys. We also found that subjects with a low HOUSES index were less likely to respond to the follow-up survey, which may further distort the study findings due to participation bias associated with SES.

In addition to relatively well-established risk factors such as older age and poor perceived general health,20 a low HOUSES index was associated with a higher likelihood of having inconsistent self-reported information, independent of educational level. This supports our hypothesis that subjects with low SES at the time of surveys will have less reliable self-reported information, potentially due to low health literacy. Although health literacy and educational attainment are strongly associated, controlling for education does not fully account for the effect of health literacy on outcomes.37 Therefore, it is not surprising to see a residual effect of the HOUSES index on inconsistency, independent of education level, which remains unchanged once acquired. In addition, an accumulated asset-based SES measure (eg, the HOUSES index) is suggested to be a more appropriate measure for assessing socioeconomic position for older people such as our study subjects, compared with education level.38 This study finding has both research and clinical ramifications. From the research perspective, certain groups (such as older participants, those with poor general health or those with low current SES) may be more likely to report inconsistent and/or inaccurate health information, which could bias results or conclusions from studies. Clinically, self-reporting is commonly used to collect medical history, especially for patients seen in consultation, as health information from other medical facilities is usually unavailable. Therefore, these same groups of people may be more likely to provide less meaningful health information when self-reporting. This can also affect population-based health management efforts that rely on self-report to address health disparities.

In terms of predictors associated with the response rate when invited to the follow-up survey, we found that subjects with a low HOUSES index were less likely to respond to the survey. Specifically, 44% of the subjects in the lowest HOUSES decile did not respond to the follow-up survey, compared with 23% of those in the highest HOUSES decile. In addition, those with younger age, lower self-rated health and lower education levels were less likely to respond to the survey. Interestingly, we observed that subjects who did not report education attainment at the baseline survey were far less likely to respond to the follow-up survey. This observation implies that study findings based only on subjects responding to the follow-up survey might be biased by SES (ie, the study findings may not sufficiently reflect subjects with low SES).

There are several limitations to this study. First, selection bias may exist as this study is based on subjects who originally participated in the MCB. Compared with patients receiving primary care at Mayo Clinic, the MCB participants are older and sicker, which may affect the inconsistency rates.39 Second, the HOUSES index was available only for those residing in Olmsted County, Minnesota, and thus the generalisability of the study finding may be limited to geographic areas having similar subject characteristics. Studies have shown that Olmsted County adult residents have similar characteristics to whites residing in Upper Midwest states.40 Third, this study characterised the likelihood of inconsistent self-reports between two surveys without knowing the true medical status (ie, no gold standard). An inconsistent report is therefore not necessarily an inaccurate one; for example, improved health literacy may lead to more accurate reporting of one’s health history at follow-up. While true disease status is required to fully understand the nature of inconsistency, it is impractical to obtain true disease status for all 63 conditions in a large-scale study like this. In addition, some conditions may not be ascertained correctly even after reviewing medical records, especially when patients were cured of a particular condition and/or symptom-free for a long period of time (ie, no relevant medical records with the condition from recent years given that the MCB participants are relatively old). Furthermore, disease histories are often collected through self-reporting during the medical visit (ie, even medical records may not provide the true disease status). However, future studies may use more efficient approaches to obtain true disease status. Additionally, the likelihood of inconsistent self-reporting may be influenced by the severity of individual diseases (ie, severe conditions such as breast cancer may have a lower inconsistency rate compared with more transient conditions such as migraine). Finally, clinical illnesses that do not use tissues and/or laboratory tests for disease diagnosis may have a higher likelihood of inconsistency.


Conclusion

The current study showed that socioeconomic disparities exist in inconsistent self-reporting and in response rates in longitudinal studies. Therefore, studies using self-reports may need additional effort to account for these biases, and the study results should be interpreted with caution. As the degree of health disparities is frequently assessed by self-reported survey, the influence of current social position has important implications for clinical practice and research.


The authors would like to thank the patients who participated in the Mayo Clinic Biobank. They also thank Ms Kelly Okeson for her careful proofreading of the manuscript.




  • Contributors ER and PYT were responsible for the study design, initial manuscript drafting and interpretation of the results. JEO and JRC are the primary investigators for developing and maintaining the Mayo Clinic Biobank. The formulation of the HOUSES index was done by YJJ and C-IW. ER and MAH conducted the statistical analysis in this paper. JEO, JRC and KJY contributed critically to manuscript drafting. All authors approved the final version of the manuscript.

  • Funding This study was supported by the Mayo Clinic Center for Individualized Medicine.

  • Competing interests None declared.

  • Patient consent Obtained.

  • Ethics approval This study was approved by the Mayo Clinic Institutional Review Board (IRB), and this secondary analysis was reviewed and approved by the Mayo Clinic Biobank Access Committee.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data are available.
