
Original research
Examination of the demographic representativeness of a cross-sectional mobile phone survey in collecting health data in Colombia using random digit dialling
  1. Deivis Nicolas Guzman-Tordecilla1,2,
  2. Andres I Vecino-Ortiz1,
  3. Angélica Torres-Quintero2,
  4. Camila Solorzano-Barrera2,
  5. Joseph Ali3,
  6. Rolando Enrique Peñaloza-Quintero2,
  7. Saifuddin Ahmed4,
  8. George W Pariyo1,
  9. Vidhi Maniar5,
  10. Dustin G Gibson1
  1. Department of International Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
  2. Institute of Public Health, Pontificia Universidad Javeriana, Bogota, Colombia
  3. Berman Institute of Bioethics, Johns Hopkins University, Baltimore, Maryland, USA
  4. Population, Family and Reproductive Health, Johns Hopkins University, Baltimore, Maryland, USA
  5. International Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
  Correspondence to Dr Deivis Nicolas Guzman-Tordecilla; deivy-gt{at}hotmail.com

Abstract

Objectives As mobile phone ownership becomes more widespread in low-income and middle-income countries, mobile phone surveys (MPSs) present an opportunity to collect data on health more cost-effectively. However, selectivity and coverage biases in MPSs are concerns, and there is limited information about the population-level representativeness of these surveys compared with household surveys. This study aims to compare the sociodemographic characteristics of the respondents of an MPS on non-communicable disease risk factors with those of a household survey in Colombia.

Design Cross-sectional study. We used a random digit dialling method to select the samples for calling mobile phone numbers. The survey was conducted using two modalities: computer-assisted telephone interviews (CATIs) and interactive voice response (IVR). The participants were assigned randomly to one of the survey modalities based on a targeted sampling quota stratified by age and sex. The Quality-of-Life Survey (ECV), a nationally representative survey conducted in the same year as the MPS, was used as a reference to compare the sample distributions by sociodemographic characteristics of the MPS data. Univariate and bivariate analyses were performed to evaluate the population representativeness between the ECV and the MPSs.

Setting The study was conducted in Colombia in 2021.

Participants Population at least 18 years old with a mobile phone.

Results We completed 1926 and 2983 interviews for CATI and IVR, respectively. We found that the MPS data have a similar (within 10 percentage points) age–sex data distribution compared with the ECV dataset for some subpopulations, mainly for young populations, people with none/primary and secondary education levels, and people who live in urban and rural areas.

Conclusions This study shows that MPS could collect similar data to household surveys in terms of age, sex, high school education level and geographical area for some population categories. Strategies are needed to improve representativeness of the under-represented groups.

  • Health informatics
  • Latin America
  • PUBLIC HEALTH

Data availability statement

Data are available upon reasonable request.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

STRENGTHS AND LIMITATIONS OF THIS STUDY

  • This is the first study in Latin America that has examined the representativeness of mobile phone surveys with random digit dialling using a nationally representative household survey as a referent.

  • Due to budgetary constraints, sample sizes were not met for any of the computer-assisted telephone interview age–sex quotas.

  • We have used ‘age–sex’ stratification for improving and ensuring high response rates across all age groups among men and women.

Introduction

Health surveys are important tools for monitoring disease burden and disease risk factors, allocating resources and evaluating health policies.1 In low-income and middle-income countries (LMICs), household surveys have traditionally been the gold standard means of data collection. However, household surveys require face-to-face (FTF) interviews, multiple attempts to gather information from selected households and often travel to hard-to-reach places.1 2 Consequently, they are conducted infrequently due to high costs related to personnel and transportation.1 Moreover, conducting household surveys is more challenging in security risk areas or when epidemiological conditions change (eg, during pandemics).3 4 Alternative or supplemental data collection methodologies have emerged to address some of these challenges.4 5

High-income countries have implemented mobile phone surveys (MPS) to collect health and demographic data at the population level because of their lower cost and the population’s universal access to mobile phones and landlines.6–9 Currently, LMICs have shown a high penetration rate of mobile phones.10 For example, in Latin America, 70% of people own mobile phones.11 In Colombia, there are 133 mobile phone subscribers per 100 adults.12

Taking advantage of the high penetration of mobile phones globally and in LMICs, there are increasing efforts to introduce MPS, which include short message service, interactive voice response (IVR) and computer-assisted telephone interview (CATI) survey modalities,13 14 for more cost-effective and rapid data collection.4 15

However, there are concerns about the population representativeness of MPS in countries where coverage of mobile phones is not yet universal.16 Moreover, some population groups may be more willing to participate in a survey than others. In household surveys, it is possible to know the extent of non-response by demographic characteristics and to use weighting methods to adjust the estimates accordingly, but that may not be feasible for MPS. Few studies have compared the representativeness of MPS with standard FTF health surveys in LMICs, particularly in middle-income countries.5 17–19 The sparse evidence published in LMICs about this topic is mainly from low-income countries.5 20–23 As far as we know, published studies in middle-income countries that compare the sociodemographic characteristics of MPS using IVR and CATI are scant.3 24–26

This study aims to examine the population-level representativeness of MPS by comparing selected sociodemographic characteristics of respondents of an MPS, which collected data on non-communicable disease (NCD) risk factors, with those who completed a similar FTF survey, the Quality-of-Life Survey (ECV), in Colombia. Studies have shown that age, sex, education and geographical residence variables are key determinants of selection and coverage biases in MPS.16 27 We have used ‘age–sex’ stratification for improving and ensuring high response rates across all age groups among men and women.16 Our specific interest is to compare the sample distributions of education level and geographical area across the eight age–sex strata of the MPS. Information on the extent of representation will help the country to develop appropriate sample weighting for improving the representativeness of MPS data.

Methods

Study setting

The MPS was conducted under the Bloomberg Data for Health Initiative,28 which aims to develop and optimise MPS to collect data on NCD risk factors in Colombia and elsewhere. Due to regulations related to access to personal data of individuals in Colombia,29 there is no sampling frame available that lists working (activated) mobile phone numbers. We used the random digit dialling technique to generate potential mobile phone numbers.30 We used the first three digits corresponding to the prefixes from the mobile network operators for Colombia, ranging from 300 to 351, and the remaining seven digits were randomly generated. We distributed the prefixes of the mobile phone numbers evenly. Potential mobile phone numbers were generated only once using STATA V.14.31 The phone numbers were not repeated across CATI and IVR modalities.
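The number-generation step described above can be sketched in Python as follows. This is an illustration only: the study generated its numbers in STATA, and the function name, the seed and the round-robin way of spreading prefixes evenly are assumptions made here, not the authors' procedure.

```python
import random

# Three-digit operator prefixes stated in the text: 300 to 351 inclusive.
PREFIXES = [str(p) for p in range(300, 352)]


def generate_rdd_numbers(n, seed=None):
    """Return n unique 10-digit candidate numbers (prefix + 7 random digits).

    Prefixes are cycled in order so that each is used about equally often;
    this is one assumed reading of "distributed evenly".
    """
    rng = random.Random(seed)
    seen = set()
    numbers = []
    for i in range(n):
        prefix = PREFIXES[i % len(PREFIXES)]
        # Redraw the 7-digit suffix until the full number is new.
        while True:
            candidate = prefix + f"{rng.randrange(10**7):07d}"
            if candidate not in seen:
                break
        seen.add(candidate)
        numbers.append(candidate)
    return numbers
```

Because every generated number is unique, slicing the returned list into two parts would give disjoint CATI and IVR samples, in line with the statement that numbers were not repeated across modalities.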

CATI data collection

A call centre based in Bogota, Colombia, was hired to conduct CATIs. Calls were performed between 08:00 and 17:00 local time, from Monday to Saturday, by six call agents or interviewers with at least 1 year of CATI experience. For each potential phone number, a maximum of three call attempts were made in case the respondents missed the call or could not answer. Randomly generated phone numbers were uploaded to a custom dialling software platform that directs the call to a call centre agent. Individuals were eligible for the survey if they were at least 18 years old and the sample size had not yet been reached for each of the eight age–sex strata (male/female, ages 18–29, 30–44, 45–59 and 60+ years old).
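The eligibility rule above (adult respondent, stratum quota not yet filled) can be sketched as below. The function names, the string labels and the dictionary of running counts are hypothetical; the quota of 385 per stratum is the value given in the Sample size section.

```python
# Age bands used for the eight age-sex strata (upper bound None = open-ended).
AGE_BANDS = [(18, 29, "18-29"), (30, 44, "30-44"), (45, 59, "45-59"), (60, None, "60+")]
QUOTA_PER_STRATUM = 385  # value reported in the Sample size section


def age_sex_stratum(age, sex):
    """Return the (sex, age band) stratum, or None if under 18."""
    if age < 18:
        return None
    for _low, high, label in AGE_BANDS:
        if high is None or age <= high:
            return (sex, label)


def is_eligible(age, sex, completed_counts, quota=QUOTA_PER_STRATUM):
    """Eligible if an adult whose age-sex stratum has not yet reached its quota.

    completed_counts is a hypothetical dict mapping stratum -> interviews so far.
    """
    stratum = age_sex_stratum(age, sex)
    if stratum is None:
        return False
    return completed_counts.get(stratum, 0) < quota
```

For instance, `is_eligible(35, "female", {("female", "30-44"): 370})` evaluates to `True`, because that stratum is still below 385.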

All call centre agents received training to conduct the CATI survey and were supervised by a field coordinator. The survey instrument and CATI software were piloted prior to full data collection. Data collection took place from 15 January to 13 April 2021. Only those who provided their informed consent were administered the survey.

IVR data collection

The IVR survey involved a series of pre-recorded questions in Spanish.32 The audios were professionally recorded by a nationally recognised female journalist and then loaded onto a self-service platform, engageSPARK.33 The survey response options were programmed to be answered using the mobile phone keypad. For example, ‘If you are male, press 1. If you are female, press 3’.

The pilot phase consisted of several trials to test the performance of the platform, understanding of the questions, and audio quality and comprehension. The pilot phase was conducted in early January 2021. Adjustments were made to some of the question scripts, and the audio recordings were optimised to improve quality and comprehension. IVR surveys were launched between 08:00 and 17:00 local time, from Monday to Saturday, in batches of 250 000 numbers at a time from 26 January to 23 April 2021.

MPS questionnaire

The questionnaire applied in both modalities (IVR and CATI) was the same, except for minor changes specific to each survey modality for the sake of clarity. The survey included several components: (1) informed consent; (2) demographic characteristics; (3) NCD modules (tobacco use, alcohol use, dietary intake, physical activity and blood pressure); and (4) delivery of incentives. NCD modules are groups of questions on NCD behavioural risk factors covered by existing risk factor surveys, such as the WHO STEPS survey and the Global Adult Tobacco Survey.34 35 All the answer options were multiple choice or binary, except for questions on age, food consumption and physical activity.

Calls through all modalities were free to respondents. All participants who completed the survey and had a prepaid mobile phone plan were automatically sent an airtime incentive of COP 5000 to their mobile phones (approximately equivalent to US$1.37 at the time). Postpaid mobile phone plans do not accept top-up credits, but they are a minority of all lines available in Colombia.36

National household survey questionnaire

We used data from the ECV (its Spanish acronym) as the population-representative reference against which to compare the characteristics of our data. The ECV was conducted during 2021, the same year in which the MPS was implemented. The ECV collects information on the living conditions of Colombians, including variables related to housing, education, health, labour force, etc. The ECV interviewed 257 589 subjects using FTF encounters. The survey included data on men and women aged 0–112 years old from rural and urban areas. For our study, we restricted the ECV sample to those 18 years of age or older. Details on the sampling and representativeness, validity and reliability of the ECV can be consulted elsewhere.37

Respondent characteristics

Sociodemographic information was included from those who completed at least one NCD module of the MPS. The MPS included four sociodemographic variables. Age (years old) was measured as a continuous variable and then was grouped into four categories (18–29, 30–44, 45–59 and 60+). Education level (none/primary, secondary, and university or higher), geographical area (urban vs rural) and sex (men vs women) were measured as ordinal and binary variables.

The same sociodemographic variables and measures were selected for the ECV survey. However, respondents below the age of 18 years were excluded from the ECV to enable a proper comparison of the variables’ distribution between the two surveys. This was because the MPS only collected data from respondents who were at least 18 years old.

Also, we categorised the call outcomes of the MPS based on standards set by the American Association for Public Opinion Research.38 A complete interview was defined as answering at least four of the five NCD modules, and a partial interview was defined as answering at least one, but fewer than four, NCD modules. Break-offs were defined as those who were age–sex eligible with less than one NCD module completed.
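As a minimal sketch, these interview categories can be written as a small classification function for age–sex-eligible respondents. The function name and labels are hypothetical; the rule for a complete interview is checked first, so a respondent with exactly four modules is counted as complete.

```python
def classify_interview(modules_completed, total_modules=5):
    """Classify an age-sex-eligible respondent by NCD modules completed."""
    if not 0 <= modules_completed <= total_modules:
        raise ValueError("modules_completed must be between 0 and total_modules")
    if modules_completed >= 4:
        return "complete"   # at least four of the five NCD modules
    if modules_completed >= 1:
        return "partial"    # at least one module, but not enough for complete
    return "break-off"      # eligible, but no NCD module completed
```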

Sample size

The MPS used a stratified random sampling design with eight age–sex strata (men and women by age groups 18–29, 30–44, 45–59, and 60 and older) in each modality (IVR and CATI). We estimated a quota sample of 385 observations for each age–sex stratum, with an expected prevalence of the risk factor of 50% (p=0.5), a margin of error of 5% (δ=0.05) and a 5% type I error (α=0.05), as recommended by WHO.39 We generated the potential mobile phone numbers using the random digit dialling technique, as mentioned previously. The age–sex quota stratification was used to improve representation of some population groups in the MPS, such as older and female populations.20 23
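The quota of 385 per stratum is consistent with the standard sample size formula for estimating a single proportion, n = z²p(1−p)/δ². The sketch below assumes that this is the formula behind the reported figure; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def quota_sample_size(p=0.5, margin=0.05, alpha=0.05):
    """Sample size for one proportion: n = z^2 * p * (1 - p) / margin^2, rounded up."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96
    return math.ceil(z**2 * p * (1 - p) / margin**2)


n_per_stratum = quota_sample_size()  # 385 with p=0.5, margin=0.05, alpha=0.05
n_per_modality = 8 * n_per_stratum   # eight age-sex strata per modality
```

With the values given in the text, the formula yields 385 per stratum, which corresponds to a target of 3080 interviews per modality across the eight strata.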

Statistical analysis

We used several approaches to compare the sociodemographic characteristics of the surveys (CATI, IVR and ECV). First, we performed a univariate analysis of the call disposition codes. Second, we conducted a bivariate analysis between age–sex strata by mobile phone modality to identify whether sampling quotas were met. Third, we estimated and compared 95% CIs of selected sociodemographic characteristics between the MPS and FTF (ECV) surveys to check their statistical comparability. We considered that the surveys have dissimilar distributions if the CIs do not overlap.40 Because the sampling designs of the MPS and FTF surveys were different and the FTF survey was based on an extremely large sample size (n=257 589) compared with the MPS (n=2581 and 1837 for IVR and CATI, respectively), we did not combine the datasets for formal statistical tests. Also, we estimated differences in sample distribution proportions between the MPS and ECV data. We used STATA V.14 to run all the analyses.
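The comparison rule can be illustrated with a short sketch: two estimates are treated as dissimilar when their 95% CIs do not overlap, and the absolute difference between proportions is reported in percentage points. The function names are hypothetical, and the example values are taken from the Results for women aged 30–44 with none or primary education (ECV vs CATI).

```python
def intervals_overlap(ci_a, ci_b):
    """Return True if two (lower, upper) confidence intervals overlap."""
    (low_a, high_a), (low_b, high_b) = ci_a, ci_b
    return low_a <= high_b and low_b <= high_a


def percentage_point_difference(pct_a, pct_b):
    """Absolute difference between two percentages, in percentage points."""
    return abs(pct_a - pct_b)


# Women aged 30-44, none/primary education (values reported in the Results).
ecv_pct, ecv_ci = 16.9, (16.3, 17.6)
cati_pct, cati_ci = 16.5, (13.1, 20.6)

similar_by_ci = intervals_overlap(ecv_ci, cati_ci)
within_10_points = percentage_point_difference(ecv_pct, cati_pct) <= 10
```

In this example the intervals overlap and the difference is well under 10 percentage points, so the two distributions would be read as similar for that category.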

Patient and public involvement

No members of the public or patients were involved in this study.

Results

Call outcomes

For the CATI survey, 105 050 numbers were dialled (table 1 and figure 1). Approximately 21% (n=21 949) of all calls made were sent to an inactive line, and 69% (n=72 331) did not pick up the call. Of the remaining 10% who picked up the call, 8798 people hung up the call; 15 refused to participate in the study; 80 were under 18 years old; 41 were partial interviews; and 1837 were complete interviews. The average survey length was 23 min and 10 s (table 2).

Figure 1

Flowchart of call outcomes for CATI and IVR. It starts with the quantity of dialled mobile phone numbers and ends with the completed MPS. CATI, computer-assisted telephone interview; IVR, interactive voice response; MPS, mobile phone survey; NCD, non-communicable disease.

Table 1

Call outcomes for the random digit dial mobile phone samples

Table 2

Cost, time and call outcome rates for mobile phone surveys

For the IVR survey, 1 231 779 numbers were dialled (table 1 and figure 1). Approximately 18% (n=228 256) of all calls made were sent to an inactive line; 31% (n=381 523) did not pick up the call; and 50% of the responders hung up the call. Of the remaining observations, 102 refused to participate in the study; 630 were under 18 years old; 402 were partial interviews; and 2581 were complete interviews. The average survey length was 17 min and 23 s (table 2).

Contact, response and cooperation rates were higher for CATI (2.33%, 2.26% and 95.15%, respectively) than for IVR (0.35%, 0.30% and 74.27%, respectively) (table 2).

Sociodemographic characteristics of MPSs

For the CATI survey, we did not meet the sampling quota in any age–sex group. The two age–sex strata closest to filling the quota were women aged 18–29 years (n=295) and 30–44 years (n=370) (table 3). Approximately 78% (n=1465) of the sample had completed high school or higher, and 78% lived in an urban area (table 4). For the IVR survey, we met five quotas: men aged 18–29 and 30–44 years and women aged 18–29, 30–44 and 45–59 years (table 3). Approximately 82% (n=2446) of the sample had completed high school or higher, and 82% lived in an urban area (table 4).

Table 3

Quota met by mobile phone modality

Table 4

Comparison of face-to-face and MPS samples by age strata among (A) women and (B) men

In comparing female participants from the ECV and CATI samples, we found that all differences in proportions were within 10 percentage points in each age stratum for the education and geographical variables, except for the distribution of the least and most educated among those aged 60 years and above. The distribution of women with none or primary education was similar between ECV and CATI for age groups 30–44 (16.9% (95% CI 16.3% to 17.6%) vs 16.5% (95% CI 13.1% to 20.6%)) and 45–59 (36.1% (95% CI 35.1% to 37.2%) vs 35.6% (95% CI 29.8% to 41.9%)). Similar distributions for those with university or higher education were observed only in the 18–29 stratum (11.2% (95% CI 10.3% to 12.2%) vs 15.6% (95% CI 11.9% to 20.2%)). For other age strata, CATI had higher levels of university education than ECV (table 4).

In comparing female participants from the ECV and IVR samples, we also found that all differences in proportions were within 10 percentage points in each age group for the education and geographical variables, except for the distributions of the least educated among those aged 45–59 and 60+ and of those with secondary education in the oldest age group. IVR respondents were similar to ECV respondents for primary education in age groups 18–29 (7.5% (95% CI 5.4% to 10.4%) vs 7.6% (95% CI 7.1% to 8.2%)) and 30–44 (15.1% (95% CI 12.1% to 18.7%) vs 16.9% (95% CI 16.3% to 17.6%)). The distribution of those with the highest education was similar only for 30–44 year-olds. As with CATI, the proportions with university education in the remaining female age strata were higher in IVR than in ECV (table 4).

Geographical location was similar between ECV and MPS distributions for all age–sex strata except for CATI and IVR respondents aged 45–59 (table 4).

For male participants, there were more instances than for female participants where the absolute differences in proportions between the ECV sample and the MPS sample were greater than 10 percentage points. The distribution of men with none or primary education was similar between CATI and ECV for age groups 18–29 (6.8% (95% CI 4.1% to 11.0%) vs 10.9% (95% CI 10.3% to 11.6%)) and 30–44 (18.8% (95% CI 14.5% to 24.1%) vs 21.5% (95% CI 20.7% to 22.3%)). For those with university or higher education, there were no similar proportions for any of the age strata (table 4).

In comparing male participants from IVR and ECV, the proportions of those with none or primary education were similar for age groups 18–29 years (10.1% (95% CI 7.7% to 13.2%) vs 10.9% (95% CI 10.3% to 11.6%)) and 30–44 years (21.0% (95% CI 17.5% to 24.9%) vs 21.5% (95% CI 20.7% to 22.3%)). The proportions of respondents with university or higher education were similar only in the age group 30–44 years (20.9% (95% CI 17.5% to 24.9%) vs 16.9% (95% CI 15.9% to 18.0%)) (table 4).

Location was similar for male participants in both CATI and IVR, as compared with ECV, for age groups 18–29 and 30–44 years. Location was also similar for the oldest CATI respondents but not for the oldest IVR respondents (table 4).

Overall, we found that the MPS (CATI and IVR) reported higher education levels than the ECV, especially for the university or higher education level. The age groups with the highest representation in the MPS samples were people aged 18–29 and 30–44 years, and the most represented sex in the sample was female (table 4). We also observed that most of the MPS respondents were from urban areas, similar to the ECV data (table 4). Additionally, in the 45–69 age group, the CATI survey performed better in terms of the number of categories with a proportion distribution similar to the ECV (table 4).

Discussion

To our knowledge, this is the first study in Latin America that has examined the representativeness of MPS using the random digit dialling technique, with a nationally representative household survey as a referent. One of the main findings of this study was that MPS could collect data similar to those of a household survey in terms of education level and geographical area across several age–sex strata.

Evidence from LMICs has shown that mobile phones are an effective tool to collect health data, especially from younger, male, educated and urban populations.20 21 23 41 Although we used age-strata quotas to track and improve responses among the older population, older people, such as those over 45 years old, were still difficult to reach with our data collection procedure. Most of the participants in this study (CATI and IVR) were between 18 and 44 years old. This finding is consistent with the results of prior research conducted in Colombia, which explored the perception and feasibility of implementing MPS.32 That study found that respondents perceive these types of surveys as inconvenient because they can take a long time and interfere with daily activities. In our study, the CATI survey was implemented during business hours (08:00–17:00) and took about 23 min to complete. Consequently, it is possible that surveying the population during that time frame affected the survey completion quotas in the population over 45 years old, which is an economically active population in Colombia.42 Furthermore, the same study also indicated that some adults have more free time to answer this kind of survey,32 which could explain why some survey completions were achieved in the adult population.

Unlike other studies published in LMICs,4 20 23 43 the current study found that women were the most represented population group in our data. This may be related to women being more willing to answer this type of survey.44 It could also be because, in Colombia, women are more likely to be homemakers, while men are typically employed outside of the home. This might result in men having less time to answer calls, leading to a lower response rate than among women. On the other hand, our study supports the assertion that those with higher education levels are the ones who most respond to this type of survey. Our data illustrate that more than 80% of the sample that answered the MPS had obtained a secondary or higher education. This is reasonable since some studies in Colombia found that one of the main barriers to implementing MPS is limited technological literacy,32 45 which is more likely among people with a lower level of education.

Additionally, we found that most of the surveys came from urban areas. This is consistent with the population distribution presented by the household survey used as a reference in this study and with data from the 2018 census. This may also be due in part to better mobile phone coverage in urban areas than in rural areas. Moreover, a study in Colombia found that people who live in rural areas usually do not pick up calls from unknown mobile phone numbers.45 Also, our findings showed that random digit dialling (RDD) was an effective technique for generating mobile phone numbers to collect data with the CATI and IVR modalities. This is similar to previous findings in other LMICs, where the RDD technique successfully collected data on health comparable to national health surveys.20

We also found that more sample quotas were achieved for IVR because 10 times more numbers were dialled in IVR than in CATI. That difference was due to time and money constraints. With unlimited resources and continued CATI calls, we would likely have completed some or all of the CATI quotas. The higher response and cooperation rates observed in CATI are likely due to the presence of human interviewers and their ability to build trust with the respondents. However, it should be noted that, among respondents aged 45 years and over, the CATI survey showed more similarities with the ECV than the IVR survey did. On the other hand, deploying MPS might be relatively inexpensive compared with FTF surveys.1 For example, the survey for IVR in our study cost, on average, about US$36, and CATI cost US$6 without including the salary of the call agents for CATI.

One of the limitations of this study was the small number of demographic variables available in the mobile phone sample. Although this may limit the capacity to explore in depth the differences between mobile phone and FTF samples, education and location are the main variables collected by household surveys in Colombia. Another limitation is that we did not use sample weights for the mobile phone surveys, as other studies of this nature have done in the past,22 23 but this might be more important if we were generating population-level estimates for a given indicator such as current tobacco smokers. In addition, CATI calls were made during working hours, which could affect respondents' availability to pick up or respond to calls. To mitigate this limitation, calls were also made on Saturdays since many people do not work that day in Colombia. Finally, conducting this study during the COVID-19 pandemic created an additional challenge.

Conclusions

This study shows that MPS could collect data similar to those of a household survey in terms of education level and geographical area across several age–sex strata, especially in populations under 45 years old. Also, the findings have implications for health by providing insights into the feasibility of using MPSs to collect data on NCD risk factors. More work is needed to achieve better representativeness of under-represented groups. Future studies should consider exploring other data collection strategies or alternating MPS modalities to improve representativeness.

Ethics statements

Patient consent for publication

Ethics approval

This study involves human participants and was approved by the research and ethics committee of the Public Health Institute of Pontificia Universidad Javeriana (0001 from 2020) and the Johns Hopkins Bloomberg School of Public Health Institutional Review Board (#00007318). The participants gave informed consent to participate in the study before taking part.

Acknowledgments

The authors thank the EngageSpark team for their support in implementing interactive voice response surveys and the Edumetrika team for the computer-assisted telephone interview surveys.

References

Footnotes

  • Twitter @NicolsGuzmnTor1, @0000-0003-0689-487X

  • Contributors Conception or design of the work: DNG-T, AIVO, DGG, AT-Q, CAS-B, VM, REP, JA, SA and GWP. DGG is the guarantor. Data collection: DNG-T, AIVO, DGG, VM, AT-Q and CAS-B. Data analysis and interpretation: DNG-T, AIVO, DGG, SA, AT-Q, CAS-B, REP, JA and GWP. Drafting the article: DNG-T, AIVO, DGG, AT-Q and CAS-B. Critical revision of the article and final approval of the version to be submitted: DNG-T, AIVO, DGG, AT-Q, CAS-B, REP, JA, SA, GWP and VM.

  • Funding This work was funded by Bloomberg Philanthropies (grant number 131761). This funding agency had no role in the preparation of this article.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, conduct, reporting or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.