Elsevier

Preventive Medicine

Volume 63, June 2014, Pages 112-115

Methods of using real-time social media technologies for detection and remote monitoring of HIV outcomes

https://doi.org/10.1016/j.ypmed.2014.01.024

Highlights

  • We collected 553,186,061 tweets.

  • We filtered tweets by whether they suggested HIV-related risk behaviors (N = 9880).

  • We presented a visual map of the location of these HIV-related tweets.

  • We found a significant positive county-level relationship between HIV tweets and HIV prevalence.

  • We established the feasibility of this method to study HIV-related outcomes.

Abstract

Objective

The recent availability of “big data” may make it possible to study whether and how sexual risk behaviors are communicated on real-time social networking sites and how these data might inform HIV prevention and detection. This study seeks to establish methods of using real-time social networking data for HIV prevention by assessing 1) whether geolocated conversations about HIV risk behaviors can be extracted from social networking data, 2) the prevalence and content of these conversations, and 3) the feasibility of using HIV risk-related real-time social media conversations as a method to detect HIV outcomes.

Methods

In 2012, tweets (N = 553,186,061) were collected online and filtered to include those with HIV risk-related keywords (e.g., sexual behaviors and drug use). Data were merged with AIDSVu data on HIV cases. Negative binomial regressions assessed the relationship between HIV risk tweeting and HIV prevalence by county, controlling for socioeconomic status measures.
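
As a rough sketch of the form such a county-level negative binomial regression commonly takes (the abstract does not give the specification, so the choice of outcome and all symbols below are assumptions made for illustration):

```latex
% Assumed notation, not taken from the article:
%   Y_i   = count of HIV cases in county i (assumed outcome)
%   T_i   = measure of HIV risk-related tweeting in county i
%   x_i   = vector of socioeconomic status controls for county i
%   alpha = overdispersion parameter
\begin{align}
Y_i &\sim \mathrm{NegBin}(\mu_i, \alpha), \\
\log \mu_i &= \beta_0 + \beta_1 T_i + \boldsymbol{\gamma}^{\top} \mathbf{x}_i, \\
\operatorname{Var}(Y_i) &= \mu_i + \alpha \mu_i^{2}.
\end{align}
```

Under this assumed parameterization, a positive coefficient \beta_1 would correspond to the positive relationship between HIV risk tweeting and HIV cases, and \alpha > 0 allows the variance to exceed the mean, which is the usual reason for preferring a negative binomial model over a Poisson model for count data.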

Results

Over 9800 geolocated tweets were extracted and used to create a map displaying the geographical location of HIV-related tweets. There was a significant positive relationship (p < .01) between HIV-related tweets and HIV cases.

Conclusion

Results suggest the feasibility of using social networking data as a method for evaluating and detecting human immunodeficiency virus (HIV) risk behaviors and outcomes.

Introduction

Social networking technologies have recently been used for HIV prevention research (Gold et al., 2011, Young, 2012) as tools for recruitment (Sullivan et al., 2011), interventions (Bull et al., 2012, Young et al., 2013a), and mixed-methods research (Young and Jaganath, 2013). Because people sometimes use these technologies to publicly discuss sex-related attitudes, desires, and behaviors, researchers may be able to use social networking data to understand and detect real-time individual and regional sexual risk behaviors and social norms (Young and Jordan, 2013). An emerging field, known as digital epidemiology, studies how these “big data” can be used to better understand, detect, and address public health problems (Salathe et al., 2012, Aramaki et al., 2011). However, no known research has examined whether or how these data can be used for HIV prevention or detection, making it important to evaluate the feasibility of this approach. Evaluating methods for using social media and “big data” in public health and medicine is an important first step in establishing how these data can be used in prevention, detection, and treatment.

For example, millions of social communications from real-time, geographically linked social networking sites, such as Twitter, might be used to make inferences about geographical rates of future or recent past engagement in sexual risk behaviors. Twitter, a large and rapidly growing social networking technology, allows participants to send short, public, real-time “tweet” communications (Smith and Brenner, 2012). Twitter provides public access to these data through an application programming interface (API) (Twitter, 2013). People who intend to engage or have just engaged in sexual or drug-related behaviors might tweet to their social networks to inform them of their attitudes and behaviors (Walker, 2013, Young et al., 2013b). Researchers may be able to link these Twitter data to real-time incidence data to better understand and detect public health outbreaks. For instance, influenza researchers have compared flu data with tweets related to influenza symptoms and found that tweets could detect influenza outbreaks in the regions where the tweets occurred, in advance of traditional surveillance methods (Aramaki et al., 2011).

HIV researchers could build on this approach by studying whether engagement in sexual risk behaviors could be inferred from tweet content, for example by filtering for keywords that suggest sexual risk and drug use behaviors (i.e., HIV risk behaviors). Because Twitter provides geographical locations (i.e., geolocated data) for some conversations, HIV risk-related tweets can ultimately be mapped alongside incidence rates to determine whether regional rates of HIV-risk conversations on Twitter are associated with HIV transmission in those regions. However, these topics have not been studied, making it important to evaluate whether and how HIV-risk behaviors are communicated using real-time social media and whether these communications can be linked to data on HIV transmission for analysis.
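
The keyword-filtering idea described above can be illustrated with a minimal sketch. This is not the study's algorithm: the keyword lists, the category names, and the tweet dictionary layout (a "text" field plus an optional "coordinates" field) are all hypothetical, chosen only to show the general shape of filtering geolocated tweets by risk-related terms.

```python
import re
from typing import Optional

# Hypothetical keyword lists for illustration only; the study's actual
# terms are not listed in this text.
RISK_KEYWORDS = {
    "sex_risk": ["unprotected sex", "no condom"],
    "drug_risk": ["meth", "cocaine"],
}

# One case-insensitive, whole-word pattern per category.
PATTERNS = {
    category: re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    for category, terms in RISK_KEYWORDS.items()
}


def classify_tweet(text: str) -> Optional[str]:
    """Return the first matching risk category, or None if no keyword matches."""
    for category, pattern in PATTERNS.items():
        if pattern.search(text):
            return category
    return None


def filter_geolocated(tweets):
    """Keep tweets that carry coordinates and match a risk keyword.

    Each tweet is assumed to be a dict with a "text" entry and an optional
    "coordinates" entry holding a (longitude, latitude) pair.
    """
    kept = []
    for tweet in tweets:
        if tweet.get("coordinates") is None:
            continue  # no location, so the tweet cannot be mapped
        category = classify_tweet(tweet["text"])
        if category is not None:
            kept.append({**tweet, "category": category})
    return kept
```

Whole-word matching is used here so that a term such as "meth" does not match unrelated words like "methods"; simple keyword matching of this kind still cannot distinguish jokes, news, or song lyrics from statements about actual behavior.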

This study is designed to evaluate the feasibility of developing methods of using “big data” to understand whether and how HIV and drug risk behaviors are communicated online in real-time and how these data might be used to inform HIV prevention and detection efforts. Specifically, this study seeks to determine 1) whether geolocated conversations about HIV risk (sexual and drug use) behaviors can be extracted from real-time social networking data, 2) the prevalence and content of these conversations, and 3) the feasibility of using HIV risk-related real-time social media conversations as a method of remote monitoring and detecting HIV transmission.


Methods

This study received exemption from the Virginia Tech Institutional Review Board. Tweets (N = 553,186,061) were collected from Twitter's free application programming interface (API) between May 26, 2012 and Dec 09, 2012. We used Twitter's ‘garden hose’ method of collecting tweets, which provides a random sample of approximately 1% of all tweets. Tweets collected through the garden hose are available in real time; the data are streamed continuously as the tweets are sent through the service. A variety

Analysis

Counts of HIV-related tweets were tallied from each county and merged with HIV data from aidsvu.org (http://aidsvu.org/about-aidsvu/overview) to create a table with county-level data for analyses. Descriptive statistics for tweet metadata were calculated for sex risk and drug risk-related tweet categories, as well as for the overall demographics of Twitter users sending tweets.
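
The tally-and-merge step described above might look roughly like the following sketch. It assumes each filtered tweet has already been assigned a county identifier; the field names ("county_fips", "hiv_cases", "hiv_tweets") are hypothetical, and the layout of the actual AIDSVu file is not described in this text.

```python
from collections import Counter


def tally_by_county(tweets):
    """Count tweets per county using each tweet's "county_fips" field."""
    return Counter(tweet["county_fips"] for tweet in tweets)


def merge_with_hiv_data(tweet_counts, hiv_rows):
    """Join county-level HIV rows with tweet counts on the county identifier.

    Every county in hiv_rows is kept; counties with no matching tweets
    receive a tweet count of 0.
    """
    merged = []
    for row in hiv_rows:
        fips = row["county_fips"]
        merged.append(
            {
                "county_fips": fips,
                "hiv_cases": row["hiv_cases"],
                "hiv_tweets": tweet_counts.get(fips, 0),
            }
        )
    return merged


# Made-up example values, for illustration only.
example_tweets = [
    {"county_fips": "06037"},
    {"county_fips": "06037"},
    {"county_fips": "11001"},
]
example_hiv_rows = [
    {"county_fips": "06037", "hiv_cases": 100},
    {"county_fips": "48201", "hiv_cases": 50},
]
county_table = merge_with_hiv_data(tally_by_county(example_tweets), example_hiv_rows)
```

Keeping every county from the HIV table, including those with zero matching tweets, is a design choice in this sketch; whether the study treated such counties this way is not stated here.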

Univariate regressions assessed associations between the proportion of sex, stimulant drug use, and HIV-related

Results

The largest shares of geolocated tweets, including general as well as HIV-risk related tweets, were sent from California (9.4%), Texas (9.0%), New York (5.7%), and Florida (5.4%). The District of Columbia, Delaware, Maryland, and Mississippi tweeted the most overall per capita (Table 1).

The algorithm collected 8538 sexual risk-related tweets and 1342 stimulant drug use-related tweets, totaling 9880 HIV-related tweets. District of Columbia, Delaware, Louisiana, and South Carolina sent the largest raw

Discussion

This study provides the first set of evidence for how real-time social media data might be used for extracting, detecting, and remotely monitoring health-related attitudes and behaviors. Results suggest the feasibility of using data from real-time social networking technologies to identify HIV risk-related communications, geographically map the location of those conversations, and link them to national HIV outcome data for additional analyses. Further, tweets that implied HIV-risk behaviors

Conclusion

Results from this study suggest that it is feasible to use real-time social networking technologies to identify HIV risk-related communications, geographically map the location of those conversations, and link them to national HIV outcome data for additional analyses, and that these data were associated with county-level HIV prevalence. This study was designed to provide a call for future research to understand the potential cost-effectiveness of this approach and to refine methods of using

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Acknowledgments

We wish to thank the National Institute of Mental Health (NIMH) for funding this work.

References (18)

  • S.S. Bull et al. A social media delivered sexual health intervention: a cluster randomized controlled trial. Am. J. Prev. Med. (2012)

  • E. Aramaki et al. Twitter catches the flu: detecting influenza epidemics using Twitter

  • C. Chew et al. Pandemics in the age of Twitter: content analysis of tweets during the 2009 H1N1 outbreak. PLoS ONE (2010)

  • A. Culotta. Towards detecting influenza epidemics by analyzing Twitter messages

  • J. Gold et al. A systematic examination of the use of online social networking sites for sexual health promotion. BMC (2011)

  • V. Lampos et al. Flu Detector - Tracking Epidemics on Twitter (2010)

  • M. Salathe et al. Digital epidemiology. PLoS Comput. Biol. (2012)

  • S. Shoptaw. Methamphetamine use in urban gay and bisexual populations. Top. HIV Med. (2006)

  • A. Smith. Who's on what: social media trends among communities of color

There are more references available in the full text version of this article.