1R01CA225773-01
University of Pittsburgh at Pittsburgh
Leveraging Twitter to Monitor Nicotine and Tobacco-Related Cancer Communication
Patterns in Twitter data have revolutionized understanding of public health events such as influenza outbreaks.
While researchers have begun to examine messaging related to substance use on Twitter, this project will
strengthen the use of Twitter as an infoveillance tool to more rigorously examine nicotine, tobacco, and cancer-
related communication. Twitter is particularly suited to this work because its users are commonly adolescents,
young adults, and racial and ethnic minorities, all of whom are at increased risk for nicotine and tobacco
product (NTP) use and related health consequences. Additionally, due to the openness of the platform,
searches are replicable and transparent, enabling large-scale systematic research. Therefore, our
multidisciplinary team of experts in diverse relevant fields—including public health, behavioral science,
computational linguistics, computer science, biomedical informatics, and information privacy and security—will
build upon our previous research to develop and validate structured algorithms providing automated
surveillance of Twitter’s multifaceted and continuously evolving information related to NTPs. First, we will
qualitatively assess a stratified random sample of relevant NTP-related tweets for specific coded variables,
such as the message’s primary sentiment and other key information of potential value (e.g., whether a
message involves buying/selling, policy/law, or cancer-related communication). Tweets will be obtained
directly from Twitter using software we developed that leverages a comprehensive list of Twitter-optimized
search strings related to NTPs. Second, we will statistically determine what message characteristics (e.g., the
presence of certain words, punctuation, and/or structures) are most strongly associated with each of the coded
variables for each search string. Using this information, we will create specialized machine learning (ML)
algorithms based on state-of-the-art methods from natural language processing (NLP) to automatically
assess and categorize future Twitter data. Third, we will use this information to provide automatic assessment
of current and future streaming data. Time series analyses using seasonal autoregressive integrated moving
average (ARIMA) models will determine whether there are significant changes over time in the volume of messaging related to
each specific coded variable of interest. Trends will be examined at the daily, weekly, and monthly levels,
because each of these levels is potentially valuable for intervention. To maximize the translational value of this
project, we will partner with public health department stakeholders who are experts in streamlining
dissemination of actionable trends data. In summary, this project will substantially advance our understanding
of representations of NTPs on social media—as well as our ability to conduct automated surveillance and
analysis of this content. This project will result in important and concrete deliverables, including open-source
algorithms for future researchers and processes to quickly disseminate actionable data for tailoring community-
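
The trend step described above (volume of messaging per coded variable at the daily, weekly, and monthly levels) can be illustrated with a minimal sketch. This is not the project's software: the records, the label names, and the helper functions `bucket_key` and `count_by_bucket` are hypothetical, and the resulting counts would only be the input to a seasonal ARIMA model, which is not fitted here.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (posting date, coded variable assigned by a classifier).
# The dates and labels are invented for illustration; they are not project data.
LABELED_TWEETS = [
    (date(2018, 1, 1), "policy/law"),
    (date(2018, 1, 1), "buying/selling"),
    (date(2018, 1, 3), "policy/law"),
    (date(2018, 1, 8), "policy/law"),
    (date(2018, 2, 2), "cancer-related"),
]


def bucket_key(day, level):
    """Map a date to its daily, weekly (ISO week), or monthly bucket label."""
    if level == "daily":
        return day.isoformat()
    if level == "weekly":
        iso_year, iso_week, _ = day.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if level == "monthly":
        return f"{day.year}-{day.month:02d}"
    raise ValueError(f"unknown level: {level}")


def count_by_bucket(records, variable, level):
    """Count messages coded with `variable` in each time bucket."""
    counts = Counter()
    for day, label in records:
        if label == variable:
            counts[bucket_key(day, level)] += 1
    return dict(counts)


if __name__ == "__main__":
    for level in ("daily", "weekly", "monthly"):
        print(level, count_by_bucket(LABELED_TWEETS, "policy/law", level))
```

Keeping the three aggregation levels behind one function means the same labeled stream can feed a separate time series per level without re-reading the data.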
Identifying Key Target Audiences for Public Health Campaigns: Leveraging Machine Learning in the Case of Hookah Tobacco Smoking.
[...], Colditz J., Malik M., Yates T., Primack B.
Journal of Medical Internet Research, 2019-07-08; 21(7), p. e12443.
JUUL: Spreading Online and Offline.
[...], Colditz J.B., Primack B.A., Shensa A., Allem J.P., Miller E., Unger J.B., Cruz T.B.
The Journal of Adolescent Health: official publication of the Society for Adolescent Medicine, 2018 Nov; 63(5), p. 582-586.
Toward Real-Time Infoveillance of Twitter Health Messages.
[...], Chu K.H., Emery S.L., Larkin C.R., James A.E., Welling J., Primack B.A.
American Journal of Public Health, 2018 Aug; 108(8), p. 1009-1014.