Skip to main content
An official website of the United States government
Grant Details

Grant Number: 7R01CA225773-05 Interpret this number
Primary Investigator: Primack, Brian
Organization: Oregon State University
Project Title: Leveraging Twitter to Monitor Nicotine and Tobacco Cancer Communication
Fiscal Year: 2022


Patterns in Twitter data have revolutionized understanding of public health events such as influenza outbreaks. While researchers have begun to examine messaging related to substance use on Twitter, this project will strengthen the use of Twitter as an infoveillance tool to more rigorously examine nicotine, tobacco, and cancer- related communication. Twitter is particularly suited to this work because its users are commonly adolescents, young adults, and racial and ethnic minorities, all of whom are at increased risk for nicotine and tobacco product (NTP) use and related health consequences. Additionally, due to the openness of the platform, searches are replicable and transparent, enabling large-scale systematic research. Therefore, our multidisciplinary team of experts in diverse relevant fields—including public health, behavioral science, computational linguistics, computer science, biomedical informatics, and information privacy and security—will build upon our previous research to develop and validate structured algorithms providing automated surveillance of Twitter’s multifaceted and continuously evolving information related to NTPs. First, we will qualitatively assess a stratified random sample of relevant NTP-related tweets for specific coded variables, such as the message’s primary sentiment and other key information of potential value (e.g., whether a message involves buying/selling, policy/law, and cancer-related communication). Tweets will be obtained directly from Twitter using software we developed that leverages a comprehensive list of Twitter-optimized search strings related to NTPs. Second, we will statistically determine what message characteristics (e.g., the presence of certain words, punctuation, and/or structures) are most strongly associated with each of the coded variables for each search string. Using this information, we will create specialized Machine Learning (ML) algorithms based on state-of-the-art methods from Natural Language Processing (NLP) to automatically assess and categorize future Twitter data. Third, we will use this information to provide automatic assessment of current and future streaming data. Time series analyses using seasonal Auto-Regressive Integrated Moving Averages (ARIMA) will determine if there are significant changes over time in volume of messaging related to each specific coded variables of interest. Trends will be examined at the daily, weekly, and monthly level, because each of these levels is potentially valuable for intervention. To maximize the translational value of this project, we will partner with public health department stakeholders who are experts in streamlining dissemination of actionable trends data. In summary, this project will substantially advance our understanding of representations of NTPs on social media—as well as our ability to conduct automated surveillance and analysis of this content. This project will result in important and concrete deliverables, including open-source algorithms for future researchers and processes to quickly disseminate actionable data for tailoring community- level interventions.