AT A GLANCE
This dataset contains COVID-19 vaccine-related tweets collected via the Twitter API from 2020 through January 2022. Originally published on Kaggle by Gabriel Preda, it covers public discourse on seven major vaccines worldwide.
Key Highlights:
- Volume: Large-scale tweet collection, updated twice daily during the collection period
- Vaccines Covered: Pfizer/BioNTech, Moderna, AstraZeneca, Sinopharm, Sinovac, Covaxin, Sputnik V
- Period: 2020 – January 2022
- Collection Tool: Tweepy (Twitter API)
- Use Cases: NLP, sentiment analysis, misinformation research
PROJECT DESCRIPTION
The collection initially focused on tweets about the Pfizer/BioNTech vaccine and later expanded to cover seven vaccines. Each tweet record includes text, metadata, and timestamp information. The dataset supports natural language processing, sentiment analysis, and studies of public health communication dynamics on social media.
Research Applications:
- Sentiment analysis of vaccine attitudes over time
- Misinformation dynamics on social media
- Temporal shifts in public discourse during the vaccination campaign
- Cross-vaccine comparison of public reception
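As a starting point for the temporal and cross-vaccine analyses listed above, the following is a minimal sketch that tags each tweet with the vaccines it mentions and counts mentions per month. The field names `date` and `text`, the timestamp format, and the keyword patterns are assumptions made for illustration; consult the codebook for the actual variable names, and note that the original search terms are not documented in this record.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical keyword patterns, one per vaccine covered by the dataset.
# These are illustrative and are not the collector's actual search terms.
VACCINE_PATTERNS = {
    "Pfizer/BioNTech": re.compile(r"pfizer|biontech", re.IGNORECASE),
    "Moderna": re.compile(r"moderna", re.IGNORECASE),
    "AstraZeneca": re.compile(r"astrazeneca", re.IGNORECASE),
    "Sinopharm": re.compile(r"sinopharm", re.IGNORECASE),
    "Sinovac": re.compile(r"sinovac", re.IGNORECASE),
    "Covaxin": re.compile(r"covaxin", re.IGNORECASE),
    "Sputnik V": re.compile(r"sputnik", re.IGNORECASE),
}


def flag_vaccines(text):
    """Return the names of all vaccines whose pattern matches the tweet text."""
    return [name for name, pattern in VACCINE_PATTERNS.items() if pattern.search(text)]


def monthly_counts(tweets):
    """Count vaccine mentions per (YYYY-MM, vaccine) pair.

    Assumes each tweet is a dict with a 'date' string in
    'YYYY-MM-DD HH:MM:SS' form and a 'text' string (assumed field names).
    """
    counts = Counter()
    for tweet in tweets:
        month = datetime.strptime(tweet["date"], "%Y-%m-%d %H:%M:%S").strftime("%Y-%m")
        for vaccine in flag_vaccines(tweet["text"]):
            counts[(month, vaccine)] += 1
    return counts


# Made-up sample records, not drawn from the dataset.
sample = [
    {"date": "2021-01-05 10:00:00", "text": "Got my Pfizer shot today"},
    {"date": "2021-01-20 18:30:00", "text": "Moderna vs Pfizer: any difference?"},
    {"date": "2021-02-02 09:15:00", "text": "Covaxin rollout begins"},
]
counts = monthly_counts(sample)
```

A tweet that names several vaccines is counted once for each, which suits cross-vaccine comparison but means the totals can exceed the number of tweets.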
Subject Terms: COVID-19, vaccines, Twitter, sentiment analysis, social media, NLP, public health communication
SCOPE & METHODOLOGY
- Geographic Coverage: Global (Twitter users worldwide)
- Smallest Geographic Unit: Individual tweet / user account
- Time Period: 2020 – January 2022
- Universe: Public tweets referencing COVID-19 vaccines
- Unit of Observation: Individual tweet
- Data Type: Observational data — social media
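Data Collection: Tweets were collected with the Tweepy Python package through the Twitter API, and the collection was updated twice daily during the collection period. The dataset was originally published on Kaggle by Gabriel Preda.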
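Because collection runs were repeated twice daily, successive pulls may return some of the same tweets. The deduplication procedure actually used is not documented in this record, so the following is only a sketch of one common approach: keep the first occurrence of each tweet, keyed on a unique identifier. The field name `id` is an assumption and may not match the codebook.

```python
def deduplicate(batches):
    """Merge collection batches, keeping the first occurrence of each tweet.

    Assumes each tweet is a dict with a unique 'id' field (assumed name).
    Batch order and within-batch order are preserved.
    """
    seen = set()
    unique = []
    for batch in batches:
        for tweet in batch:
            if tweet["id"] not in seen:
                seen.add(tweet["id"])
                unique.append(tweet)
    return unique


# Made-up sample batches standing in for two runs on the same day.
morning = [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}]
evening = [{"id": 2, "text": "b"}, {"id": 3, "text": "c"}]
merged = deduplicate([morning, evening])
```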
CITATION
University of Maryland, College of Behavioral and Social Sciences [distributor]. Covid Tweets. Originally collected by Gabriel Preda via Kaggle. BSOS Social Science Data Repository, 2026. https://bsos-data.umd.edu/dataset/covid-tweets-2020
FILES & DOCUMENTATION
Available:
- codebook_covid_twitter (XLSX)
- data_covid_tweets (External link — Google Sheets)
Planned Additions:
- Data Dictionary — Tweet-level variable definitions, timestamps, user metadata, vaccine keyword flags
- Collection Methodology — API parameters, search queries, deduplication procedures