Covid Tweets

AT A GLANCE

COVID-19 vaccine-related tweets collected via the Twitter API from 2020 through January 2022. Originally published on Kaggle by Gabriel Preda, the dataset covers public discourse on seven major vaccines worldwide.

Key Highlights:
- Volume: Large-scale tweet collection, updated twice daily
- Vaccines Covered: Pfizer/BioNTech, Moderna, AstraZeneca, Sinopharm, Sinovac, Covaxin, Sputnik V
- Period: 2020 – January 2022
- Collection Tool: Tweepy (Twitter API)
- Use Cases: NLP, sentiment analysis, misinformation research


PROJECT DESCRIPTION

The collection initially focused on Pfizer/BioNTech vaccine tweets, then expanded to seven vaccines. Each tweet record includes text, metadata, and timestamp information. The dataset supports natural language processing, sentiment analysis, and studies of public health communication dynamics on social media.

Research Applications:
- Sentiment analysis of vaccine attitudes over time
- Misinformation dynamics on social media
- Temporal shifts in public discourse during the vaccination campaign
- Cross-vaccine comparison of public reception
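Cross-vaccine and temporal analyses like these typically begin by tagging each tweet with the vaccines it mentions and counting mentions per period. The Python sketch below is illustrative only: it assumes tweet records are dicts with hypothetical `date` (ISO 8601 string) and `text` fields, and the keyword lists are hypothetical stand-ins, since the dataset's actual search queries and column names are not yet documented. Adapt both to the codebook.

```python
from collections import Counter
from datetime import datetime

# Hypothetical keywords per vaccine (not the dataset's documented queries).
# Plain substring matching also catches hashtags such as "#PfizerBioNTech".
VACCINE_KEYWORDS = {
    "Pfizer/BioNTech": ("pfizer", "biontech"),
    "Moderna": ("moderna",),
    "AstraZeneca": ("astrazeneca",),
    "Sinopharm": ("sinopharm",),
    "Sinovac": ("sinovac",),
    "Covaxin": ("covaxin",),
    "Sputnik V": ("sputnik",),
}


def vaccine_flags(text: str) -> dict[str, bool]:
    """Return one boolean flag per vaccine for a tweet's text."""
    lowered = text.lower()
    return {
        name: any(keyword in lowered for keyword in keywords)
        for name, keywords in VACCINE_KEYWORDS.items()
    }


def monthly_mentions(tweets) -> Counter:
    """Count tweets per (YYYY-MM, vaccine) pair.

    Each tweet is assumed to be a dict with hypothetical 'date' and 'text' keys.
    A tweet mentioning two vaccines is counted once for each.
    """
    counts: Counter = Counter()
    for tweet in tweets:
        month = datetime.fromisoformat(tweet["date"]).strftime("%Y-%m")
        for name, mentioned in vaccine_flags(tweet["text"]).items():
            if mentioned:
                counts[(month, name)] += 1
    return counts
```

Substring matching is deliberately simple; a real analysis would likely need to handle false positives (for example, "sputnik" in unrelated contexts) and non-English spellings.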

Subject Terms: COVID-19, vaccines, Twitter, sentiment analysis, social media, NLP, public health communication


SCOPE & METHODOLOGY

  • Geographic Coverage: Global (Twitter users worldwide)
  • Smallest Geographic Unit: Individual tweet / user account
  • Time Period: 2020 – January 2022
  • Universe: Public tweets referencing COVID-19 vaccines
  • Unit of Observation: Individual tweet
  • Data Type: Observational data — social media

Data Collection: Collected using the Tweepy Python package (Twitter API). Updated twice daily. Published on Kaggle by Gabriel Preda.
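Because the collection was refreshed twice daily, successive pulls can return the same tweet more than once. The dataset's actual deduplication procedure is not yet documented. As a minimal sketch, assuming each record carries a unique tweet identifier under a hypothetical `id` key, overlapping collection runs could be merged like this:

```python
from typing import Iterable


def merge_snapshots(snapshots: Iterable[list[dict]]) -> list[dict]:
    """Merge successive collection runs, keeping the first copy of each tweet.

    Assumes every tweet dict has a unique, hypothetical 'id' key
    (for example, the Twitter status ID). Later duplicates are skipped,
    and the original order of first appearance is preserved.
    """
    seen: set = set()
    merged: list[dict] = []
    for snapshot in snapshots:
        for tweet in snapshot:
            if tweet["id"] not in seen:
                seen.add(tweet["id"])
                merged.append(tweet)
    return merged
```

This keeps the first-seen copy of each tweet. If engagement counts should reflect the most recent pull instead, the stored record would need to be overwritten rather than skipped.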


CITATION

University of Maryland, College of Behavioral and Social Sciences [distributor]. Covid Tweets. Originally collected by Gabriel Preda via Kaggle. BSOS Social Science Data Repository, 2026. https://bsos-data.umd.edu/dataset/covid-tweets-2020


FILES & DOCUMENTATION

Available:
- codebook_covid_twitter (XLSX)
- data_covid_tweets (external link, Google Sheets)

Planned Additions:
- Data Dictionary: tweet-level variable definitions, timestamps, user metadata, vaccine keyword flags
- Collection Methodology: API parameters, search queries, deduplication procedures
