ANTi-Vax: A Novel Twitter Dataset for COVID-19 Vaccine Misinformation Detection

Document Type


Source of Publication

Public Health

Publication Date



Objectives: The COVID-19 (SARS-CoV-2) pandemic has infected hundreds of millions of people and caused millions of deaths around the globe. The introduction of COVID-19 vaccines provided a glimmer of hope and a pathway to recovery. However, misinformation spread on social media and other platforms has contributed to a rise in vaccine hesitancy, which can reduce vaccine uptake in the population. The goal of this research is to introduce a novel machine learning-based framework for detecting COVID-19 vaccine misinformation.

Study Design: We collected and annotated COVID-19 vaccine tweets and trained machine learning algorithms to classify vaccine misinformation.

Methods: More than 15,000 tweets were annotated as either misinformation or general vaccine tweets using reliable sources, and the annotations were validated by medical experts. The classification models explored were XGBoost, LSTM, and a BERT transformer model.

Results: The best classification performance was obtained using BERT, which achieved an F1-score of 0.98 on the test set. The precision and recall scores were 0.97 and 0.98, respectively.

Conclusion: Machine learning-based models are effective in detecting misinformation regarding COVID-19 vaccines on social media platforms.
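
The precision, recall, and F1-score quoted in the Results are standard binary classification metrics. As a minimal sketch of how such scores are computed, the Python below counts true positives, false positives, and false negatives for the positive class. The function name `precision_recall_f1`, the label encoding (1 = misinformation, 0 = general vaccine tweet), and the toy label lists are assumptions made for illustration; they are not the study's data or code.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    # True positives: positive tweets that were predicted positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: non-positive tweets that were predicted positive.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: positive tweets that were predicted non-positive.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels, for illustration only:
# 1 = misinformation, 0 = general vaccine tweet.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, so it is high only when both are high.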




Communication | Computer Sciences | Medicine and Health Sciences


COVID-19, Vaccines, Text classification, Misinformation detection, Deep learning, Natural language processing

Scopus ID


Indexed in Scopus


Open Access


Open Access Type

Bronze: This publication is openly available on the publisher’s website but without an open license