Image copyright: Diana Polekhina © unsplash.com
This is the capstone project of the neue fische Data Science Bootcamp. It was a collaborative effort between Christina Rudolf, Raymond Boateng and Juliane Berek.
The H1N1 influenza pandemic (also known as swine influenza) was caused by the H1N1 influenza virus and affected the world from June 2009 to August 2010 (according to the WHO declaration). It was the most recent pandemic prior to COVID-19. An estimated 11-21% of the global population was affected, with deaths in the U.S. totalling 12,469.
This project aimed to:
- Help to increase vaccination rates for seasonal and pandemic flu in the overall population (thus reducing the burden of influenza by decreasing hospitalisations and deaths)
- Identify factors that determine the chance of getting vaccinated
- Identify groups with a lower likelihood of getting vaccinated, in order to target them with promotions
- Determine differences between seasonal and H1N1 (pandemic) vaccinations
The original data was collected through the National 2009 H1N1 Flu Survey in the U.S. between 2009 and 2010. The current dataset formed part of the DrivenData Challenge "Flu Shot Learning: Predict H1N1 and Seasonal Flu Vaccines".
The dataset consists of approximately 27,000 participant responses. The main outcome variables in the dataset are:
- whether the participant received the vaccination against the H1N1 flu
- whether the participant received the vaccination against the seasonal flu
The remainder of the dataset consists of 35 categorical variables, broadly falling into participant demographics, attitudes and knowledge about H1N1 and seasonal flu and vaccination, and healthcare information.
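As a rough illustration of how the two outcome variables can be compared across one of the categorical variables, the sketch below computes the share of vaccinated participants per group with Pandas. The tiny sample is made up for demonstration, and the column names (`age_group`, `h1n1_vaccine`, `seasonal_vaccine`) and the helper `vaccination_rates` are assumptions, not taken from the project code.

```python
import pandas as pd

# Made-up sample for illustration only; it is NOT the survey data.
# Column names are assumptions about how the outcomes might be encoded (1 = vaccinated).
df = pd.DataFrame({
    "age_group": ["18-34", "18-34", "65+", "65+", "65+", "35-64"],
    "h1n1_vaccine": [0, 0, 1, 0, 1, 0],
    "seasonal_vaccine": [0, 1, 1, 1, 1, 0],
})


def vaccination_rates(frame, group_col, outcome_cols):
    """Return the share of vaccinated participants per group for each outcome."""
    # The mean of a 0/1 column is the proportion of ones in that group.
    return frame.groupby(group_col)[outcome_cols].mean()


rates = vaccination_rates(df, "age_group", ["h1n1_vaccine", "seasonal_vaccine"])
print(rates)
```

Groups with a low rate for either outcome would be candidates for the targeted promotions mentioned in the project aims.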
We had two main hypotheses we intended to address in our project:
- Some features affect the likelihood of vaccination more than others, e.g. attitudes and knowledge, recommendations by doctors
- The pandemic context increases uptake of the H1N1 vaccination
The main factors related to pandemic response have been visualised in our dashboard.
- Pandas
- NumPy
- Plotly/Dash
- Heroku
- scikit-learn
- MLflow
- Random Forest
- ELI5
- Permutation importance
- SHAP
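To show how two of the tools listed above fit together, here is a minimal sketch that trains a Random Forest with scikit-learn and ranks features by permutation importance. It runs on synthetic data generated in the snippet itself; the feature names (`doctor_recommendation`, `noise_feature`), the 10% label noise, and all hyperparameters are assumptions chosen for illustration and do not reproduce the project's actual model or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# Synthetic binary features (hypothetical names, not the survey variables).
doctor_recommendation = rng.integers(0, 2, n)
noise_feature = rng.integers(0, 2, n)

# Synthetic outcome: follows the first feature, with about 10% of labels flipped.
flip = rng.random(n) < 0.1
y = np.where(flip, 1 - doctor_recommendation, doctor_recommendation)

X = np.column_stack([doctor_recommendation, noise_feature])
feature_names = ["doctor_recommendation", "noise_feature"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Permutation importance: shuffle one feature at a time on held-out data
# and measure how much the model's accuracy drops.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)
importances = dict(zip(feature_names, result.importances_mean))

for name, value in sorted(importances.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.3f}")
```

Because the synthetic outcome depends only on the first feature, shuffling it should hurt accuracy far more than shuffling the noise feature, which is the kind of ranking permutation importance is meant to surface.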