Trends in Data Science

The main goal of this project is to monitor trends in the UK data science job market. All trends can be viewed in the Shiny app at https://apps.statcore.co.uk/trends-in-data-science/. For a write-up, see the Medium article "Python vs R: How to Analyse 4000 Job Advertisements Using Shiny & Machine Learning".

I originally started this project in 2018 to help me decide whether to learn Python or not. I now use it as motivation to keep learning Python!

-- Project Status: Completed

Methods Used

  • Web Scraping
  • Data Visualization
  • Topic Modelling (LDA)
  • Web Application Development & Hosting
  • Task Scheduling

Technologies

  • R
  • Shiny
  • Selenium
  • Docker
  • Linux
  • Azure

Project Description

The data source for this project is the jobserve website. On a daily schedule, we perform the following steps:

  1. Scrape all 'Data Scientist' jobs from jobserve
  2. Pre-process the data, produce visualisations and build topic models on the job descriptions
  3. Present output using an interactive web application

Each of the three tasks has its own folder:

  • Scraping
  • Analyse
  • Shiny

Each task has its own Docker image and is launched on a schedule using cron.
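
As a rough sketch of how a per-task image might be launched from cron, the script below maps a task name to an image and runs it. The image naming scheme (`trends-<task>`), the lowercase task names, the script path and the schedule times are all assumptions made for illustration and are not taken from this repository. By default it only prints the command; set `DRY_RUN=0` to actually run the container.

```shell
#!/bin/sh
# Hypothetical cron wrapper: launch one task container per invocation.
# The image naming scheme (trends-<task>) is an assumption for illustration.
set -eu

# Map a task name to its (assumed) image name; reject anything else.
image_for() {
  case "$1" in
    scraping|analyse|shiny) echo "trends-$1" ;;
    *) echo "unknown task: $1" >&2; return 1 ;;
  esac
}

# Example crontab entries (assumed times and install path):
#   0 1 * * * /opt/trends/run_task.sh scraping
#   0 3 * * * /opt/trends/run_task.sh analyse

if [ "$#" -gt 0 ]; then
  image="$(image_for "$1")"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "docker run --rm $image"   # default: only print the command
  else
    docker run --rm "$image"
  fi
fi
```

Keeping the mapping in one function means cron entries only need to name the task, and an unknown task fails loudly instead of starting the wrong container.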

For the Shiny App we use Nginx as a reverse proxy and to encrypt all traffic using SSL. The Nginx folder contains the required config file.

Lastly, there are a number of helper shell scripts in the root directory that automate some of the repetitive tasks (docker run, docker compose up, etc.).
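
For illustration only, a helper of this kind might look like the sketch below. The script name (`helper.sh`), the subcommand names and the presence of a compose file in the working directory are assumptions; the actual scripts in the root directory may differ.

```shell
#!/bin/sh
# Hypothetical helper: short aliases for repetitive Docker commands.
# Subcommand names and reliance on a compose file are assumptions.
set -eu

# Translate a short action name into the full command line.
helper_command() {
  case "$1" in
    up)   echo "docker compose up -d" ;;
    down) echo "docker compose down" ;;
    logs) echo "docker compose logs -f" ;;
    *)    echo "usage: helper.sh {up|down|logs}" >&2; return 1 ;;
  esac
}

if [ "$#" -gt 0 ]; then
  cmd="$(helper_command "$1")"
  echo "+ $cmd"   # show the command before running it
  $cmd            # unquoted on purpose so the words split into arguments
fi
```

Running `./helper.sh up` would then start the services in the background, and `./helper.sh logs` would follow their output.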

Getting Started

Follow the setup instructions

Contact