
Python script and Jupyter notebook using Selenium to scrape one of the worst-designed event websites I have ever seen.


social-anthrax/NSO-Calendar-Scraper


NSO Calendar scraper

A Selenium-based scraper for the University of Pennsylvania NSO calendar. The calendar normally provides no way to download all the applicable events, so this script was created to do just that. The generated calendars can be imported into Google Calendar, Outlook, or any other calendar software that supports the iCal format. They are available in the releases section of this repository: https://github.com/social-anthrax/NSO-Calendar-Scraper/releases/tag/latest

Calendars

The calendars are regenerated at 1 am every day via GitHub Actions, so they should stay up to date. If you'd like to subscribe to the calendars (I don't personally recommend it, as you won't be able to delete items, but at least they will stay up to date), copy the links below into your calendar software of choice as subscriptions. Alternatively, you can download the calendars and import them into your calendar.

Calendar Entries

The scraper fixes the following issues with the standard NSO calendar entries:

  • Adds the location field (Normally parsable by ICalendar, not tested on other platforms).
  • Correctly displays the description (normally the description is unrendered HTML and borders on unreadable).
  • Adds a link to the original event (in case you want to see the original description).
  • Corrects the datetime when NSO manages to set it to the UNIX epoch, or when the end time is before the start time.
  • Adds the mandatory flag to the event name.
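
The datetime correction mentioned in the list above could look roughly like the sketch below. This is not the scraper's actual code. The function name `fix_times`, the one-hour `DEFAULT_DURATION` fallback, and the choice to swap reversed times are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# The UNIX epoch, which NSO sometimes emits in place of a real time.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Assumption: when one endpoint is unusable, guess a one-hour event.
DEFAULT_DURATION = timedelta(hours=1)


def fix_times(start: datetime, end: datetime) -> tuple[datetime, datetime]:
    """Return a (start, end) pair with epoch and reversed times repaired.

    Hypothetical helper: expects timezone-aware datetimes.
    """
    start_bad = start == EPOCH
    end_bad = end == EPOCH

    if start_bad and end_bad:
        # Nothing to recover from; let the caller decide what to do.
        raise ValueError("both start and end are set to the UNIX epoch")
    if start_bad:
        # Rebuild the start from the end using the assumed duration.
        start = end - DEFAULT_DURATION
    elif end_bad:
        # Rebuild the end from the start using the assumed duration.
        end = start + DEFAULT_DURATION
    elif end < start:
        # Assumption: reversed times were simply entered the wrong way round.
        start, end = end, start
    return start, end
```

Swapping is only one way to handle a reversed pair. Another reasonable choice would be to keep the start time and apply the default duration.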

WARNING: IF NSO DID NOT MARK THE EVENT AUDIENCE CORRECTLY, YOU WILL NOT SEE THE EVENT. FOR EXAMPLE, THE ACTIVITIES FAIR IN 2023 WAS MARKED FOR SECOND YEARS ONLY.

Usage

The project uses Poetry for dependency management, so make sure you have it installed. Once it is installed, run poetry install to install the dependencies. Tested with Python 3.11.

If you wish to develop or modify the code, make sure to install the development dependencies too: poetry install --with='dev'. The script and notebook are kept in sync via jupytext, which is set up to run as part of the pre-commit hook. Install the pre-commit hooks with poetry run pre-commit install. This keeps the notebook and script in sync and strips the notebook output.
