A little script that downloads the NHK 高校講座 video lectures. Once they are downloaded, it re-encodes them in a more convenient format.
- ActiveRecord
- SQLite
- Nokogiri
- Debugger
Install Bundler:
gem install bundler
Bundle it:
bundle
That's it!
Just run the script:
./nhk_scraper.rb
The first task is to collect the URLs of all the videos we want to download. The script uses Nokogiri to parse the index page, gets the links to each show (and season), and then follows those links to get the URL of each episode. These URLs are then stored in a SQLite database.
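The two-level crawl described above (index page, then show pages, then episode URLs) could be sketched roughly as follows. This is only an illustration of the traversal, not the script's actual code: `collect_episode_urls`, `FAKE_PAGES`, and the `example.com` URLs are hypothetical, and the Nokogiri link extraction is replaced by a lambda that looks links up in a hash so the sketch runs without any gems.

```ruby
require 'uri'
require 'set'

# Hypothetical stand-in for the pages the real script fetches and parses
# with Nokogiri: each page URL maps to the hrefs found on that page.
FAKE_PAGES = {
  'http://example.com/index.html'     => ['shows/math/', 'shows/physics/'],
  'http://example.com/shows/math/'    => ['ep1.asx', 'ep2.asx'],
  'http://example.com/shows/physics/' => ['ep1.asx', 'ep1.asx']
}.freeze

# Walk index -> shows -> episodes, resolving relative links against the
# page they were found on and dropping duplicates.
def collect_episode_urls(index_url, link_source)
  episodes = Set.new
  link_source.call(index_url).each do |show_href|
    show_url = URI.join(index_url, show_href).to_s
    link_source.call(show_url).each do |episode_href|
      episodes << URI.join(show_url, episode_href).to_s
    end
  end
  episodes.to_a
end

link_source = ->(url) { FAKE_PAGES.fetch(url, []) }
urls = collect_episode_urls('http://example.com/index.html', link_source)
puts urls
```

In the real script the lambda would be replaced by a fetch-and-parse step, and each collected URL would be written to the database instead of printed.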
Next, the script uses VLC to download the video files. VLC is needed because the actual video content is served via a Windows Media server, which doesn’t allow direct downloads. This step uses the database to determine which files haven’t been downloaded yet, and only attempts to download the missing ones.
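As a rough sketch of this step, assuming each database row carries a "downloaded" flag: the `Episode` struct, its fields, and both helper names are hypothetical stand-ins for the script's ActiveRecord model. The VLC options shown (`-I dummy`, `--sout` with a `standard` output chain, and the `vlc://quit` item) are commonly used for saving a stream headlessly; the options the script actually passes are not listed in this README.

```ruby
# Hypothetical record shape; the real script keeps these rows in SQLite.
Episode = Struct.new(:url, :path, :downloaded)

# Keep only the episodes that are not marked as downloaded yet.
def pending(episodes)
  episodes.reject(&:downloaded)
end

# Build a headless VLC invocation that saves the stream to ep.path.
# Returned as an argument array so it can be passed to system(*cmd)
# without any shell quoting.
def vlc_command(ep)
  sout = "#standard{access=file,mux=asf,dst=#{ep.path}}"
  ['vlc', '-I', 'dummy', ep.url, '--sout', sout, 'vlc://quit']
end

episodes = [
  Episode.new('mms://example.com/a.wmv', 'a.asf', true),
  Episode.new('mms://example.com/b.wmv', 'b.asf', false)
]

pending(episodes).each do |ep|
  puts vlc_command(ep).join(' ')
  # system(*vlc_command(ep)) would run the download for real.
end
```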
Once the videos have finished downloading, they are run through HandBrake’s CLI, which converts them into a more convenient format suitable for playback on iDevices or a PS3.
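The conversion call might look something like the sketch below. `handbrake_command` is a hypothetical helper; `-i`, `-o`, and `--preset` are standard HandBrakeCLI options, but the preset name and the `.m4v` output extension are assumptions, since preset names differ between HandBrake versions and the README does not say which one the script uses.

```ruby
# Build a HandBrakeCLI invocation for one downloaded file.
# The default preset name is an assumption; check `HandBrakeCLI --preset-list`
# for the presets available in your version.
def handbrake_command(input, preset: 'Universal')
  # Swap the input's extension for .m4v, keeping the directory and base name.
  output = input.chomp(File.extname(input)) + '.m4v'
  ['HandBrakeCLI', '-i', input, '-o', output, '--preset', preset]
end

cmd = handbrake_command('videos/ep1.asf')
puts cmd.join(' ')
# system(*cmd) would start the actual encode.
```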
Both of the above tasks run within a very simple thread pool (which I found at burgestrand.se/code/ruby-thread-pool/). This allows multiple instances of VLC or HandBrake to do their magic concurrently.
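For readers unfamiliar with the idea, here is a minimal thread pool built only on Ruby's standard `Thread` and `Queue`. It is a sketch in the same spirit, not the pool from the linked page; the class name `TinyPool` and its methods are made up for this example.

```ruby
# A minimal fixed-size pool: workers pull jobs from a shared queue
# until the queue is closed and drained.
class TinyPool
  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Queue#pop returns nil once the queue is closed and empty,
        # which ends the loop and lets the worker thread finish.
        while (job = @jobs.pop)
          job.call
        end
      end
    end
  end

  # Queue a block to be run by the next free worker.
  def schedule(&block)
    @jobs << block
  end

  # Stop accepting jobs, then wait for the queued ones to finish.
  def shutdown
    @jobs.close
    @workers.each(&:join)
  end
end

pool = TinyPool.new(3)
results = Queue.new
5.times { |i| pool.schedule { results << i * 2 } }
pool.shutdown
puts results.size
```

Each scheduled block could wrap one `system` call to VLC or HandBrake, so at most `size` external processes run at a time. Note that an exception raised inside a job would end that worker thread in this sketch; a sturdier pool would rescue and log it.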
This is public domain. Use it only for good.