A comprehensive collection of idioms used in the TV series Better Call Saul, generated with Google's Gemini 1.5 Flash model. You can start browsing the collection here.
I've provided the scripts used to generate the files, allowing you to tweak the prompts or make adjustments as needed.
- scrape.py: This script scrapes the Better Call Saul transcripts and saves each episode as an individual text file.
- query_model.py: This script sends a scraped transcript to the Gemini model for processing. It takes the filename as an argument and retrieves idioms from that transcript.
- gen-idioms.sh: This shell script iterates through the text files generated by scrape.py, passing each one to query_model.py. It saves the output idioms in Markdown format in the Idioms directory.
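For readers who prefer Python to shell, the loop that gen-idioms.sh performs could be sketched roughly as follows. This is not the repository's script: the transcripts directory name, the S1ep06.txt naming pattern, and the assumption that query_model.py prints its idioms to standard output are all illustrative guesses, and the helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def output_path_for(transcript: Path, out_root: Path) -> Path:
    """Map e.g. S1ep06.txt to <out_root>/S1/S1ep06.md.

    Assumption: the season is the part of the file name before "ep".
    """
    season = transcript.stem.split("ep")[0]
    return out_root / season / f"{transcript.stem}.md"


def generate_all(transcript_dir: Path, out_root: Path) -> None:
    """Run query_model.py on every transcript and save what it prints."""
    for transcript in sorted(transcript_dir.glob("*.txt")):
        target = output_path_for(transcript, out_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        # Assumption: query_model.py writes the idioms to stdout.
        result = subprocess.run(
            [sys.executable, "query_model.py", str(transcript)],
            capture_output=True,
            text=True,
        )
        # A failed call leaves an empty file so it can be spotted later.
        target.write_text(result.stdout if result.returncode == 0 else "")
```

Under these assumptions, calling generate_all(Path("transcripts"), Path("Idioms")) from the repository root would fill the Idioms directory one episode at a time.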
- Clone the repository:
    git clone [email protected]:ym496/idioms-bcs.git
    cd idioms-bcs
- Set up a virtual environment:
    python3 -m venv .venv
    source .venv/bin/activate
- Install the required Python packages:
    pip install -r requirements.txt
- Make the script executable and run it:
    chmod +x gen-idioms.sh
    ./gen-idioms.sh
Wait for the script to finish running. In the meantime, you can open another terminal tab to browse the files as they are created in the Idioms directory.
The files for some episodes are empty because Gemini keeps returning an error when those transcripts are passed to it.
You can check which files are empty by running:
find ./Idioms -name "*.md" -type f -empty
The latest output of this command was:
./Idioms/S6/S6ep03.md
./Idioms/S6/S6ep13.md
./Idioms/S6/S6ep08.md
./Idioms/S1/S1ep06.md
./Idioms/S1/S1ep08.md
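The same check can be done from Python, which may be handy if you want to re-run only the failed episodes. This is a minimal sketch that relies on nothing beyond the Idioms directory layout shown above; the function name is hypothetical.

```python
from pathlib import Path


def find_empty_markdown(root: Path) -> list[Path]:
    """Return every zero-byte .md file under root, sorted by path."""
    return sorted(
        p for p in root.rglob("*.md")
        if p.is_file() and p.stat().st_size == 0
    )
```

Calling find_empty_markdown(Path("Idioms")) from the repository root should report the same files as the find command, although in sorted order rather than directory order.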