A CLI tool to transcribe a given .m4a audio file, generate thumbnails, and produce blog assets (title, description, blog post, LinkedIn post).
This is the current workflow for creating a blog post (for my newsletter, Ambitious x Driven):
As you can see, the major bottleneck for content production is preparing the blog assets from the recording.
As this is a side project (that I'm doing alongside my full-time CS Master's @ ETH Zurich), this would be unsustainable in the long run.
Thus, I decided to build a Python CLI tool leveraging the latest LLMs & prompt-engineering techniques that would automate this workflow.
Here is a diagram that shows the processing pipeline for the blog writer.
To generate the blog assets, you need to input an audio file, a PDF resume, and a photo of the guest. It then:
- Extracts the guest details from the resume
- Transcribes the audio file (using AssemblyAI w/ speaker diarization)
- Generates landscape + square thumbnails in the Ambitious x Driven style by using an 🤗 Hugging Face Image Segmentation model & the resume details
- Generates a blog post, title, description and LinkedIn post using the transcript and the guest details (to reduce hallucinations + fix transcription errors).
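The transcription step can be sketched with the AssemblyAI Python SDK as below. `format_utterances` is a hypothetical helper for illustration; my actual pipeline configuration may differ:

```python
def format_utterances(utterances):
    """Render diarized utterances as 'Speaker X: text' lines."""
    return "\n".join(f"Speaker {u['speaker']}: {u['text']}" for u in utterances)

def transcribe_with_diarization(audio_path, api_key):
    """Transcribe an .m4a/.mp3 file with speaker labels via AssemblyAI."""
    import assemblyai as aai  # deferred import: pip install assemblyai
    aai.settings.api_key = api_key
    config = aai.TranscriptionConfig(speaker_labels=True)
    transcript = aai.Transcriber().transcribe(audio_path, config=config)
    return [{"speaker": u.speaker, "text": u.text} for u in transcript.utterances]
```

With two speakers (guest + interviewer), the speaker labels let the downstream prompts distinguish questions from answers.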
[Design] Below is the system design for the blog writer.
- I created a separate FileHelper class + FileHandlers that directly interact with the file system (as it's more convenient for a personal project + viewing the outputs).
- I encapsulated each Blog 'component' into a separate schema and helper class ('Files', 'Metadata', 'Thumbnails', 'Blog').
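A minimal sketch of how such component schemas could be modeled (field names here are illustrative assumptions, not the project's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class Metadata:
    """Guest details extracted from the resume; reused to ground the LLM."""
    guest_name: str = ""
    title: str = ""
    description: str = ""

@dataclass
class Thumbnails:
    landscape_path: str = ""
    square_path: str = ""

@dataclass
class Blog:
    """One blog bundle, keyed by the Zoom recording name."""
    name: str
    transcript: str = ""
    metadata: Metadata = field(default_factory=Metadata)
    thumbnails: Thumbnails = field(default_factory=Thumbnails)
```

Keeping each component in its own schema means a generation step only touches the fields it owns.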
Here is how a typical file change happens (e.g. generating a title):
- User requests a file from the CLI (we are requesting files as there are multiple Zoom recordings in the folder)
- The BlogEditor class requests the file from the FileHelper
- The FileHelper parses together all the files using the relevant FileHandlers, then returns it to the BlogEditor (then CLI for viewing)
- Then, through the CLI, the user can apply relevant functions (e.g. generate all, transcribe, edit, ...). The BlogEditor gets the latest file, applies the requested function, saves the file, then returns it to the CLI for viewing.
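The load → apply → save round-trip described above can be sketched as follows. This is a simplified sketch: the real FileHelper parses files from the Zoom folder via its FileHandlers, and the in-memory stand-in exists only for illustration:

```python
class BlogEditor:
    """Applies a requested function to the latest version of a blog file."""

    def __init__(self, file_helper):
        self.file_helper = file_helper

    def apply(self, blog_name, fn):
        blog = self.file_helper.load(blog_name)  # parsed via the FileHandlers
        blog = fn(blog)                          # e.g. generate a title
        self.file_helper.save(blog_name, blog)   # persist the change
        return blog                              # returned to the CLI for viewing


class InMemoryFileHelper:
    """Stand-in for the real file-system-backed FileHelper."""

    def __init__(self, blogs):
        self.blogs = blogs

    def load(self, name):
        return dict(self.blogs[name])

    def save(self, name, blog):
        self.blogs[name] = blog
```

Because every operation re-loads before applying, the editor always works on the latest state on disk.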
On MacOS:
# Create virtual environment
python3 -m venv venv
# Activate virtual environment
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
- Update the Zoom folder path in blog_editor.py
- You'll have to write your own prompts.py file (I can provide a skeleton if needed - reach out here: http://linkedin.com/in/anirudhhramesh/ or anirudhh.ramesh[AT]gmail.com)
- You'll have to include a .env file following the env.example file
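For orientation, a .env for a pipeline like this would look roughly like the following - the variable names below are my guesses, not the real ones, so follow the repo's env.example:

```shell
# Hypothetical variable names - check env.example for the actual ones
ASSEMBLYAI_API_KEY=your-assemblyai-key   # transcription + speaker diarization
OPENAI_API_KEY=your-llm-key              # blog/title/description generation
NOTION_API_KEY=your-notion-key           # used by the publish command
```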
- Provide .m4a/.mp3 2-person interview (with one guest and one interviewer)
- Provide resume.pdf (of the guest)
- Provide photo.png (of the guest) - the filename needs to end with photo.png
Then:
python cli.py
This will run & open the CLI interface. I recommend going into full-screen terminal mode before running this.
In the CLI interface:
- list (see all the blogs found in your Zoom folder)
- get <blog_name> (get the blog with the given name, this will be your 'working blog')
- generate all (generate all the attributes for the blog)
- edit (edit the attribute with the given value)
- publish (publish the blog to notion)
- reset all (reset all the attributes for the blog) - (NOT YET IMPLEMENTED)
- reset (reset the attribute with the given value) - (NOT YET IMPLEMENTED)
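A sketch of how parsing for these commands could work (a hypothetical parser for illustration, not the tool's actual implementation):

```python
# Commands the CLI understands; two-word commands are matched first
KNOWN_COMMANDS = {"list", "get", "generate", "generate all",
                  "edit", "publish", "reset", "reset all"}

def parse_command(line):
    """Split an input line into (command, args), e.g. 'get ep1' -> ('get', ['ep1'])."""
    parts = line.strip().split()
    if not parts:
        return None, []
    two_word = " ".join(parts[:2])
    if two_word in KNOWN_COMMANDS:   # 'generate all', 'reset all'
        return two_word, parts[2:]
    if parts[0] in KNOWN_COMMANDS:   # single-word commands
        return parts[0], parts[1:]
    return None, parts               # unknown command
```

Matching the two-word form first keeps 'generate all' from being read as 'generate' with the argument 'all'.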
- I probably will write a paper on this, so I'll share more details once that's prepared :)
- In the meantime, I'm looking for labs/researchers to advise the paper-writing - if you're interested, please email me: anirudhh.ramesh[AT]gmail.com
I use the BiRefNet image segmentation model from Hugging Face; it seemed to work best out of the couple I tried.
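A sketch of how BiRefNet can be loaded for background removal, based on its public model card; `mask_to_alpha` is a hypothetical helper, and my actual thumbnail code differs:

```python
def mask_to_alpha(mask_values):
    """Scale float mask probabilities (0..1) into 8-bit alpha values."""
    return [min(255, max(0, round(v * 255))) for v in mask_values]

def cut_out_guest(photo_path):
    """Segment the guest photo's foreground with BiRefNet (heavy imports deferred)."""
    import torch
    from PIL import Image
    from torchvision import transforms
    from transformers import AutoModelForImageSegmentation

    model = AutoModelForImageSegmentation.from_pretrained(
        "ZhengPeng7/BiRefNet", trust_remote_code=True)
    model.eval()

    preprocess = transforms.Compose([
        transforms.Resize((1024, 1024)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    image = Image.open(photo_path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)

    with torch.no_grad():
        # The model returns multi-scale logits; the last one is the final mask
        mask = model(batch)[-1].sigmoid().cpu()[0].squeeze()

    alpha = Image.fromarray((mask.numpy() * 255).astype("uint8")).resize(image.size)
    image.putalpha(alpha)  # transparent background for thumbnail compositing
    return image
```

The cut-out guest can then be composited onto the landscape and square Ambitious x Driven templates.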