Because everyone deserves to know which famous person they could've been if their parents had better connections 😉
Twin Celebrity App is a machine learning project that uses face embeddings and vector similarity search to find celebrity lookalikes. The project combines the following technologies:
- FaceNet embeddings for celebrity faces
- Qdrant for vector similarity search
- ZenML for pipeline orchestration
- Streamlit for the user interface
- Google Cloud Run for deployment (optional)
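To make the idea of vector similarity search concrete, here is a minimal, dependency-free sketch. It assumes cosine similarity as the metric (a common choice for face embeddings, though the project may configure a different one) and uses toy 3-dimensional vectors in place of real FaceNet embeddings. The names `cosine_similarity`, `closest_celebrities`, and `toy_index` are illustrative and do not come from the repository; in the actual project, Qdrant performs this search.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def closest_celebrities(query, index, top_k=3):
    """Rank every (name, embedding) pair by similarity to the query, best first."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (hypothetical); real face embeddings have far more dimensions.
toy_index = {
    "celebrity_a": [0.9, 0.1, 0.0],
    "celebrity_b": [0.0, 1.0, 0.0],
    "celebrity_c": [0.7, 0.7, 0.1],
}

matches = closest_celebrities([1.0, 0.2, 0.0], toy_index, top_k=2)
for name, score in matches:
    print(f"{name}: {score:.3f}")
```

A brute-force scan like this compares the query against every stored vector, which is fine for a handful of faces. A vector database such as Qdrant exists to make the same lookup fast over much larger collections.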
In case you want to see the project in action, you can check out the following YouTube video:
The project is built around the following components:
Check out the ZenML pipeline here!
The embedding generation pipeline is responsible for extracting face embeddings from the celebrity images. It uses the FaceNet architecture to generate the embeddings and stores them in a Qdrant vector database. The pipeline is orchestrated using ZenML.
The ZenML pipeline is configured to use either a local Qdrant instance or a Qdrant Cloud instance, depending on the use_qdrant_cloud flag. It's composed of the following steps:
- Load the dataset: The dataset is loaded from a Hugging Face dataset.
- Sample the dataset: The dataset is sampled to reduce the number of embeddings to process.
- Generate the embeddings: The embeddings are generated using the FaceNet architecture.
- Store the embeddings: The embeddings are stored in a Qdrant vector database.
- Store images in local filesystem: The images are stored in your local filesystem for later use in the Streamlit application (after dockerizing the app).
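The flow of the steps above can be sketched with plain Python functions. This is only an illustration of how the steps chain together, not the project's ZenML code: the dataset is faked in memory, the "embedding" is a placeholder rather than FaceNet, a dict stands in for Qdrant, and the image-saving step is omitted. All function names here are hypothetical.

```python
import random


def load_dataset():
    # Stand-in for the Hugging Face dataset: one record per celebrity image.
    return [{"name": f"celebrity_{i}", "image": bytes([i % 256]) * 4} for i in range(100)]


def sample_dataset(records, sample_size, seed=42):
    # A seeded sample keeps repeated runs reproducible.
    rng = random.Random(seed)
    return rng.sample(records, min(sample_size, len(records)))


def generate_embeddings(records):
    # Placeholder embedding (NOT FaceNet): image bytes scaled to the range [0, 1].
    return [
        {"name": r["name"], "vector": [b / 255 for b in r["image"]]}
        for r in records
    ]


def store_embeddings(embedded, store):
    # A dict stands in for the Qdrant collection.
    for item in embedded:
        store[item["name"]] = item["vector"]
    return len(store)


def run_pipeline(sample_size=10):
    store = {}
    records = load_dataset()
    sampled = sample_dataset(records, sample_size)
    embedded = generate_embeddings(sampled)
    store_embeddings(embedded, store)
    return store


store = run_pipeline(sample_size=10)
print(f"stored {len(store)} embeddings")
```

Sampling before embedding matters because generating embeddings is the expensive step, so shrinking the dataset first keeps a local run short.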
Check out the Streamlit UI here!
The Streamlit UI is the user interface of the application. It allows you to upload an image, search for the closest celebrity lookalike and display the results. The cool thing about this UI is that you'll learn how to use your webcam as the input!
Check out the vector_twin package here!
The vector_twin package contains all the logic for the embedding generation pipeline and the vector search system. You'll also see some scripts to help you manage ZenML secrets generation and deployment.
- Python 3.11 or higher
- Poetry for dependency management
- Docker
- ZenML
- Qdrant Cloud account for cloud deployment
- Google Cloud account for deployment
- Clone the repository:
git clone https://github.com/yourusername/vector-twin.git
cd vector-twin
- Install dependencies using Poetry:
poetry install
- Activate the virtual environment:
poetry shell
Create a .env file in the project root with the following variables:
QDRANT_URL=your-qdrant-cloud-url
QDRANT_API_KEY=your-qdrant-cloud-api-key
You can use the .env.example file as a template. Simply copy it to .env and fill in the values.
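If you want to double-check your .env before running anything, a small validation script can help. The sketch below is a hand-rolled parser written for illustration. It is not how the project itself loads the file, and `parse_env` and `missing_keys` are hypothetical helper names. It only handles simple KEY=value lines.

```python
REQUIRED_KEYS = ("QDRANT_URL", "QDRANT_API_KEY")


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


# Example content with a placeholder URL and a key left blank.
example = """
# Qdrant Cloud credentials
QDRANT_URL=https://example.invalid:6333
QDRANT_API_KEY=
"""

settings = parse_env(example)
problems = missing_keys(settings)
print("missing:", problems)
```

To check a real file, pass its contents instead of the example string, for instance `parse_env(open(".env").read())`.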
First, log in to ZenML Cloud and create a new tenant. After that, initialize the ZenML CLI and set up the default stack, which is what the configure-zenml command does:
make configure-zenml
To run the project locally, you'll need to start the local Qdrant instance and run the embedding pipeline. You can do this with the following command:
make start-local-app
This starts the local Qdrant instance, runs the embedding pipeline, and launches the Streamlit UI. You can then access the application by navigating to http://localhost:8501 in your web browser.
To run the project with cloud deployment, you'll need to create a Qdrant Cloud cluster (free!) and add the credentials to your .env file. Use the .env.example file as a template for your .env file.
Once you have your QDRANT_URL and QDRANT_API_KEY set, you can run the following command:
make insert-embeddings-qdrant-cloud
This command runs the ZenML pipeline and inserts the embeddings into your Qdrant Cloud cluster. Once the embeddings are inserted, you can check that the vectors have been added to your cluster by opening Qdrant's UI.
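As an alternative to the UI, you could ask Qdrant's REST API how many points a collection holds. The sketch below builds a `GET /collections/{collection_name}` request with the `api-key` header and reads `points_count` from the response. The collection name `celebrities` and the example URL are assumptions for illustration, so substitute the collection your pipeline actually writes to. The network call itself is left commented out.

```python
import json
import urllib.request


def build_collection_info_request(qdrant_url, api_key, collection_name):
    """Build (but do not send) a request for a collection's metadata."""
    url = f"{qdrant_url.rstrip('/')}/collections/{collection_name}"
    return urllib.request.Request(url, headers={"api-key": api_key}, method="GET")


def points_count(response_body):
    """Extract the number of stored points from the JSON response."""
    payload = json.loads(response_body)
    return payload["result"]["points_count"]


# Placeholder values; use QDRANT_URL and QDRANT_API_KEY from your .env file.
request = build_collection_info_request(
    "https://example.invalid:6333/", "secret", "celebrities"
)
print(request.full_url)

# To query a real cluster, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(points_count(response.read()))
```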
To deploy the Streamlit app, we'll use Google Cloud Run. This will allow us to deploy the app in a fully managed way, but it will also require a Google Cloud Account and the Google Cloud CLI installed.
- Authenticate with Google Cloud:
gcloud auth login
- Set your Google Cloud project:
gcloud config set project <PROJECT_ID>
- Enable the required Google Cloud APIs:
gcloud services enable cloudbuild.googleapis.com
gcloud services enable run.googleapis.com
gcloud services enable artifactregistry.googleapis.com
gcloud services enable cloudresourcemanager.googleapis.com
gcloud services enable secretmanager.googleapis.com
- Authenticate Docker with Artifact Registry:
gcloud config set compute/region <LOCATION>
gcloud auth configure-docker <LOCATION>-docker.pkg.dev -q
Location is the region where you want your project to be deployed. In my case, I'm using 'europe-west1'.
- Create the Docker repository:
gcloud artifacts repositories create vector-twin-app --repository-format=docker \
--location=<LOCATION> --description="Docker repository for the Twin Celebrity App" \
--project=<PROJECT_ID>
- Create secrets for Cloud Run:
echo -n <QDRANT_URL> | gcloud secrets create QDRANT_URL \
--replication-policy="automatic" \
--data-file=-
echo -n <QDRANT_API_KEY> | gcloud secrets create QDRANT_API_KEY \
--replication-policy="automatic" \
--data-file=-
- Add Cloud Run permissions to secrets:
gcloud projects add-iam-policy-binding <PROJECT_ID> \
--member="serviceAccount:$(gcloud projects describe $(gcloud config get-value project) --format="value(projectNumber)")-compute@developer.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor"
- Finally, let's deploy the Cloud Run application:
gcloud run deploy vector-twin \
--port=8501 \
--image=<LOCATION>-docker.pkg.dev/<PROJECT_ID>/vector-twin-app/app \
--allow-unauthenticated \
--region=<LOCATION> \
--platform=managed \
--project=<PROJECT_ID> \
--memory=2Gi \
--update-secrets=QDRANT_API_KEY=QDRANT_API_KEY:latest,QDRANT_URL=QDRANT_URL:latest
- (Bonus) cloudbuild.yaml file:
I've also created a cloudbuild.yaml file to automate the deployment process. You can find it in the root of the project. To run it, you can use the following command:
gcloud builds submit --region=<LOCATION>
And that's it! You can now access the application by navigating to your Google Cloud Project and clicking on the Cloud Run service.
This project is licensed under the MIT License - see the LICENSE file for details.
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request