Twin Celebrity App

Find your Twin Celebrity in Vector Space

Because everyone deserves to know which famous person they could've been if their parents had better connections 😉

Introduction

Twin Celebrity App is a machine learning project that uses face embeddings and vector similarity search to find celebrity lookalikes. The project combines the following technologies:

FaceNet embeddings for the celebrity faces
Qdrant for vector similarity search
ZenML for pipeline orchestration
Streamlit for the user interface
Google Cloud Run for deployment (optional)

In case you want to see the project in action, you can check out the following YouTube video:

Project Design

The project is built around the following components:

Embedding Generation Pipeline

Check out the ZenML pipeline here!

The embedding generation pipeline is responsible for extracting face embeddings from the celebrity images. It uses the FaceNet architecture to generate the embeddings and stores them in a Qdrant vector database. The pipeline is orchestrated using ZenML.

The ZenML pipeline is configured to use either a local Qdrant instance or a Qdrant Cloud instance, depending on the use_qdrant_cloud flag. It's composed of the following steps:

Load the dataset: The dataset is loaded from a Hugging Face dataset.
Sample the dataset: The dataset is sampled to reduce the number of embeddings to process.
Generate the embeddings: The embeddings are generated using the FaceNet architecture.
Store the embeddings: The embeddings are stored in a Qdrant vector database.
Store images in local filesystem: The images are stored in your local filesystem for later use in the Streamlit application (after dockerizing the app).
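The sampling step and the use_qdrant_cloud switch described above can be sketched in plain Python. This is only an illustration, not the project's ZenML code: the function names, the fixed seed, and the local URL (Qdrant's default HTTP port, 6333) are assumptions.

```python
import random


def sample_dataset(records, sample_size, seed=42):
    """Return a reproducible random subset of the records (hypothetical helper)."""
    records = list(records)
    if sample_size >= len(records):
        return records
    rng = random.Random(seed)  # fixed seed so reruns pick the same images
    return rng.sample(records, sample_size)


def resolve_qdrant_settings(use_qdrant_cloud, env):
    """Pick connection settings based on the use_qdrant_cloud flag."""
    if use_qdrant_cloud:
        # Cloud mode reads both values that the .env file is expected to define.
        return {"url": env["QDRANT_URL"], "api_key": env["QDRANT_API_KEY"]}
    # Assumed local default: a Qdrant container listening on port 6333.
    return {"url": "http://localhost:6333", "api_key": None}


if __name__ == "__main__":
    images = [f"celebrity_{i}.jpg" for i in range(1000)]  # made-up file names
    subset = sample_dataset(images, sample_size=100)
    print(len(subset), "images selected")
    print(resolve_qdrant_settings(False, {}))
```

Sampling before embedding keeps the run short, since every sampled image has to go through the FaceNet model once.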

Streamlit UI

Check out the Streamlit UI here!

The Streamlit UI is the user interface of the application. It allows you to upload an image, search for the closest celebrity lookalike and display the results. The cool thing about this UI is that you'll learn how to use your webcam as the input!

Vector Twin Package

Check out the vector_twin package here!

The vector_twin package contains all the logic for the embedding generation pipeline and the vector search system. You'll also see some scripts to help you manage ZenML secrets generation and deployment.
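To make the idea of vector search concrete, here is a minimal brute-force sketch of a lookalike lookup. It assumes cosine similarity as the metric and uses tiny made-up vectors and names; the project itself delegates this work to Qdrant, which uses an index instead of scanning every vector, and real face embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def find_twins(query, celebrity_embeddings, top_k=3):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query, vector))
        for name, vector in celebrity_embeddings.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 3-dimensional embeddings with hypothetical names.
    embeddings = {
        "celebrity_a": [0.9, 0.1, 0.0],
        "celebrity_b": [0.0, 1.0, 0.0],
        "celebrity_c": [-1.0, 0.0, 0.0],
    }
    for name, score in find_twins([1.0, 0.0, 0.0], embeddings, top_k=2):
        print(name, round(score, 3))
```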

Prerequisites

Python 3.11 or higher
Poetry for dependency management
Docker
ZenML
Qdrant Cloud account for cloud deployment
Google Cloud account for deployment

Installation

Clone the repository:

git clone https://github.com/yourusername/vector-twin.git
cd vector-twin

Install dependencies using Poetry:

poetry install

Activate the virtual environment:

poetry shell

Configuration

Environment Variables

Create a .env file in the project root with the following variables:

QDRANT_URL=your-qdrant-cloud-url
QDRANT_API_KEY=your-qdrant-cloud-api-key

You can use the .env.example file as a template. Simply copy it to .env and fill in the values.
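If you want to confirm that the file is filled in before running anything, a small check like the following may help. It is a standalone sketch using only the standard library (the project may load the file differently); the helper names are hypothetical, and the "your-" prefix test simply matches the placeholder values shown above.

```python
REQUIRED_KEYS = ("QDRANT_URL", "QDRANT_API_KEY")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """List required keys that are absent, empty, or still placeholders."""
    return [
        key
        for key in REQUIRED_KEYS
        if not values.get(key) or values[key].startswith("your-")
    ]


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as env_file:
        problems = missing_keys(parse_env(env_file.read()))
    print("Missing or placeholder values:", problems or "none")
```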

ZenML Setup

First, log in to ZenML Cloud and create a new tenant. Then initialize the ZenML CLI and set up the default stack, which is what the configure-zenml command does.

make configure-zenml

Running the project locally

To run the project locally, you'll need to start the local Qdrant instance and run the embedding pipeline. You can do this with the following command:

make start-local-app

This starts the local Qdrant instance, runs the embedding pipeline, and then launches the Streamlit UI. You can access the application at http://localhost:8501 in your web browser.

Running the project with cloud deployment

To run the project with cloud deployment, you'll need to create a Qdrant Cloud cluster (free!) and add the credentials to your .env file. Take the .env.example file as a template for your .env file.

Once you have your QDRANT_URL and QDRANT_API_KEY set, you can run the following command:

make insert-embeddings-qdrant-cloud

This command runs the ZenML pipeline and inserts the embeddings into your Qdrant Cloud cluster. Once the embeddings are inserted, you can check that the vectors have populated your cluster by opening Qdrant's UI.
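As an alternative to the UI, the point count can also be read from Qdrant's REST API, which returns collection details at /collections/<name> and accepts the key in an api-key header. The sketch below uses only the standard library; the collection name "celebrities" is a placeholder, since the name the pipeline actually uses is not stated here.

```python
import json
import os
import urllib.request


def build_collection_request(qdrant_url, api_key, collection_name):
    """Build a GET request for the collection info endpoint."""
    url = f"{qdrant_url.rstrip('/')}/collections/{collection_name}"
    return urllib.request.Request(url, headers={"api-key": api_key}, method="GET")


def points_count(response_body):
    """Extract the number of stored points from the JSON response."""
    payload = json.loads(response_body)
    return payload["result"]["points_count"]


def count_points(qdrant_url, api_key, collection_name):
    """Query the cluster and return how many points the collection holds."""
    request = build_collection_request(qdrant_url, api_key, collection_name)
    with urllib.request.urlopen(request, timeout=10) as response:
        return points_count(response.read())


if __name__ == "__main__":
    # "celebrities" is a hypothetical collection name; replace it with yours.
    total = count_points(
        os.environ["QDRANT_URL"], os.environ["QDRANT_API_KEY"], "celebrities"
    )
    print("Points stored:", total)
```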

To deploy the Streamlit app, we'll use Google Cloud Run. This will allow us to deploy the app in a fully managed way, but it will also require a Google Cloud Account and the Google Cloud CLI installed.

Authenticate with Google Cloud:

gcloud auth login

Set your Google Cloud project:

gcloud config set project <PROJECT_ID>

Enable the required Google Cloud APIs:

gcloud services enable cloudbuild.googleapis.com
gcloud services enable run.googleapis.com
gcloud services enable artifactregistry.googleapis.com
gcloud services enable cloudresourcemanager.googleapis.com
gcloud services enable secretmanager.googleapis.com

Authenticate Docker with Artifact Registry:

gcloud config set compute/region <LOCATION>
gcloud auth configure-docker <LOCATION>-docker.pkg.dev -q

<LOCATION> is the region where you want to deploy the project. In my case, I'm using 'europe-west1'.

Create the Docker repository:

gcloud artifacts repositories create vector-twin-app --repository-format=docker \
    --location=<LOCATION> --description="Docker repository for the Twin Celebrity App" \
    --project=<PROJECT_ID>

Create secrets for Cloud Run:

echo -n "<QDRANT_URL>" | gcloud secrets create QDRANT_URL \
    --replication-policy="automatic" \
    --data-file=-

echo -n "<QDRANT_API_KEY>" | gcloud secrets create QDRANT_API_KEY \
    --replication-policy="automatic" \
    --data-file=-

Add Cloud Run permissions to secrets:

gcloud projects add-iam-policy-binding <PROJECT_ID> \
  --member="serviceAccount:$(gcloud projects describe $(gcloud config get-value project) --format="value(projectNumber)")-compute@developer.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"

Finally, let's deploy the Cloud Run application:

gcloud run deploy vector-twin \
    --port=8501 \
    --image=<LOCATION>-docker.pkg.dev/<PROJECT_ID>/vector-twin-app/app \
    --allow-unauthenticated \
    --region=<LOCATION> \
    --platform=managed \
    --project=<PROJECT_ID> \
    --memory=2Gi \
    --update-secrets=QDRANT_API_KEY=QDRANT_API_KEY:latest,QDRANT_URL=QDRANT_URL:latest
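If you prefer to script this step, the placeholders in the command above can be filled in programmatically. The following is a rough sketch with hypothetical helper names; it only assembles the same arguments shown above and does not call gcloud by itself.

```python
def image_uri(location, project_id, repository="vector-twin-app", image="app"):
    """Artifact Registry path in the form used by the deploy command."""
    return f"{location}-docker.pkg.dev/{project_id}/{repository}/{image}"


def deploy_command(location, project_id, service="vector-twin"):
    """Return the gcloud run deploy invocation as an argument list."""
    # Each secret is exposed as an environment variable with the same name.
    secrets = ",".join(
        f"{name}={name}:latest" for name in ("QDRANT_API_KEY", "QDRANT_URL")
    )
    return [
        "gcloud", "run", "deploy", service,
        "--port=8501",
        f"--image={image_uri(location, project_id)}",
        "--allow-unauthenticated",
        f"--region={location}",
        "--platform=managed",
        f"--project={project_id}",
        "--memory=2Gi",
        f"--update-secrets={secrets}",
    ]


if __name__ == "__main__":
    # "my-project" is a made-up project ID for illustration.
    print(" ".join(deploy_command("europe-west1", "my-project")))
```

The returned list can be passed to subprocess.run(..., check=True) once the Google Cloud CLI is installed and authenticated.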

(Bonus) cloudbuild.yaml file:

I've also created a cloudbuild.yaml file to automate the deployment process. You can find it in the root of the project. To run it, you can use the following command:

gcloud builds submit --region=<LOCATION>

And that's it! You can now access the application by navigating to your Google Cloud Project and clicking on the Cloud Run service.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Create a Pull Request

neural-maze/vector-twin
