When starting a new GPT project that needs its own Redis DB populated with data, follow these steps:
- Copy an existing gpt-data project locally, for example gpt-data-kvindekroppen.
- Rename it to the name of the new project you are creating, for example gpt-data-infare.
- Delete the data folder and its contents, then create an empty data folder in the root of the project.
- Delete the data-temp folder and its contents, then create an empty data-temp folder in the root of the project.
- Delete the venv folder (if there is one).
- In the project folder on your computer, enable showing hidden files and delete the .git folder. This removes the link between the project and the old GitHub repository.
- Go to Abtion's GitHub account and click Create New Repository.
- Add a name.
- Leave everything else as it is and click Create repository.
- Go to Redis Cloud, log in with Abtion's account, and create a new DB.
- Back in VS Code, open the new gpt-data project and update the .env file with the new Redis credentials.
- Follow the steps below to create embeddings and ingest them into the Redis DB.
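The local cleanup steps above can be sketched as a small shell script. The project names are the examples from this guide; adjust them to your own. To keep the sketch safe to try, it runs in a throwaway temp directory with a stand-in for the existing project:

```shell
set -eu
# Demo setup in a temp dir: stand in for an existing gpt-data project
WORK=$(mktemp -d)
cd "$WORK"
mkdir -p gpt-data-kvindekroppen/data gpt-data-kvindekroppen/data-temp gpt-data-kvindekroppen/.git
touch gpt-data-kvindekroppen/data/old.txt

# The actual steps: copy, rename, wipe old data/venv, unlink the old git repo
cp -r gpt-data-kvindekroppen gpt-data-infare
cd gpt-data-infare
rm -rf data data-temp venv .git
mkdir data data-temp
```

After this, the new project has empty data and data-temp folders and no .git folder, ready to be pushed to the new repository.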
Ensure your .env variables are up to date with at least:
- REDIS_HOST
- REDIS_PORT
- REDIS_PASSWORD
- OPENAI_API_KEY
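A minimal .env might look like this (the host, port, and key values below are placeholders, not real credentials):

```
REDIS_HOST=redis-12345.c123.eu-west-1.ec2.cloud.redislabs.com
REDIS_PORT=12345
REDIS_PASSWORD=your-redis-password
OPENAI_API_KEY=sk-...
```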
If there is no venv folder in the root directory of the project, create one with the command below. Note: if you get an error in one of the following steps, delete the venv folder and create a new one:
python -m venv venv
Activate the venv whenever you work on this project. Note: the command differs by operating system:
Windows:
. venv/Scripts/activate
Mac:
. venv/bin/activate
Your terminal will display the name of your venv when active: (venv).
Install the required packages:
pip install -r requirements.txt
If any packages are installed/updated while developing, remember to freeze the package list:
pip freeze > requirements.txt
If you move to another project, close the terminal to deactivate the current venv.
Install the gpt-data-core package:
pip install git+https://github.com/abtion/gpt-data-core.git
# token_division_list.py
from gpt_data_core import token_division_list

# Split the files in "data" so each chunk fits within the token limit
division_list = token_division_list.TokenDivisionList(
    model="gpt-3.5-turbo-16k",
    max_tokens=8191
)
division_list.process_files("data")
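To illustrate what the division step does, here is a rough stdlib-only sketch (not the gpt-data-core implementation) that splits a text into chunks under a budget, using whitespace-separated words as a crude stand-in for tokens:

```python
def split_by_budget(text: str, max_tokens: int) -> list[str]:
    """Greedily pack words into chunks of at most max_tokens words each."""
    chunks, current = [], []
    for word in text.split():
        current.append(word)
        if len(current) >= max_tokens:
            chunks.append(" ".join(current))
            current = []
    if current:  # flush the last, possibly short, chunk
        chunks.append(" ".join(current))
    return chunks

print(split_by_budget("a b c d e", 2))  # → ['a b', 'c d', 'e']
```

The real implementation counts model tokens rather than words, but the chunking idea is the same: no chunk may exceed the model's limit.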
# create_embeddings.py
from gpt_data_core import embedding_generator, config

openAIConfig = config.Config()

generator = embedding_generator.EmbeddingGenerator(
    openAIConfig.OPENAI_API_KEY,
    openAIConfig.DEFAULT_DATA_PATH,
    openAIConfig.DEFAULT_TEMP_PATH
)

# Embed a single file, or use process_all_files() for the whole data folder
generator.process_file("path/to/file")
# generator.process_all_files()
# ingest_embeddings.py
import os

from gpt_data_core import embedding_ingestor, config, base_schema, redis_client

openAIConfig = config.Config()


def process_file(pipe, embedding_path, data_path, ingestor: embedding_ingestor.EmbeddingIngestor):
    embedding = ingestor.read_json_embedding(embedding_path)
    with open(data_path, "r", encoding="utf8") as f:
        data = f.read()
    ingestor.insert_embedding(
        pipe,
        os.path.basename(data_path),
        data,
        embedding,
    )


redisClient = redis_client.RedisClient(
    openAIConfig.REDIS_HOST,
    openAIConfig.REDIS_PORT,
    openAIConfig.REDIS_PASSWORD
)

ingestor = embedding_ingestor.EmbeddingIngestor(
    redisClient,
    openAIConfig.VECTOR_DIMENSIONS,
    openAIConfig.INDEX_NAME,
    openAIConfig.DOC_PREFIX,
    openAIConfig.DEFAULT_DATA_PATH,
    openAIConfig.DEFAULT_TEMP_PATH
)

# Create the search index, then batch all inserts through one pipeline
schema = base_schema.create_base_schema(openAIConfig.VECTOR_DIMENSIONS)
ingestor.create_index(schema)

pipe = ingestor.redis_client.pipeline()
embeddings_and_data_list = ingestor.collect_embedding_and_data_paths()
for embedding_path, data_path in embeddings_and_data_list:
    process_file(pipe, embedding_path, data_path, ingestor)
pipe.execute()
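The ingestion loop relies on each embedding file in the temp folder matching a source file in the data folder. A hypothetical stdlib-only sketch of that pairing (the `<name>.json` naming convention used here is an assumption for illustration, not the documented gpt-data-core behaviour):

```python
import os

def collect_pairs(data_dir: str, temp_dir: str) -> list[tuple[str, str]]:
    """Pair each embedding JSON in temp_dir with its source file in data_dir.

    Assumes (hypothetically) that an embedding for "<name>" is stored as
    "<name>.json"; source files without an embedding are skipped.
    """
    pairs = []
    for name in sorted(os.listdir(data_dir)):
        embedding_path = os.path.join(temp_dir, name + ".json")
        if os.path.exists(embedding_path):
            pairs.append((embedding_path, os.path.join(data_dir, name)))
    return pairs
```

Whatever the exact convention, the result is a list of (embedding_path, data_path) tuples like the one the script above iterates over.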
Sometimes the database needs additional fields that our chat application depends on.
Configure the Redis schema with the correct field type, and populate the value when inserting each embedding:
# ingest_embeddings.py
from gpt_data_core import ..., base_schema
from redis.commands.search.field import TagField


def process_file(...):
    ...
    newFieldValue = "some-value"
    ingestor.insert_embedding(
        ...
        extraMapping={"newfield": newFieldValue}
    )


# Extend the base schema with the new field, and pass the extended
# schema (abtionschema, not schema) to ingestor.create_index(...)
schema = base_schema.create_base_schema(openAIConfig.VECTOR_DIMENSIONS)
abtionschema = schema + (TagField("newfield"),)
...