Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a command to generate missing mappings #254

Merged
merged 16 commits into from
Aug 7, 2024

Conversation

chesterbot01
Copy link
Contributor

@chesterbot01 chesterbot01 commented Jul 30, 2024

Added a command /generate_mappings trigger the generation of taxonomy mappings.
See the ticket linked to the PR for details of how it works.

To tophat, set ENV["OPENAI_API_KEY"] to a valid OpenAI API Token, and then run bin/generate_missing_mappings. We will see data/integrations/google/2021-09-21/mappings/from_shopify.yml is updated and tmp/mapping_update_message.txt is created.

image

@chesterbot01 chesterbot01 force-pushed the add-generate-mappings-command branch 4 times, most recently from a231d89 to 79a174f Compare July 30, 2024 22:06
@chesterbot01
Copy link
Contributor Author

chesterbot01 commented Jul 30, 2024

@elsom25 I am not able to push large files (over 5M) like embedding cache to the remote in this repo (but in other repos I did not encounter such error),

Enumerating objects: 22, done.
Counting objects: 100% (22/22), done.
Delta compression using up to 11 threads
Compressing objects: 100% (13/13), done.
error: RPC failed; HTTP 400 curl 22 The requested URL returned error: 400
send-pack: unexpected disconnect while reading sideband packet
Writing objects: 100% (14/14), 1.56 MiB | 4.05 MiB/s, done.
Total 14 (delta 5), reused 0 (delta 0), pack-reused 0
fatal: the remote end hung up unexpectedly

do you know how to avoid such error?

@chesterbot01 chesterbot01 force-pushed the add-generate-mappings-command branch 7 times, most recently from d80b334 to 96ef778 Compare July 30, 2024 23:40
Gemfile Outdated Show resolved Hide resolved
@elsom25
Copy link
Collaborator

elsom25 commented Jul 31, 2024

@chesterbot01 let's split thiss into two:

  1. First, add the command to run locally. then we can test/vet quality
  2. Once we're confident, we can hook it up to GH Actions

@chesterbot01 chesterbot01 force-pushed the add-generate-mappings-command branch 2 times, most recently from 660bbcd to b7a9fb3 Compare July 31, 2024 21:11
bin/generate_missing_mappings Outdated Show resolved Hide resolved
@chesterbot01 chesterbot01 force-pushed the add-generate-mappings-command branch 6 times, most recently from ab60513 to 0e0d342 Compare August 1, 2024 04:56
@chesterbot01 chesterbot01 marked this pull request as ready for review August 1, 2024 15:02
@chesterbot01
Copy link
Contributor Author

re #254 (review)
Done.

Comment on lines 352 to 347
# TODO: move to makefile; ruby should not need to think abuot service setup
def ensure_qdrant_server_running
qdrant_port = URI.parse(params[:qdrant_api_base]).port
return if system("lsof -i:#{qdrant_port}", out: "/dev/null")

command = "podman run -p #{qdrant_port}:#{qdrant_port} qdrant/qdrant"
pid = Process.spawn(command, out: "/dev/null", err: "/dev/null")
Process.detach(pid)
logger.info("Started Qdrant server in the background with PID #{pid}.")
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this part of the Makefile instead? Or what challenges do you forsee with that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, done.

@@ -38,6 +38,13 @@ dist/{locale}/integrations/
### Mappings
The primary concern of integrations are mappings. Mappings are a set of rules that help us convert to and from the Shopify taxonomy and the taxonomy of that integration.

#### Use tooling to generate taxonomy mappings
Automate your taxonomy mapping with this tool. Make sure `podman` is installed using `homebrew`, then run `make generate_mappings OPENAI_API_BASE=<base URI> OPENAI_API_KEY=<API acceess token>` to create mappings. It uses AI to assist in generating accurate mappings and reducing manual effort.
Copy link
Collaborator

@elsom25 elsom25 Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we setup .env.development.local files to autoload things like this? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduced dotenv to override local environment variables and updated README and .gitignore to reflect the change.

Comment on lines 45 to 46
#### Use tooling to generate taxonomy mappings
Automate your taxonomy mapping with this tool. Install dependencies using `make setup-mapping-generation-tool`, then run `bin/generate_missing_mappings` to create mappings. It uses AI to assist in generating accurate mappings and reducing manual effort.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is this was a rebase slip-in: I'm assuming 🔥 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Removed.

Copy link
Collaborator

@elsom25 elsom25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main things on my mind still, in importance:

  1. qdrant run via makefile vs in ruby-script — or at least a good reason to keep it as-is
  2. Any common practices/facilities around our prompt we should include with this change? Thinking about promptfoo and it's kin
  3. Let's add https://github.com/bkeepers/dotenv? We can keep .env in as a template (but empty values), with both .env.development.local and .env.test.local git-ignored. README should signal about that at minimum, and maybe even our Makefile/dev.yml warn/assist here?
  4. A wrapper in dev for generate_mappings like the other main commands.

@elsom25 elsom25 dismissed their stale review August 7, 2024 12:28

No longer blocking

@chesterbot01 chesterbot01 force-pushed the add-generate-mappings-command branch from 26cc6a4 to 3f16e3a Compare August 7, 2024 16:28
@chesterbot01 chesterbot01 force-pushed the add-generate-mappings-command branch from 7b79d37 to f31cf92 Compare August 7, 2024 20:16
@chesterbot01 chesterbot01 force-pushed the add-generate-mappings-command branch from f31cf92 to 54e0f17 Compare August 7, 2024 20:23
@chesterbot01
Copy link
Contributor Author

re

qdrant run via makefile vs in ruby-script — or at least a good reason to keep it as-is

moved the logic of ensure_qdrant_server_running to makefile

Any common practices/facilities around our prompt we should include with this change? Thinking about promptfoo and it's kin

The current prompt is the best we've tested during hackdays. We can set up promptfoo in the tools repo, request an expert-labeled golden dataset to run the benchmark, and select the best prompt to override the current one if necessary.

Let's add https://github.com/bkeepers/dotenv?

done.

A wrapper in dev for generate_mappings like the other main commands.

done.

@chesterbot01 chesterbot01 merged commit ecf1f2a into main Aug 7, 2024
8 checks passed
@chesterbot01 chesterbot01 deleted the add-generate-mappings-command branch August 7, 2024 20:53
@@ -22,6 +22,7 @@ gem "tty-option", "~> 0.3", require: false
# generate taxonomy mappings
gem "qdrant-ruby", require: "qdrant"
gem "ruby-openai"
gem 'dotenv', groups: [:development, :test]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in the group below pls @chesterbot01

@@ -39,7 +39,15 @@ dist/{locale}/integrations/
The primary concern of integrations are mappings. Mappings are a set of rules that help us convert to and from the Shopify taxonomy and the taxonomy of that integration.

#### Use tooling to generate taxonomy mappings
Automate your taxonomy mapping with this tool. Make sure `podman` is installed using `homebrew`, then run `make generate_mappings OPENAI_API_BASE=<base URI> OPENAI_API_KEY=<API acceess token>` to create mappings. It uses AI to assist in generating accurate mappings and reducing manual effort.
Automate your taxonomy mapping with the CLI-based tooling. Make sure `podman` is installed using `homebrew`, set up local environment variables OPENAI_API_BASE and OPENAI_API_KEY in `.env.development.local`:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shuold have a .env files that contains something like:

OPENAI_API_BASE=
OPENAI_API_KEY=

And then our instructions should be to cp/rename this file and fill it in

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants