Refactor manual pydantics for scanpy pl agents #255

Open
wants to merge 173 commits into
base: main

Conversation

mengerj
Contributor

@mengerj mengerj commented Dec 12, 2024

Refactored pydantic class definitions in pl modules to match new structure.
#245

bastienchassagnol and others added 30 commits December 10, 2024 13:52
…on 2.2.8 to superior or equal to 2.2.8. The grpcio 1.53.0 external dependency of pymilvus version 2.2.8 appears to be incompatible with Windows OS 11 and Python version 2.12.3, whether installed from the wheel or from source. Running pytest does not yield any errors, only deprecation warnings
method. Currently scanpy is imported when ScanpyTLQueryBuilder.parametrise_query is called.
Only includes functions which don't start with "_"
Merge Dev/tl into main to avoid plenty of branches
* add scanpy_pl module with initial fields

* add mocked test for module

* add module to API agent __init__.py

* add benchmark case

* add conditional for module benchmark

* downgrade httpx due to conflict
0.28 removed the proxy keyword, but openai is not aware

* add back default `question_uuid` field into pydantic class

* add scatter pydantic class

* add sc.pl.pca

* add pca benchmark case

* distinguish web api and python api benchmark

* change case to scatter

* add tsne class

* add tsne case

* fix typing

* add generic formatter (#233)

* add formatter functions for REST and Python

* make discoverable on module level

* add required field

* test the formatting functions

* `scanpy` to `sc` to fit common usage

* adjust benchmark to use the formatter

---------

Co-authored-by: daniele-lucarelli <[email protected]>
* pushed starter anndata file

* removed the tester

* Aim of the anndata api module

* Draft of the AnnDataIOParameters

* added a prompt

* updated the prompt

* started to implement the AnndataIOQueryBuilder

* added test for anndata api

* pushed pydantic reader classes

* Updated the anndata tool with integrated test:
-> returns dict with method & args

Co-authored-by: Anis Ismail <[email protected]>

* added query builder

* added querybuilder for anndata and its test

* updated query builder

* added exclude none

* feat(BaseAPIModel): Add reusable base class for structured outputs
	•	Introduced BaseAPIModel, a reusable base class to streamline the creation of Pydantic models for structured outputs.
	•	The class includes:
	•	uuid: An optional field (str | None) for unique identification of model instances.
	•	method_name: A required field (str) to specify the associated function or method, ensuring consistency across models.
	•	Configured with arbitrary_types_allowed to support flexible extensions.
	•	Designed for use in structured output generation.

This addition lays the groundwork for standardized, maintainable, and consistent API models.

* update query builder to remove create_runnable

* Updated the pydantic classes with the BaseAPIModel

* Updated the system prompt in the runnable of the AnnDataIOQueryBuilder

* fix in import of pydanticparser

* added test for query builder parameterise_query

* removed comments + redundant script

---------

Co-authored-by: Anis Ismail <[email protected]>
Co-authored-by: Anis Ismail <[email protected]>
…tem prompt is updated for the anndata query
replace with any length type (...)
…c classes

adjusts the ABC, the individual legacy classes (builder and fetcher), and the tests
now has empty list in parameters
@mengerj
Contributor Author

mengerj commented Jan 24, 2025

Still working on this. Haven't found much time this week but I hope to get it done before you want to merge the biohackathon branch into main.
Comments for me:

  • Manual pydantics work well and output from LLM is basically the parsed python call
  • Need to adjust automated approach to correctly handle defaults and add the list of required parameters

@mengerj mengerj marked this pull request as ready for review January 27, 2025 09:44
@mengerj
Contributor Author

mengerj commented Jan 27, 2025

I tried to resolve the conflicts with the biohackathon main branch, and it should be fine. But there are still differences in how tools are created, for example between the AnnData Builder and the others. I tried to unify the automatic generation of pydantic classes with the manual creation. The automated approach no longer generates pydantic classes directly; instead it creates two dictionaries holding the parameters and the function descriptions. In the manual approach (see scanpy_pl agent) these dictionaries are defined manually. The tool creation is then handled by classes derived from BaseTools.
The output of the LLM is basically already a correctly parameterised function call, and I don't think format_as_python_call should be needed.
Sadly, the automated approach still throws errors for some functions, and I couldn't resolve this. A single function can also be passed as `module` instead of a whole module. Here is a short usage example:

    from biochatter.api_agent.auto_module_agent import AutoModuleQueryBuilder
    import scanpy as sc

    conv = conversation()
    auto_query_builder = AutoModuleQueryBuilder(module=sc.pl.scatter)
    auto_query_builder.parameterise_query(
        question="Please use a scatter plot to create a basic representation of my adata object",
        conversation=conv,
    )
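
For readers unfamiliar with the two-dictionary idea, here is a minimal sketch of how parameters, defaults, and the list of required parameters could be collected from a function signature. It uses only the standard library `inspect` module and is not the code in this PR: `describe_function` and the stand-in `scatter` function are hypothetical names chosen for illustration.

```python
import inspect


def describe_function(func):
    """Return (parameters, description) dictionaries for one callable.

    Hypothetical helper for illustration; not part of biochatter.
    """
    signature = inspect.signature(func)
    parameters = {}
    required = []
    for name, param in signature.parameters.items():
        # *args and **kwargs carry no single default, so skip them here
        if param.kind in (
            inspect.Parameter.VAR_POSITIONAL,
            inspect.Parameter.VAR_KEYWORD,
        ):
            continue
        has_default = param.default is not inspect.Parameter.empty
        parameters[name] = {
            "default": param.default if has_default else None,
            "required": not has_default,
        }
        if not has_default:
            required.append(name)
    doc = inspect.getdoc(func) or ""
    description = {
        "name": func.__name__,
        "summary": doc.splitlines()[0] if doc else "",
        "required": required,
    }
    return parameters, description


def scatter(adata, x=None, y=None, color=None):
    """Hypothetical stand-in for a plotting function."""
    return None


params, desc = describe_function(scatter)
```

Parameters without a default end up in `desc["required"]`, which is the piece of information a tool schema would need to mark them as mandatory.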

@slobentanzer
Contributor

Hi @mengerj, thanks for the great work! We don't need to merge this before we merge the main biohackathon branch, it is fine as a standalone feature. We'd need to make it robust though.

The output of the LLM is basically already a correctly parameterised function call

I would caution that this then is not compatible with the other solution. If we can, we should definitely find a common ground and consensus way of returning the parameterised call. Naively, I would say that returning the Pydantic class and then parsing independently is most flexible and does not cost anything. Your method is based on Pydantic, after all. Any reason why this is not possible?
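
To make the suggestion concrete, a minimal sketch of "return the structured object, parse it separately" could look like the following. It assumes a plain dataclass as a stand-in for the Pydantic class so that it runs without extra dependencies; `ParameterisedCall` and `to_python_call` are hypothetical names, not existing biochatter APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ParameterisedCall:
    """Hypothetical stand-in for the Pydantic model returned by the LLM."""

    method_name: str
    arguments: dict = field(default_factory=dict)


def to_python_call(call: ParameterisedCall) -> str:
    """Render the structured result as a call string, skipping None values."""
    rendered = ", ".join(
        f"{name}={value!r}"
        for name, value in call.arguments.items()
        if value is not None
    )
    return f"{call.method_name}({rendered})"


call = ParameterisedCall(
    method_name="sc.pl.scatter",
    arguments={"x": "n_genes", "y": "n_counts", "color": None},
)
call_string = to_python_call(call)
# call_string == "sc.pl.scatter(x='n_genes', y='n_counts')"
```

Keeping the structured object as the return value means the same result could feed a Python-call formatter, a REST formatter, or a test assertion without re-querying the model.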

@slobentanzer slobentanzer changed the base branch from biohackathon3 to main January 30, 2025 17:42
9 participants