feat: add support for python API calling and scverse packages #263

Merged
merged 88 commits into from
Jan 30, 2025

Conversation

slobentanzer
Contributor

This PR extends the API calling functionality to Python functions, specifically those from scverse packages. It is mainly derived from work done at the German BioHackathon 3, 2024.

Added:

  • concrete scverse module parameterisation classes
  • abstract API parameterisation by injection of a Python module and parsing
  • parsing of the structured LLM return into a callable Python function
  • benchmark to test LLM capabilities with regard to Python, and scverse specifically
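
As a rough illustration of the last two "Added" items, the sketch below resolves a structured LLM return against an injected module and turns it into a callable. It is not the code from this PR: `to_callable` is a hypothetical helper, the dict schema (a `method_name` key plus keyword arguments) is an assumption, and the standard-library `statistics` module stands in for a scverse module such as scanpy.

```python
import functools
import statistics  # stand-in for an injected scverse module (assumption)


def to_callable(module, structured_output):
    """Turn a structured LLM return into a ready-to-run callable.

    `structured_output` is assumed to be a dict holding a `method_name` key,
    with all remaining keys treated as keyword arguments (hypothetical schema).
    """
    arguments = dict(structured_output)
    method_name = arguments.pop("method_name")
    # Refuse private names so the LLM cannot reach module internals.
    if method_name.startswith("_"):
        raise ValueError(f"refusing private function: {method_name}")
    function = getattr(module, method_name, None)
    if not callable(function):
        raise ValueError(f"{module.__name__} has no callable {method_name!r}")
    # Drop unset arguments so the function's own defaults apply.
    kwargs = {key: value for key, value in arguments.items() if value is not None}
    return functools.partial(function, **kwargs)


# Hypothetical structured return, e.g. a dumped Pydantic model.
llm_return = {"method_name": "median", "data": [3, 1, 2], "uuid": None}
call = to_callable(statistics, llm_return)
result = call()
print(result)
```

Returning a `functools.partial` instead of executing immediately keeps parsing and execution separate, so the caller can inspect or log the call before running it.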

Refactored:

  • API agent module is now structured into submodules
  • API agent documentation updated to reflect the changes

Further information can be found in the hackathon repo https://github.com/biocypher/scverse-x-biochatter and in a discussion https://github.com/orgs/biocypher/discussions/3

Not all features are fully functional yet, but the existing functionality integrates with the framework and does not harm existing processes. I recommend a minor version bump.

This PR also already includes a proposed fix that improves the handling and documentation of the generic chat functionality, #262. Both will be merged simultaneously.

bastienchassagnol and others added 30 commits December 10, 2024 13:52
…on 2.2.8 to greater than or equal to 2.2.8. The grpcio 1.53.0 external dependency of pymilvus version 2.2.8 appears to be incompatible with Windows 11 and Python version 3.12.3, for both the wheel and the source distribution. Running pytest yields no errors, only deprecation warnings
method. Currently scanpy is imported when ScanpyTLQueryBuilder.parametrise_query is called.
Only includes functions which don't start with "_"
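
A minimal sketch of the filtering described in the commit above, assuming plain introspection with `inspect`. `public_functions` is a hypothetical name, and the standard-library `json` module stands in for `scanpy.tl` so the example runs without scanpy installed.

```python
import inspect
import json  # stand-in for scanpy.tl (assumption)


def public_functions(module):
    """Map each public function name in `module` to its signature."""
    return {
        name: inspect.signature(obj)
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        if not name.startswith("_")  # skip private helpers
    }


functions = public_functions(json)
for name, signature in functions.items():
    print(f"{name}{signature}")
```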
Merge Dev/tl into main to avoid plenty of branches
* add scanpy_pl module with initial fields

* add mocked test for module

* add module to API agent __init__.py

* add benchmark case

* add conditional for module benchmark

* downgrade httpx due to conflict
0.28 removed the `proxies` keyword, but openai is not aware

* add back default `question_uuid` field into pydantic class

* add scatter pydantic class

* add sc.pl.pca

* add pca benchmark case

* distinguish web api and python api benchmark

* change case to scatter

* add tsne class

* add tsne case

* fix typing

* add generic formatter (#233)

* add formatter functions for REST and Python

* make discoverable on module level

* add required field

* test the formatting functions

* `scanpy` to `sc` to fit common usage

* adjust benchmark to use the formatter

---------

Co-authored-by: daniele-lucarelli <[email protected]>
* pushed starter anndata file

* removed the tester

* Aim of the anndata api module

* Draft of the AnnDataIOParameters

* added a prompt

* updated the prompt

* started to implement the AnndataIOQueryBuilder

* added test for anndata api

* pushed pydantic reader classes

* Updated the anndata tool with integrated test:
-> returns dict with method & args

Co-authored-by: Anis Ismail <[email protected]>

* added query builder

* added querybuilder for anndata and its test

* updated query builder

* added exclude none

* feat(BaseAPIModel): Add reusable base class for structured outputs
  • Introduced BaseAPIModel, a reusable base class to streamline the creation of Pydantic models for structured outputs.
  • The class includes:
    • uuid: An optional field (str | None) for unique identification of model instances.
    • method_name: A required field (str) to specify the associated function or method, ensuring consistency across models.
  • Configured with arbitrary_types_allowed to support flexible extensions.
  • Designed for use in structured output generation.

This addition lays the groundwork for standardized, maintainable, and consistent API models.
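
For orientation, the base class described in this commit could look roughly like the sketch below. This assumes Pydantic v2 and is not the code from the PR: the field descriptions, the `ReadH5adParameters` subclass, and its fields are hypothetical and only illustrate how a concrete parameter model might build on the base class.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field


class BaseAPIModel(BaseModel):
    """Sketch of a shared base for structured-output models (Pydantic v2 assumed)."""

    # Allow fields whose types Pydantic cannot validate natively.
    model_config = ConfigDict(arbitrary_types_allowed=True)

    uuid: Optional[str] = Field(
        default=None, description="Unique identifier of the model instance"
    )
    method_name: str = Field(..., description="Function or method to call")


class ReadH5adParameters(BaseAPIModel):
    """Hypothetical subclass for illustration only."""

    filename: str
    backed: Optional[str] = None


params = ReadH5adParameters(method_name="anndata.read_h5ad", filename="data.h5ad")
# exclude_none drops unset optional fields, leaving only what the LLM filled in.
dumped = params.model_dump(exclude_none=True)
print(dumped)
```

Dumping with `exclude_none=True` mirrors the "added exclude none" commit: unset optional fields are omitted, so the target function falls back to its own defaults.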

* update query builder to remove create_runnable

* Updated the pydantic classes with the BaseAPIModel

* Updated the system prompt in the runnable of the AnnDataIOQueryBuilder

* fix in import of pydanticparser

* added test for query builder parameterise_query

* removed comments + redundant script

---------

Co-authored-by: Anis Ismail <[email protected]>
…tem prompt is updated for the anndata query
replace with any length type (...)
…c classes

adjusts the ABC, the individual legacy classes (builder and fetcher), and the tests
now has empty list in parameters
@mengerj
Contributor

mengerj commented Jan 24, 2025

I will try to work on it on Monday. But I am also trying to get the automated approach working and finish the refactoring approach. It got a bit messy and I don't have a great overview of the package, so I hope I can get it done.

@slobentanzer
Contributor Author

@mengerj thanks, any feedback is welcome; the merge is also not urgent. if anybody needs a bit more time, just let me know. I am targeting end of this week as the deadline for the merge.

MDLDan

This comment was marked as outdated.

@MDLDan
Contributor

MDLDan commented Jan 27, 2025

> @mengerj thanks, any feedback is welcome; the merge is also not urgent. if anybody needs a bit more time, just let me know. I am targeting end of this week as the deadline for the merge.

Looks good to me overall, just a bit weird we have pp and pl while tl is with the automated method

@slobentanzer
Contributor Author

> Looks good to me overall, just a bit weird we have pp and pl while tl is with the automated method

@MDLDan true, but I would see this as a proof of concept / prototype, not a feature that we will prominently advertise. The main reason for the merge is that PRs shouldn't hang around forever, and we only need to make sure that nothing breaks (which CI should take care of for us).

@slobentanzer slobentanzer merged commit 15782fb into main Jan 30, 2025
1 check passed
@slobentanzer slobentanzer deleted the biohackathon3 branch January 30, 2025 17:44
9 participants