Merge pull request #102 from biocypher/benchmark
Benchmark & RAG agent, architecture changes (potentially breaking → minor version increase)
slobentanzer authored Jan 26, 2024
2 parents 2982143 + 4cbb3f0 commit 7a501c1
Showing 64 changed files with 6,844 additions and 3,413 deletions.
48 changes: 0 additions & 48 deletions .devcontainer/devcontainer.json

This file was deleted.

20 changes: 0 additions & 20 deletions .devcontainer/post-install.sh

This file was deleted.

3 changes: 2 additions & 1 deletion .gitignore
@@ -11,4 +11,5 @@ __pycache__/
.idea/
*.env
volumes/
benchmark/results/*.csv
benchmark/encrypted_llm_test_data.json
site/
50 changes: 50 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,50 @@
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
fail_fast: false
default_language_version:
  python: python3
default_stages:
  - commit
  - push
minimum_pre_commit_version: 2.7.1
repos:
  - repo: https://github.com/ambv/black
    rev: 23.7.0
    hooks:
      - id: black
  - repo: https://github.com/timothycrosley/isort
    rev: 5.12.0
    hooks:
      - id: isort
        additional_dependencies: [toml]
  - repo: https://github.com/snok/pep585-upgrade
    rev: v1.0
    hooks:
      - id: upgrade-type-hints
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: check-docstring-first
      - id: end-of-file-fixer
      - id: check-added-large-files
      - id: mixed-line-ending
      - id: trailing-whitespace
        exclude: ^.bumpversion.cfg$
      - id: check-merge-conflict
      - id: check-case-conflict
      - id: check-symlinks
      - id: check-yaml
        args: [--unsafe]
      - id: check-ast
      - id: fix-encoding-pragma
        args: [--remove] # for Python3 codebase, it's not necessary
      - id: requirements-txt-fixer
  - repo: https://github.com/pre-commit/pygrep-hooks
    rev: v1.10.0
    hooks:
      - id: python-no-eval
      - id: python-use-type-annotations
      - id: python-check-blanket-noqa
      - id: rst-backticks
      - id: rst-directive-colons
      - id: rst-inline-touching-normal
16 changes: 7 additions & 9 deletions DEVELOPER.md
@@ -46,7 +46,7 @@ For ensuring code quality, the following tools are used:

- [black](https://black.readthedocs.io/en/stable/) for automated code formatting

<!-- - [pre-commit-hooks](https://github.com/pre-commit/pre-commit-hooks) for
- [pre-commit-hooks](https://github.com/pre-commit/pre-commit-hooks) for
ensuring some general rules

- [pep585-upgrade](https://github.com/snok/pep585-upgrade) for automatically
@@ -55,28 +55,26 @@ upgrading type hints to the new native types defined in PEP 585
- [pygrep-hooks](https://github.com/pre-commit/pygrep-hooks) for ensuring some
general naming rules -->

<!-- Pre-commit hooks are used to automatically run these tools before each commit.
Pre-commit hooks are used to automatically run these tools before each commit.
They are defined in [.pre-commit-config.yaml](./.pre-commit-config.yaml). To
install the hooks run `poetry run pre-commit install`. The hooks are then
executed before each commit. For running the hook for all project files (not
only the changed ones) run `poetry run pre-commit run --all-files`. -->

<!-- The project uses a [Sphinx](https://www.sphinx-doc.org/en/master/) autodoc
GitHub Actions workflow to generate the documentation. If you add new code,
The project uses [mkdocs-material](https://squidfunk.github.io/mkdocs-material/) within a GitHub Actions workflow to generate the documentation. If you add new code,
please make sure that it is documented in a manner consistent with the
existing code base. The docstrings should follow the [Google style
guide](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html).
To check if the docs build successfully, you can build them locally by running
`make html` in the `docs` directory. -->

`mkdocs build` in the project root directory. To preview your changes, run `mkdocs serve`.
<!-- TODO: doctest -->
<!-- When adding new code snippets to the documentation, make sure that they are
automatically tested with
[doctest](https://sphinx-tutorial.readthedocs.io/step-3/#testing-your-code);
this ensures that no outdated code snippets are part of the documentation. -->

Documentation currently lives in the repository's
[wiki](https://github.com/biocypher/biochatter/wiki). We will soon create a
Sphinx-based documentation site.
The documentation is hosted [here](https://biochatter.org/).


## Testing
35 changes: 13 additions & 22 deletions README.md
@@ -54,25 +54,16 @@ by BioChatter can be seen in use in the
Check out [this repository](https://github.com/csbl-br/awesome-compbio-chatgpt)
for more info on computational biology usage of large language models.

# Dev Container

Due to some incompatibilities of `pymilvus` with Apple Silicon, we have created
a dev container for this project. To use it, you need to have Docker installed
on your machine. Then, you can run the devcontainer setup as recommended by
VSCode
[here](https://code.visualstudio.com/docs/remote/containers#_quick-start-open-an-existing-folder-in-a-container)
or using Docker directly.

The dev container expects an environment file (there are options, but the basic
one is `.devcontainer/local.env`) with the following variables:

```
OPENAI_API_KEY=(sk-...)
DOCKER_COMPOSE=true
DEVCONTAINER=true
```

To test vector database functionality, you also need to start a Milvus
standalone server. You can do this by running `docker-compose up` as described
[here](https://milvus.io/docs/install_standalone-docker.md) on the host machine
(not from inside the devcontainer).
## Developer notes

If you're on Apple Silicon, you may encounter issues with the `grpcio`
dependency (the `grpc` library used by `pymilvus`). If so, remove the installed
package from your virtual environment and build it from source, as described
[here](https://stackoverflow.com/questions/72620996/apple-m1-symbol-not-found-cfrelease-while-running-python-app):

```bash
pip uninstall grpcio
export GRPC_PYTHON_LDFLAGS=" -framework CoreFoundation"
pip install grpcio==1.53.0 --no-binary :all:
```
1 change: 1 addition & 0 deletions benchmark/benchmark_datasets.csv
@@ -11,4 +11,5 @@ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10470220/#appsec1,Ophthalmology related benchmark
https://bioconductor.org/packages/release/bioc/html/GSEABenchmarkeR.html,The GSEABenchmarkeR package implements an extendable framework for reproducible evaluation of set- and network-based methods for enrichment analysis of gene expression data,GSEABenchmarkeR,R package,,https://pubmed.ncbi.nlm.nih.gov/32026945/,,included in ChatGSE arxiv draft-- not sure how helpful it will be in LLM benchmarking,
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10153208/,cell type annotation with GPT-4 in single-cell RNA-seq analysis,Reference-free and cost-effective automated cell type annotation with GPT-4 in single-cell RNA-seq analysis,,,,2023,,
https://github.com/source-data/soda-data,text extraction from figure legends,SourceData,Unknown,Unknown,"Introduction: The scientific publishing landscape is expanding rapidly, creating challenges for researchers to stay up-to-date with the evolution of the literature. Natural Language Processing (NLP) has emerged as a potent approach to automating knowledge extraction from this vast amount of publications and preprints. Tasks such as Named-Entity Recognition (NER) and Named-Entity Linking (NEL), in conjunction with context-dependent semantic interpretation, offer promising and complementary approaches to extracting structured information and revealing key concepts. Results: We present the SourceData-NLP dataset produced through the routine curation of papers during the publication process. A unique feature of this dataset is its emphasis on the annotation of bioentities in figure legends. We annotate eight classes of biomedical entities (small molecules, gene products, subcellular components, cell lines, cell types, tissues, organisms, and diseases), their role in the experimental design, and the nature of the experimental method as an additional class. SourceData-NLP contains more than 620,000 annotated biomedical entities, curated from 18,689 figures in 3,223 papers in molecular and cell biology. We illustrate the dataset's usefulness by assessing BioLinkBERT and PubmedBERT, two transformers-based models, fine-tuned on the SourceData-NLP dataset for NER. We also introduce a novel context-dependent semantic task that infers whether an entity is the target of a controlled intervention or the object of measurement. Conclusions: SourceData-NLP's scale highlights the value of integrating curation into publishing. Models trained with SourceData-NLP will furthermore enable the development of tools able to extract causal hypotheses from the literature and assemble them into knowledge graphs.",2023,,
https://github.com/bigscience-workshop/biomedical,General Biomedical Dataset Library (126+ datasets included),BigBIO,Huggingface dataloaders (format depends on the dataset),,https://proceedings.neurips.cc/paper_files/paper/2022/file/a583d2197eafc4afdd41f5b8765555c5-Paper-Datasets_and_Benchmarks.pdf,2022,,
,,,,,,,,
105 changes: 105 additions & 0 deletions benchmark/benchmark_utils.py
@@ -0,0 +1,105 @@
import pytest

import pandas as pd


def benchmark_already_executed(
    model_name: str,
    task: str,
    subtask: str,
) -> bool:
    """
    Checks if the benchmark task and subtask test case for the model_name have
    already been executed.

    Args:
        model_name (str): The model name, e.g. "gpt-3.5-turbo"
        task (str): The benchmark task, e.g. "biocypher_query_generation"
        subtask (str): The benchmark subtask test case, e.g. "0_entities"

    Returns:
        bool: True if the benchmark task and subtask for the model_name have
            already been run, False otherwise
    """
    task_results = return_or_create_result_file(task)
    task_results_subset = (task_results["model_name"] == model_name) & (
        task_results["subtask"] == subtask
    )
    return task_results_subset.any()


def skip_if_already_run(
    model_name: str,
    task: str,
    subtask: str,
) -> None:
    """Helper function that skips the test case if it has already been executed.

    Args:
        model_name (str): The model name, e.g. "gpt-3.5-turbo"
        task (str): The benchmark task, e.g. "biocypher_query_generation"
        subtask (str): The benchmark subtask test case, e.g. "0_single_word"
    """
    if benchmark_already_executed(model_name, task, subtask):
        pytest.skip(
            f"benchmark {task}: {subtask} with {model_name} already executed"
        )


def return_or_create_result_file(
    task: str,
):
    """
    Returns the result file for the task or creates it if it does not exist.

    Args:
        task (str): The benchmark task, e.g. "biocypher_query_generation"

    Returns:
        pd.DataFrame: The result file for the task
    """
    file_path = get_result_file_path(task)
    try:
        results = pd.read_csv(file_path, header=0)
    except (pd.errors.EmptyDataError, FileNotFoundError):
        results = pd.DataFrame(
            columns=["model_name", "subtask", "score", "iterations"]
        )
        results.to_csv(file_path, index=False)
    return results


def write_results_to_file(
    model_name: str, subtask: str, score: str, iterations: str, file_path: str
):
    """Writes the benchmark results for the subtask to the result file.

    Args:
        model_name (str): The model name, e.g. "gpt-3.5-turbo"
        subtask (str): The benchmark subtask test case, e.g. "entities_0"
        score (str): The benchmark score, e.g. "1/1"
        iterations (str): The number of iterations, e.g. "1"
        file_path (str): The path to the result file
    """
    results = pd.read_csv(file_path, header=0)
    new_row = pd.DataFrame(
        [[model_name, subtask, score, iterations]], columns=results.columns
    )
    results = pd.concat([results, new_row], ignore_index=True).sort_values(
        by=["model_name", "subtask"]
    )
    results.to_csv(file_path, index=False)


# TODO should we use SQLite? An online database (REDIS)?
def get_result_file_path(file_name: str) -> str:
    """Returns the path to the result file.

    Args:
        file_name (str): The name of the result file

    Returns:
        str: The path to the result file
    """
    return f"benchmark/results/{file_name}.csv"
