biocypher · slobentanzer · Nov 23, 2023 · Oct 13, 2023 · Oct 24, 2023 · Oct 24, 2023
diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
@@ -1,6 +1,6 @@
 name: biochatter Continuous Integration
 
-on: [pull_request, push]
+on: [push]
 
 jobs:
   test:
@@ -31,7 +31,7 @@ jobs:
       - name: Install dependencies
         run: |
           poetry install
-          poetry install -E podcast
+          poetry install -E 'podcast xinference'
 
       - name: Start Milvus server
         run: |
@@ -57,4 +57,4 @@ jobs:
 
       - name: Run tests
         run: |
-          poetry run pytest --ignore=./volumes
+          poetry run pytest test --ignore=./volumes
diff --git a/.gitignore b/.gitignore
@@ -7,6 +7,8 @@ __pycache__/
 .env
 *.mp3
 .cache
+*.env.idea
+.idea/
 *.env
 volumes/
-benchmark/results/*.csv
+benchmark/results/*.csv
diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
@@ -0,0 +1,124 @@
+
+# Code of Conduct
+
+Adapted from the [Contributor Covenant][homepage].
+
+We as members, contributors, and leaders pledge to make participation in our
+community a harassment-free experience for everyone. We pledge to act and
+interact in ways that contribute to an open, welcoming, diverse, inclusive, and
+healthy community.
+
+Examples of behavior that contributes to a positive environment for our
+community include:
+
+* Demonstrating empathy and kindness toward other people
+* Being respectful of differing opinions, viewpoints, and experiences
+* Giving and gracefully accepting constructive feedback
+* Accepting responsibility and apologizing to those affected by our mistakes,
+  and learning from the experience
+* Focusing on what is best not just for us as individuals, but for the overall
+  community
+
+Examples of unacceptable behavior include:
+
+* The use of sexualized language or imagery, and sexual attention or advances of
+  any kind
+* Trolling, insulting or derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or email address,
+  without their explicit permission (publicly available information is exempt)
+* Other conduct which could reasonably be considered inappropriate in a
+  professional setting
+
+## Enforcement Responsibilities
+
+Community leaders are responsible for clarifying and enforcing our standards of
+acceptable behavior and will take appropriate and fair corrective action in
+response to any behavior that they deem inappropriate, threatening, offensive,
+or harmful.
+
+Community leaders have the right and responsibility to remove, edit, or reject
+comments, commits, code, wiki edits, issues, and other contributions that are
+not aligned to this Code of Conduct, and will communicate reasons for moderation
+decisions when appropriate.
+
+## Scope
+
+This Code of Conduct applies within all community spaces, and also applies when
+an individual is officially representing the community in public spaces.
+Examples of representing our community include using an official e-mail address,
+posting via an official social media account, or acting as an appointed
+representative at an online or offline event.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported to the community leaders responsible for enforcement.
+All complaints will be reviewed and investigated promptly and fairly.
+
+All community leaders are obligated to respect the privacy and security of the
+reporter of any incident.
+
+## Enforcement Guidelines
+
+Community leaders will follow these Community Impact Guidelines in determining
+the consequences for any action they deem in violation of this Code of Conduct:
+
+### 1. Correction
+
+**Community Impact**: Use of inappropriate language or other behavior deemed
+unprofessional or unwelcome in the community.
+
+**Consequence**: A private, written warning from community leaders, providing
+clarity around the nature of the violation and an explanation of why the
+behavior was inappropriate. A public apology may be requested.
+
+### 2. Warning
+
+**Community Impact**: A violation through a single incident or series of
+actions.
+
+**Consequence**: A warning with consequences for continued behavior. No
+interaction with the people involved, including unsolicited interaction with
+those enforcing the Code of Conduct, for a specified period of time. This
+includes avoiding interactions in community spaces as well as external channels
+like social media. Violating these terms may lead to a temporary or permanent
+ban.
+
+### 3. Temporary Ban
+
+**Community Impact**: A serious violation of community standards, including
+sustained inappropriate behavior.
+
+**Consequence**: A temporary ban from any sort of interaction or public
+communication with the community for a specified period of time. No public or
+private interaction with the people involved, including unsolicited interaction
+with those enforcing the Code of Conduct, is allowed during this period.
+Violating these terms may lead to a permanent ban.
+
+### 4. Permanent Ban
+
+**Community Impact**: Demonstrating a pattern of violation of community
+standards, including sustained inappropriate behavior, harassment of an
+individual, or aggression toward or disparagement of classes of individuals.
+
+**Consequence**: A permanent ban from any sort of public interaction within the
+community.
+
+## Attribution
+
+This version of the basic Code of Conduct is available at
+[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
+
+Community Impact Guidelines were inspired by
+[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
+
+For answers to common questions about this code of conduct, see the FAQ at
+[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
+[https://www.contributor-covenant.org/translations][translations].
+
+[homepage]: https://www.contributor-covenant.org
+[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
+[Mozilla CoC]: https://github.com/mozilla/diversity
+[FAQ]: https://www.contributor-covenant.org/faq
+[translations]: https://www.contributor-covenant.org/translations
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,13 @@
+# Contributing
+
+We are very happy about all kinds of contributions. Thank you for considering
+helping us out! The simplest way of contributing is to give feedback on the
+project, either via creating an issue here on GitHub or by sending a message to
+a maintainer. You can also report bugs that you may have found in this manner.
+
+If you want to do more, we are also happy about pull requests. Please make sure
+that you have read the [Developer Guide](DEVELOPER.md) before you start working
+on a pull request.
+
+Before joining the community, please also make sure that you agree with our
+[Code of Conduct](CODE_OF_CONDUCT.md).
diff --git a/DEVELOPER.md b/DEVELOPER.md
@@ -0,0 +1,126 @@
+# 🔬 Developer Guide
+
+Thank you for considering contributing to the project! This guide will help
+you get started with the development of the project. If you have any questions,
+please feel free to ask them in the issue tracker.
+
+## Dependency management
+
+We use [Poetry](https://python-poetry.org) for dependency management. Please
+make sure that you have installed Poetry and set up the environment correctly
+before starting development.
+
+### Set up the environment
+
+- Install dependencies from the lock file: `poetry install`
+
+- Select extras for the functions you want to use: `poetry install -E <extras>`
+
+- Use the environment: You can either run commands directly with `poetry run
+<command>` or open a shell with `poetry shell` and then run commands directly.
+
+### Updating the environment
+
+If you want to fix dependency issues, please do so in the Poetry
+framework. If Poetry does not work for you for some reason, please let us know.
+
+The Poetry dependencies are organized in groups. There are groups with
+dependencies needed for running BioChatter (`[tool.poetry.dependencies]` with the
+group name `main`) and a group with dependencies needed for development
+(`[tool.poetry.group.dev.dependencies]` with the group name `dev`). There are
+also extras (groups of optional dependencies) for functions that you may not
+want to install.
+
+For adding new dependencies:
+
+- Add new dependencies: `poetry add <dependency> --group <group>`
+
+- Update lock file (after adding new dependencies in pyproject.toml): `poetry
+lock`
+
+## Code quality and formal requirements
+
+For ensuring code quality, the following tools are used:
+
+- [isort](https://isort.readthedocs.io/en/latest/) for sorting imports
+
+- [black](https://black.readthedocs.io/en/stable/) for automated code formatting
+
+<!-- - [pre-commit-hooks](https://github.com/pre-commit/pre-commit-hooks) for
+ensuring some general rules
+
+- [pep585-upgrade](https://github.com/snok/pep585-upgrade) for automatically
+upgrading type hints to the new native types defined in PEP 585
+
+- [pygrep-hooks](https://github.com/pre-commit/pygrep-hooks) for ensuring some
+general naming rules -->
+
+<!-- Pre-commit hooks are used to automatically run these tools before each commit.
+They are defined in [.pre-commit-config.yaml](./.pre-commit-config.yaml). To
+install the hooks run `poetry run pre-commit install`. The hooks are then
+executed before each commit. For running the hook for all project files (not
+only the changed ones) run `poetry run pre-commit run --all-files`. -->
+
+<!-- The project uses a [Sphinx](https://www.sphinx-doc.org/en/master/) autodoc
+GitHub Actions workflow to generate the documentation. If you add new code,
+please make sure that it is documented accordingly and in a consistent manner
+with the existing code base. The docstrings should follow the [Google style
+guide](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html).
+To check if the docs build successfully, you can build them locally by running
+`make html` in the `docs` directory. -->
+
+<!-- When adding new code snippets to the documentation, make sure that they are
+automatically tested with
+[doctest](https://sphinx-tutorial.readthedocs.io/step-3/#testing-your-code);
+this ensures that no outdated code snippets are part of the documentation. -->
+
+Documentation currently lives in the repository's
+[wiki](https://github.com/biocypher/biochatter/wiki). We will soon create a
+Sphinx-based documentation site.
+
+
+## Testing
+
+The project uses [pytest](https://docs.pytest.org/en/stable/) for testing. To
+run the tests, please run `pytest test` in the root directory of the project.
+The addition of `test` (the directory) is required since we are also using
+pytest for the benchmarking part, which can be invoked by running `pytest
+benchmark`. We are developing BioChatter using test-driven development. Please
+make sure that you add tests for your code before submitting a pull request.
+
+The existing tests can also help you to understand how the code works. If you
+have any questions, please feel free to ask them in the issue tracker.
+
+**Before submitting a pull request, please make sure that all tests pass and
+that the documentation builds correctly.**
+
+## Small Contributions
+
+If you want to contribute a small change (e.g. a bugfix), you can probably
+immediately go ahead and create a pull request. For more substantial changes or
+additions, please read on.
+
+## Larger Contributions
+
+If you want to contribute a larger change, please create an issue first. This
+will allow us to discuss the change and make sure that it fits into the project.
+It can happen that development for a feature is already in progress, so it is
+important to check first to avoid duplicate work. If you have any questions,
+feel free to approach us in any way you like.
+
+## Versioning
+
+We use [semantic versioning](https://semver.org/) for the project. This means
+that the version number is incremented according to the following scheme:
+
+- Increment the major version number if you make incompatible API changes.
+
+- Increment the minor version number if you add functionality in a backwards-
+  compatible manner.
+
+- Increment the patch version number if you make backwards-compatible bug fixes.
+
+We use the `bumpversion` tool to update the version number in the
+`pyproject.toml` file. This will create a new git tag automatically. Usually,
+versioning is done by the maintainers, so please do not increment versions in
+pull requests by default.
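Note on the Versioning section of DEVELOPER.md above: the three increment rules can be illustrated with a minimal sketch. This is not how `bumpversion` is implemented; the helper name `bump` and the example version strings are hypothetical, and the sketch assumes a plain `MAJOR.MINOR.PATCH` version without pre-release or build suffixes.

```python
def bump(version: str, part: str) -> str:
    """Return `version` with `part` incremented (hypothetical helper).

    Assumes a plain MAJOR.MINOR.PATCH string, e.g. "0.2.17".
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # Incompatible API change: reset minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backwards-compatible functionality: reset patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backwards-compatible bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


if __name__ == "__main__":
    for part in ("patch", "minor", "major"):
        print(part, bump("0.2.17", part))
```

Lower-order parts reset to zero whenever a higher-order part is incremented, which is why a minor bump of a hypothetical `0.2.17` yields `0.3.0` rather than `0.3.17`.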
diff --git a/benchmark/test_vectorstore.py b/benchmark/test_vectorstore.py
@@ -26,7 +26,7 @@
 
 @pytest.mark.parametrize("model", EMBEDDING_MODELS)
 @pytest.mark.parametrize("chunk_size", CHUNK_SIZES)
-def test_document_summariser(model, chunk_size):
+def test_retrieval_augmented_generation(model, chunk_size):
     pdf_path = "test/bc_summary.pdf"
     with open(pdf_path, "rb") as f:
         doc_bytes = f.read()
@@ -35,16 +35,16 @@ def test_document_summariser(model, chunk_size):
     doc = reader.document_from_pdf(doc_bytes)
 
     doc_ids = []
-    docsum = DocumentEmbedder(model=model, chunk_size=chunk_size)
-    docsum.connect(_HOST, _PORT)
-    doc_ids.append(docsum.save_document(doc))
+    rag_agent = DocumentEmbedder(model=model, chunk_size=chunk_size)
+    rag_agent.connect(_HOST, _PORT)
+    doc_ids.append(rag_agent.save_document(doc))
 
     query = "What is BioCypher?"
-    results = docsum.similarity_search(query)
+    results = rag_agent.similarity_search(query)
     correct = ["BioCypher" in result.page_content for result in results]
 
     # delete embeddings
-    [docsum.database_host.remove_document(doc_id) for doc_id in doc_ids]
+    [rag_agent.database_host.remove_document(doc_id) for doc_id in doc_ids]
 
     # record sum in CSV file
     with open(FILE_PATH, "a") as f: