Data model refactor, enabling ontology support (#58)

* Refactored dm, added ontology terms
* added example and generated lib
* refactored dm
* added dependencies
* updated data model
* enabled api generation
* .
* API update
* added correct branch
* .
* corrected brach
* API update
* regenereated lib with ontology support from pyeed==0.2.2
* API update
* Squashed commit of the following:

  commit 3b5495c
  Author: Niklas Abraham
  Date: Fri Apr 19 13:47:02 2024 +0200

      test

  commit 7e141b5
  Author: Niklas Abraham
  Date: Tue Apr 16 09:55:53 2024 +0200

      added packages and cytoscape kinda works

  commit eab8b01
  Author: Niklas Abraham
  Date: Sun Apr 14 22:20:43 2024 +0200

      kina works but aint pretty

  commit 6e0d42a
  Author: Niklas Abraham
  Date: Sat Apr 13 12:11:15 2024 +0200

      adding edges from list

  commit 4768b5c
  Author: Niklas Abraham
  Date: Fri Apr 12 13:10:20 2024 +0200

      added average

  commit 0a2bdf7
  Author: Niklas Abraham
  Date: Tue Apr 9 12:59:37 2024 +0200

      graphs created within the network.py

  commit d2837a5
  Author: Niklas Abraham
  Date: Tue Apr 9 10:45:13 2024 +0200

      added fields

  commit 67613db
  Author: Niklas Abraham
  Date: Thu Apr 4 15:01:25 2024 +0200

      working on implementing cytoscope in the network, starting with dataframe of alignements

* API update
* formatted imports
* API update
* removed unused
* specified correct dir for api generation
* API update
* changed execution event
* deleted unused
* API update
* added correct path
* removed wrong lib
* API update
* 62 refactor mappers of fetcher modules (#66)
* API update
* removed old lib imports
* changed type if taxonomy_id to int
* removed deprecated imports
* renamed to --> ontology
* added ontology resources
* started refactoring uniprot mapper
* API update

---------

Co-authored-by: sdRDM Bot

* API update
* Refactor `ProteinRecord` #55 (#67)
* okay import fixed and fixed proteinRecord Test
* working on fixing the fetcher not working but fixed al lot of wrong refences
* okay fixeed protein record and test
* API update
* ruff is happy now with the test
* API update
* all work with simple test
* API update
* fix circular import
* API update
* move `ProteinRecord` into functions
* move `ProteinRecord` into functions
* added test to detect circular imports
* API update
* enabled tests upon PR
* API update
* use `psycopg2-binary` to prevent missing bins
* API update
* API update
* all in working order
* working on the pairwise alignment structres, sigle should work, not using the Aglinment object
* aligment seems to start working, first simple tests are running
* working aligner tests, all inorder
* at least working import all functinaly broken in network
* changed aligner and changed the test the id is now inculded, better for overview and better for network
* API update
* all with general network run fine
* added good alignment test with real data
* fixed potenial error with diffrence in parralel naming seq1 or se2 and floating error
* simple cytoscope graph cretaed and works
* running network create and working threshhold
* added degree calculations, node size, node color
* API update

---------

Co-authored-by: Niklas Abraham
Co-authored-by: sdRDM Bot
Co-authored-by: Jan Range

* API update
* Refactor sequence alignment #63 (#70)
* okay import fixed and fixed proteinRecord Test
* working on fixing the fetcher not working but fixed al lot of wrong refences
* okay fixeed protein record and test
* API update
* ruff is happy now with the test
* API update
* all work with simple test
* API update
* fix circular import
* API update
* move `ProteinRecord` into functions
* move `ProteinRecord` into functions
* added test to detect circular imports
* API update
* enabled tests upon PR
* API update
* use `psycopg2-binary` to prevent missing bins
* API update
* API update
* API update
* tests for alignment
* API update
* fix circular import
* remove literals
* API update

---------

Co-authored-by: Niklas Abraham
Co-authored-by: sdRDM Bot
Co-authored-by: Jan Range
Co-authored-by: Max Häußler
Co-authored-by: max

* API update
* Add examples (#72)
* fixed mapping of proteinmapper
* API update

---------

Co-authored-by: sdRDM Bot

* API update
* added new wokring function, inorder to bypass BUG in filter base_url
* API update
* added packages
* API update
* added base_url_fixes
* API update
* Fix basics (#73)
* refactored pairwise aligner
* API update
* renaimed pairwise_aligner in pairwise
* tested multipairwise aligner
* added docstring
* API update

---------

Co-authored-by: sdRDM Bot

* API update
* API update
* fixed test runs, some still missing
* API update
* all update
* API update
* add new fixes
* API update
* added new lyout
* API update
* Update network.py
* API update
* Refactor alignment notebook to use updated pyeed API and handle API request errorsated example
* updated example
* cleaned
* formatted
* implemented abstract tool with clustalo
* added docstr
* removed deprecated
* removed
* added tofasta
* API update
* Clustal api (#75)
* Refactor alignment notebook to use updated pyeed API and handle API request errorsated example
* updated example
* cleaned
* formatted
* implemented abstract tool with clustalo
* added docstr
* removed deprecated
* removed
* added tofasta
* API update

---------

Co-authored-by: sdRDM Bot

* API update
* removed deprecated
* cleaing
* API update
* API update
* fixed service link
* API update
* updated docs
* API update
* added docs

---------

Co-authored-by: sdRDM Bot
Co-authored-by: Niklas Abraham
Co-authored-by: Jan Range
Co-authored-by: Alina Lacheim
Co-authored-by: NiklasAbraham
PyEED · May 13, 2024 · 5a43f73
1 parent 7f18b42

Showing 85 changed files with 3,168 additions and 26,511 deletions.
diff --git a/.github/scripts/example.py b/.github/scripts/example.py
@@ -1,7 +1,8 @@
 import toml
-from pyEED.core.proteininfo import ProteinInfo
 from sdrdm_database import DBConnector
 
+from pyEED.core.proteininfo import ProteinInfo
+
 # Get the protein sequence from NCBI
 aldolase = ProteinInfo.from_ncbi("NP_001287541.1")
 

diff --git a/.github/workflows/generate_api.yaml b/.github/workflows/generate_api.yaml
@@ -10,3 +10,6 @@ jobs:
         uses: JR-1991/generate-sdrdm-api@main
         with:
           library_name: pyeed
+          schema_path: "./specifications/sequence_record.md"
+          out_dir: "./"
+          branch: "63-terms-and-external-model-handling"
diff --git a/.github/workflows/release_pypi.yaml b/.github/workflows/release_pypi.yaml
@@ -1,6 +1,6 @@
 name: Release Pipeline
 
-on: push
+on: [release]
 
 jobs:
   build:

diff --git a/.github/workflows/tests.yaml b/.github/workflows/tests.yaml
@@ -1,6 +1,6 @@
 name: Tests
 
-on: push
+on: pull_request
 
 jobs:
   build:
@@ -27,4 +27,4 @@ jobs:
 
       - name: Run tests with pytest
         run: |
-          poetry run pytest
+          poetry run pytest
diff --git a/.gitignore b/.gitignore
@@ -99,7 +99,7 @@ ipython_config.py
 #   This is especially recommended for binary packages to ensure reproducibility, and is more
 #   commonly ignored for libraries.
 #   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
-#poetry.lock
+
 
 # PEP 582; used by e.g. github.com/David-OConnor/pyflow
 __pypackages__/
@@ -152,4 +152,10 @@ cython_debug/
 #.idea/
 
 examples/data/*
-examples/clustering/data
+examples/clustering/data
+
+pyrightconfig.json
+
+poetry.lock
+
+.ruff_cache
diff --git a/docs/index.md b/docs/index.md
@@ -6,11 +6,13 @@
 
 ## What is PyEED?
 
-PyEED is a Python toolkit, that allows easy creation, annotation, and analysis of custom sequence data. All functionalities are based on a data model, which integrates all information on a given nucleotide or protein sequence in a single object. This allows the bundling of all information on a given sequence, making it available in all creation, annotation, and analysis steps. The entire system is generic and is also capable of modeling different scenarios.
+`pyeed` is a Python toolkit that allows easy creation, annotation, and analysis of sequence data. All functionalities are based on a data model, which integrates all information on a given nucleotide or protein sequence in a single object. This allows the bundling of all information on a given sequence, making it available in all creation, annotation, and analysis steps. The entire system is generic and applies to various research scenarios.  
+`pyeed` is designed to enable object-oriented programming for bioinformatics. 
 
-## PyEED data structure
+## 📝 Data Structure
 
-The object structure of PyEED is based on a [data model](https://github.com/PyEED/pyeed/blob/main/specifications/data_model.md)(1), describing the relation between all attributes of a sequence. These attributes include the sequence, the organism, and annotations of the sequence. Furthermore, the information is marked with annotations, marking the origin of the information. 
-{ .annotate }
+The data structure of `pyeed` is based on a [data model](https://github.com/PyEED/pyeed/blob/main/specifications/data_model.md), which describes the relations between all attributes of a sequence. These attributes include the sequence itself, the organism, and sequence annotations such as sites and regions. In addition, each piece of information is annotated with its origin. 
 
-1.   PyEED uses the [sdRDM framework](https://github.com/JR-1991/software-driven-rdm) to define the architecture of its data as a Markdown document. The hierarchical structure defined in the Markdown document is used to generate Python classes, mirroring the structure of the data model. PyEED can thus be used to read and write data from SQL databases and apply its tools to the data.
+## 🛠️ Tools
+
+`pyeed` implements common tools for clustering, aligning, and visualizing sequences. CLI tools such as `Clustal Omega` are implemented as a Docker Service, allowing easy installation and usage of these tools.
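The data model described above bundles a sequence, its organism, features such as sites and regions, and provenance annotations in one object. The sketch below illustrates that idea with plain dataclasses. It is not the `pyeed` API: the real classes are generated by sdRDM from the specification, and every class, field, and method name here (`SequenceRecord`, `Region`, `Site`, `Annotation`, `region_sequence`, `to_fasta`) is a hypothetical stand-in. The FASTA export hints at why such a method is useful: CLI tools such as Clustal Omega take FASTA input.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    """Hypothetical provenance marker: where a piece of information came from."""
    source: str                      # e.g. the name of a database
    accession: Optional[str] = None  # identifier in that source, if any


@dataclass
class Region:
    """A stretch of the sequence (1-based, inclusive coordinates), e.g. a domain."""
    name: str
    start: int
    end: int
    annotation: Optional[Annotation] = None


@dataclass
class Site:
    """Individual residue positions (1-based), e.g. an active site."""
    name: str
    positions: List[int]
    annotation: Optional[Annotation] = None


@dataclass
class SequenceRecord:
    """Hypothetical record bundling a sequence with organism, features, and origin."""
    id: str
    sequence: str
    organism: Optional[str] = None
    regions: List[Region] = field(default_factory=list)
    sites: List[Site] = field(default_factory=list)
    annotation: Optional[Annotation] = None

    def region_sequence(self, region: Region) -> str:
        # Convert 1-based inclusive coordinates into a Python slice.
        return self.sequence[region.start - 1 : region.end]

    def to_fasta(self, width: int = 60) -> str:
        # FASTA: a '>' header line, then the sequence wrapped at `width` characters.
        if width <= 0:
            raise ValueError("width must be positive")
        lines = [f">{self.id}"]
        lines += [self.sequence[i : i + width] for i in range(0, len(self.sequence), width)]
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    record = SequenceRecord(
        id="seq1",
        sequence="MKTAYIAKQR",
        regions=[Region("N-term", 1, 4)],
    )
    print(record.region_sequence(record.regions[0]))  # MKTA
    print(record.to_fasta(width=4), end="")
```

Keeping provenance as a separate `Annotation` object on each element mirrors the idea that every piece of information carries its origin, rather than tagging only the record as a whole.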
diff --git a/docs/installation/docker.md b/docs/installation/docker.md
@@ -2,9 +2,60 @@
 icon: simple/docker
 ---
 
-Docker is a way to run software in a container. This means that the software is isolated from the rest of your system. This simplifies the installation of computing environments since everything is preconfigured in the container. Docker is comparable to running a virtual machine, but instead of installing a whole operating system, it only installs the software you need.
+## Concept
 
+The PyEED Docker Service combines the PyEED toolkit with JupyterLab, a web-based editor for writing and executing Jupyter Notebooks. Together, they let you use the `pyeed` package from inside JupyterLab, while the Docker Service takes care of setting up and configuring the analysis tools(1).  
+In this way, the Docker Service allows you to run JupyterLab on your local machine without having to install Python, Jupyter, and the necessary Python tools to work with your data.
+{ .annotate }
 
-Here we will use Docker to install JupyterLab, a popular web-based environment for Jupyter notebooks, code, and data. This will allow you to run JupyterLab on your local machine without having to install Python, Jupyter, and the necessary Python tools to work with your data and EnzymeML.
+1.  Docker is a way to run software in a container. This means that the software is isolated from the rest of your system. This simplifies the installation of computing environments since everything is preconfigured in the container. Docker is comparable to running a virtual machine, but instead of installing a whole operating system, it only installs the software you need.
 
-To install Docker, follow the instructions for your operating system on the [Docker website](https://docs.docker.com/get-docker/).
+To install Docker, follow the instructions for your operating system on the [Docker website](https://docs.docker.com/get-docker/).
+
+## Initial Setup
+
+1. **Install Docker**: Follow the instructions for your operating system on the [Docker website](https://docs.docker.com/get-docker/).
+
+2. **[Download](https://github.com/PyEED/pyeed/archive/refs/heads/main.zip) the PyEED Docker Service**
+
+3. **Start the Service** by running the following steps:
+=== "Windows"
+
+    1. Open PowerShell by pressing ++windows+r++, typing `powershell`, and pressing ++enter++.
+
+    2. Navigate to the Downloads folder and unzip the downloaded file.
+
+    3. Navigate to the unzipped folder by running the following command (adjust the path if necessary):
+        ```powershell
+        cd ~\Downloads\pyeed-main
+        ```
+    4. Start the Docker Service by running the following command:
+        ```powershell
+        docker compose up --build
+        ```
+
+=== "MacOS/Linux"
+
+    1. Open the terminal.
+    2. Navigate to the Downloads folder and unzip the downloaded file.
+        ```bash
+        cd ~/Downloads
+        unzip pyeed-main.zip
+        ```
+    3. Navigate to the unzipped folder by running the following command (adjust the path if necessary):
+        ```bash
+        cd ~/Downloads/pyeed-main
+        ```
+    4. Start the Docker Service by running the following command:
+        ```bash
+        docker compose up --build
+        ```
+
+## Start the PyEED Docker Service
+
+After the initial setup, all containers belonging to the PyEED Docker Service are created and started. The service is now added to the `Containers` section in the Docker Desktop application. To start the service, click on the :material-play: button next to the `pyeed` container. To access the JupyterLab environment, click on the link `8888:8888` in the header of the container. This will open a new tab in your browser, showing the JupyterLab environment.
+
+## Stopping the PyEED Docker Service
+
+To stop the service, navigate to the `Containers` section in the Docker Desktop application and click on the :material-stop: button next to the `pyeed` container. Running containers are symbolized by a green container icon. You can close the browser window whenever you want. The container will keep running in the background unless you stop it in the Docker Desktop app.
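After `docker compose up --build`, JupyterLab can take a moment before the `8888:8888` link responds. The following is a minimal sketch, assuming the default port mapping shown above, that polls the address until the server answers or a timeout expires. The function `wait_for_server` is hypothetical and not part of `pyeed` or Docker; it only uses the Python standard library.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(
    url: str = "http://localhost:8888", timeout: float = 60.0, interval: float = 1.0
) -> bool:
    """Poll `url` until it answers an HTTP request or `timeout` seconds pass.

    Hypothetical helper: returns True once the server responds, False otherwise.
    """
    # Bypass proxy environment variables; the container runs on this machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=interval):
                return True  # any successful response means the server is up
        except urllib.error.HTTPError:
            return True  # an error status (e.g. 403) still means the server answered
        except OSError:
            pass  # refused or timed out: the server is not listening yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    print("JupyterLab reachable:", wait_for_server())
```

`HTTPError` is caught before `OSError` because it is a subclass of `URLError` (and thus of `OSError`), and an HTTP error status still proves the server is running.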
diff --git a/docs/usecases/usecase1.md b/docs/usecases/usecase1.md
@@ -1,5 +1,2 @@
-# Create a sequence network
+# Use Cases
 
-Download and execute the [Notebook](https://github.com/PyEED/pyeed/blob/main/examples/basics.ipynb) open the notebook either in your Python environment, or follow the [instructions](../installation/jupyterlab.md) to set up the `PyEED-Lab` container as your computing environment including `pyeed`.
-
-When using the `PyEED-Lab` Docker container, navigate in the file browser on the left to the folder that contains the downloaded [notebook](https://github.com/PyEED/pyeed/blob/main/examples/basics.ipynb). Double-click on the notebook to open it in the JupyterLab environment. Then you can execute each cell by pressing ++shift+enter++ or by clicking the `Run` button in the toolbar.
diff --git a/examples/alignment.ipynb b/examples/alignment.ipynb