Data model refactor, enabling ontology support (#58)
* Refactored dm, added ontology terms

* added example and generated lib

* refactored dm

* added dependencies

* updated data model

* enabled api generation

* .

* API update

* added correct branch

* .

* corrected branch

* API update

* regenerated lib with ontology support from pyeed==0.2.2

* API update

* Squashed commit of the following:

commit 3b5495c
Author: Niklas Abraham <[email protected]>
Date:   Fri Apr 19 13:47:02 2024 +0200

    test

commit 7e141b5
Author: Niklas Abraham <[email protected]>
Date:   Tue Apr 16 09:55:53 2024 +0200

    added packages and cytoscape kinda works

commit eab8b01
Author: Niklas Abraham <[email protected]>
Date:   Sun Apr 14 22:20:43 2024 +0200

    kinda works but ain't pretty

commit 6e0d42a
Author: Niklas Abraham <[email protected]>
Date:   Sat Apr 13 12:11:15 2024 +0200

    adding edges from list

commit 4768b5c
Author: Niklas Abraham <[email protected]>
Date:   Fri Apr 12 13:10:20 2024 +0200

    added average

commit 0a2bdf7
Author: Niklas Abraham <[email protected]>
Date:   Tue Apr 9 12:59:37 2024 +0200

    graphs created within the  network.py

commit d2837a5
Author: Niklas Abraham <[email protected]>
Date:   Tue Apr 9 10:45:13 2024 +0200

    added fields

commit 67613db
Author: Niklas Abraham <[email protected]>
Date:   Thu Apr 4 15:01:25 2024 +0200

    working on implementing cytoscape in the network, starting with dataframe of alignments

* API update

* formatted imports

* API update

* removed unused

* specified correct dir for api generation

* API update

* changed execution event

* deleted unused

* API update

* added correct path

* removed wrong lib

* API update

* 62 refactor mappers of fetcher modules (#66)

* API update

* removed old lib imports

* changed type of taxonomy_id to int

* removed deprecated imports

* renamed to --> ontology

* added ontology resources

* started refactoring uniprot mapper

* API update

---------

Co-authored-by: sdRDM Bot <[email protected]>

* API update

* Refactor `ProteinRecord` #55 (#67)

* okay import fixed and fixed proteinRecord Test

* working on fixing the fetcher, not working yet, but fixed a lot of wrong references

* okay fixed protein record and test

* API update

* ruff is happy now with the test

* API update

* all work with simple test

* API update

* fix circular import

* API update

* move `ProteinRecord` into functions

* move `ProteinRecord` into functions

* added test to detect circular imports

* API update

* enabled tests upon PR

* API update

* use `psycopg2-binary` to prevent missing bins

* API update

* API update

* all in working order

* working on the pairwise alignment structures, single should work, not using the Alignment object

* alignment seems to start working, first simple tests are running

* working aligner tests, all in order

* at least working import, all functionally broken in network

* changed aligner and changed the test; the id is now included, better for overview and better for network

* API update

* all with general network run fine

* added good alignment test with real data

* fixed potential error with difference in parallel naming seq1 or seq2 and floating error

* simple cytoscape graph created and works

* running network create and working threshold

* added degree calculations, node size, node color

* API update

---------

Co-authored-by: Niklas Abraham <[email protected]>
Co-authored-by: sdRDM Bot <[email protected]>
Co-authored-by: Jan Range <[email protected]>

* API update

* Refactor sequence alignment #63 (#70)

* okay import fixed and fixed proteinRecord Test

* working on fixing the fetcher, not working yet, but fixed a lot of wrong references

* okay fixed protein record and test

* API update

* ruff is happy now with the test

* API update

* all work with simple test

* API update

* fix circular import

* API update

* move `ProteinRecord` into functions

* move `ProteinRecord` into functions

* added test to detect circular imports

* API update

* enabled tests upon PR

* API update

* use `psycopg2-binary` to prevent missing bins

* API update

* API update

* API update

* tests for alignment

* API update

* fix circular import

* remove literals

* API update

---------

Co-authored-by: Niklas Abraham <[email protected]>
Co-authored-by: sdRDM Bot <[email protected]>
Co-authored-by: Jan Range <[email protected]>
Co-authored-by: Max Häußler <[email protected]>
Co-authored-by: max <[email protected]>

* API update

* Add examples (#72)

* fixed mapping of proteinmapper

* API update

---------

Co-authored-by: sdRDM Bot <[email protected]>

* API update

* added new working function, in order to bypass BUG in filter base_url

* API update

* added packages

* API update

* added base_url_fixes

* API update

* Fix basics (#73)

* refactored pairwise aligner

* API update

* renamed pairwise_aligner to pairwise

* tested multipairwise aligner

* added docstring

* API update

---------

Co-authored-by: sdRDM Bot <[email protected]>

* API update

* API update

* fixed test runs, some still missing

* API update

* all update

* API update

* add new fixes

* API update

* added new layout

* API update

* Update network.py

* API update

* Refactor alignment notebook to use updated pyeed API and handle API request errors

* updated example

* cleaned

* formatted

* implemented abstract tool with clustalo

* added docstr

* removed deprecated

* removed

* added tofasta

* API update

* Clustal api (#75)

* Refactor alignment notebook to use updated pyeed API and handle API request errors

* updated example

* cleaned

* formatted

* implemented abstract tool with clustalo

* added docstr

* removed deprecated

* removed

* added tofasta

* API update

---------

Co-authored-by: sdRDM Bot <[email protected]>

* API update

* removed deprecated

* cleaning

* API update

* API update

* fixed service link

* API update

* updated docs

* API update

* added docs

---------

Co-authored-by: sdRDM Bot <[email protected]>
Co-authored-by: Niklas Abraham <[email protected]>
Co-authored-by: Jan Range <[email protected]>
Co-authored-by: Alina Lacheim <[email protected]>
Co-authored-by: NiklasAbraham <[email protected]>
6 people authored May 13, 2024
1 parent 7f18b42 commit 5a43f73
Showing 85 changed files with 3,168 additions and 26,511 deletions.
3 changes: 2 additions & 1 deletion .github/scripts/example.py
@@ -1,7 +1,8 @@
import toml
from pyEED.core.proteininfo import ProteinInfo
from sdrdm_database import DBConnector

from pyEED.core.proteininfo import ProteinInfo

# Get the protein sequence from NCBI
aldolase = ProteinInfo.from_ncbi("NP_001287541.1")

3 changes: 3 additions & 0 deletions .github/workflows/generate_api.yaml
@@ -10,3 +10,6 @@ jobs:
uses: JR-1991/generate-sdrdm-api@main
with:
library_name: pyeed
schema_path: "./specifications/sequence_record.md"
out_dir: "./"
branch: "63-terms-and-external-model-handling"
2 changes: 1 addition & 1 deletion .github/workflows/release_pypi.yaml
@@ -1,6 +1,6 @@
name: Release Pipeline

on: push
on: [release]

jobs:
build:
4 changes: 2 additions & 2 deletions .github/workflows/tests.yaml
@@ -1,6 +1,6 @@
name: Tests

on: push
on: pull_request

jobs:
build:
@@ -27,4 +27,4 @@ jobs:
- name: Run tests with pytest
run: |
poetry run pytest
poetry run pytest
10 changes: 8 additions & 2 deletions .gitignore
@@ -99,7 +99,7 @@ ipython_config.py
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock


# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
@@ -152,4 +152,10 @@ cython_debug/
#.idea/

examples/data/*
examples/clustering/data
examples/clustering/data

pyrightconfig.json

poetry.lock

.ruff_cache
12 changes: 7 additions & 5 deletions docs/index.md
@@ -6,11 +6,13 @@

## What is PyEED?

PyEED is a Python toolkit, that allows easy creation, annotation, and analysis of custom sequence data. All functionalities are based on a data model, which integrates all information on a given nucleotide or protein sequence in a single object. This allows the bundling of all information on a given sequence, making it available in all creation, annotation, and analysis steps. The entire system is generic and is also capable of modeling different scenarios.
`pyeed` is a Python toolkit, that allows easy creation, annotation, and analysis of sequence data. All functionalities are based on a data model, which integrates all information on a given nucleotide or protein sequence in a single object. This allows the bundling of all information on a given sequence, making it available in all creation, annotation, and analysis steps. The entire system is generic and applies to various research scenarios.
`pyeed` is designed to enable object-oriented programming for bioinformatics.

## PyEED data structure
## 📝 Data Structure

The object structure of PyEED is based on a [data model](https://github.com/PyEED/pyeed/blob/main/specifications/data_model.md)(1), describing the relation between all attributes of a sequence. These attributes include the sequence, the organism, and annotations of the sequence. Furthermore, the information is marked with annotations, marking the origin of the information.
{ .annotate }
The data structure of `pyeed` is based on a [data model](https://github.com/PyEED/pyeed/blob/main/specifications/data_model.md)(1), describing the relation between all attributes of a sequence. These attributes include the sequence, the organism, and annotations of the sequence such as sites and regions within the sequence. Furthermore, the information is marked with annotations, marking the origin of the information.

1. PyEED uses the [sdRDM framework](https://github.com/JR-1991/software-driven-rdm) to define the architecture of its data as a Markdown document. The hierarchical structure defined in the Markdown document is used to generate Python classes, mirroring the structure of the data model. PyEED can thus be used to read and write data from SQL databases and apply its tools to the data.
## 🛠️ Tools

`pyeed` implements common tools for clustering, aligning, and visualizing sequences. CLI tools such as `Clustal Omega` are implemented as a Docker Service, allowing easy installation and usage of these tools.
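
The "single object" idea in the documentation text above can be illustrated with a minimal sketch in plain Python. This is not the API that pyeed generates from its data model: the class names `SequenceRecord`, `Organism`, and `Region`, their fields, and the `to_fasta` helper are all hypothetical and chosen only to show how a sequence, its organism, and its annotations might travel together.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Organism:
    # Hypothetical stand-in for the organism attribute of a sequence
    name: str
    taxonomy_id: int  # stored as an integer


@dataclass
class Region:
    # Hypothetical annotation covering a span of the sequence (1-based, inclusive)
    name: str
    start: int
    end: int


@dataclass
class SequenceRecord:
    # Hypothetical record bundling sequence, organism, and annotations in one object
    id: str
    sequence: str
    organism: Optional[Organism] = None
    regions: List[Region] = field(default_factory=list)

    def region_sequence(self, region: Region) -> str:
        # Convert the 1-based inclusive span to a Python slice
        return self.sequence[region.start - 1 : region.end]

    def to_fasta(self) -> str:
        # Two-line FASTA entry: header line, then the sequence
        return f">{self.id}\n{self.sequence}"


record = SequenceRecord(
    id="example_1",
    sequence="MKTAYIAKQR",
    organism=Organism(name="Escherichia coli", taxonomy_id=562),
    regions=[Region(name="N-terminal segment", start=1, end=4)],
)
print(record.to_fasta())
print(record.region_sequence(record.regions[0]))
```

Because the annotations live on the record itself, any later step (alignment, clustering, export) can read them without a separate lookup, which is the point the documentation makes about bundling information.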
57 changes: 54 additions & 3 deletions docs/installation/docker.md
@@ -2,9 +2,60 @@
icon: simple/docker
---

Docker is a way to run software in a container. This means that the software is isolated from the rest of your system. This simplifies the installation of computing environments since everything is preconfigured in the container. Docker is comparable to running a virtual machine, but instead of installing a whole operating system, it only installs the software you need.
## Concept

The PyEED Docker Service combines the PyEED toolkit with JupyterLab, a web-based editor for writing and executing Jupyter Notebooks with analysis tools. In combination, the `pyeed` package can be used from inside JupyterLab, whereas the setup and configuration of the analysis tools are taken care of by the Docker Service(1).
In this way, the Docker Service allows you to run JupyterLab on your local machine without having to install Python, Jupyter, and the necessary Python tools to work with your data.
{ .annotate }

Here we will use Docker to install JupyterLab, a popular web-based environment for Jupyter notebooks, code, and data. This will allow you to run JupyterLab on your local machine without having to install Python, Jupyter, and the necessary Python tools to work with your data and EnzymeML.
1. Docker is a way to run software in a container. This means that the software is isolated from the rest of your system. This simplifies the installation of computing environments since everything is preconfigured in the container. Docker is comparable to running a virtual machine, but instead of installing a whole operating system, it only installs the software you need.

To install Docker, follow the instructions for your operating system on the [Docker website](https://docs.docker.com/get-docker/).
To install Docker, follow the instructions for your operating system on the [Docker website](https://docs.docker.com/get-docker/).

## Initial Setup

1. **Install Docker**: Follow the instructions for your operating system on the [Docker website](https://docs.docker.com/get-docker/).

2. **[Download](https://github.com/PyEED/pyeed/archive/refs/heads/main.zip
) the PyEED Docker Service**

3. **Start the Service** by running the following steps:
=== "Windows"

1. Open the command line by pressing ++windows+r++ and type `powershell`.

2. Navigate to the Downloads folder and unzip the downloaded file.

3. Navigate to the unzipped folder by running the following command, adjust the path if necessary:
```powershell
cd ~\Downloads\pyeed-main
```
4. Start the Docker Service by running the following command:
```powershell
docker compose up --build
```

=== "MacOS/Linux"

1. Open the terminal
2. Navigate to the Downloads folder and unzip the downloaded file.
```bash
cd ~/Downloads
unzip pyeed-main.zip
```
3. Navigate to the unzipped folder by running the following command, adjust the path if necessary:
```bash
cd ~/Downloads/pyeed-main
```
4. Start the Docker Service by running the following command:
```bash
docker compose up --build
```

## Start the PyEED Docker Service

After the initial setup, all containers belonging to the PyEED Docker Service are created and started. The service is now added to the `Containers` section in the Docker Desktop application. To start the service, click on the :material-play: button next to the `pyeed` container. To access the JupyterLab environment, click on the link `8888:8888` in the header of the container. This will open a new tab in your browser, showing the JupyterLab environment.

## Stopping the PyEED Docker Service

To stop the service, navigate to the `Containers` section in the Docker Desktop application and click on the :material-stop: button next to the `pyeed` container. Running containers are symbolized by a green container icon. You can close the browser window whenever you want. The container will keep running in the background unless you stop it in the Docker Desktop app.
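
As a companion to the start and stop instructions above, the following sketch checks from Python whether anything is listening on the JupyterLab port. It assumes the service is published on `localhost:8888`, as the `8888:8888` mapping in the documentation suggests; the helper name `port_is_open` is hypothetical and uses only the standard library.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


if __name__ == "__main__":
    # 8888 is the host port taken from the `8888:8888` mapping (an assumption
    # about the local setup; adjust if the compose file maps a different port)
    if port_is_open("localhost", 8888):
        print("JupyterLab port is reachable at http://localhost:8888")
    else:
        print("Nothing is listening on port 8888; is the container running?")
```

A successful connection only shows that the port is open, not that JupyterLab has finished starting, so the check is a quick diagnostic rather than a full health check.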
5 changes: 1 addition & 4 deletions docs/usecases/usecase1.md
@@ -1,5 +1,2 @@
# Create a sequence network
# Usecases

Download and execute the [Notebook](https://github.com/PyEED/pyeed/blob/main/examples/basics.ipynb) open the notebook either in your Python environment, or follow the [instructions](../installation/jupyterlab.md) to set up the `PyEED-Lab` container as your computing environment including `pyeed`.

When using the `PyEED-Lab` Docker container, navigate in the file browser on the left to the folder that contains the downloaded [notebook](https://github.com/PyEED/pyeed/blob/main/examples/basics.ipynb). Double-click on the notebook to open it in the JupyterLab environment. Then you can execute each cell by pressing ++shift+enter++ or by clicking the `Run` button in the toolbar.
65 changes: 37 additions & 28 deletions examples/alignment.ipynb

Large diffs are not rendered by default.
