Merge feature-branch into main, accepting changes from feature-branch
Showing 22 changed files with 595 additions and 24 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,160 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Align sequences" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"application/vnd.jupyter.widget-view+json": { | ||
"model_id": "8b73afa8e8b444578f622d239c439673", | ||
"version_major": 2, | ||
"version_minor": 0 | ||
}, | ||
"text/plain": [ | ||
"Output()" | ||
] | ||
}, | ||
"metadata": {}, | ||
"output_type": "display_data" | ||
}, | ||
{ | ||
"data": { | ||
"text/html": [ | ||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n" | ||
], | ||
"text/plain": [] | ||
}, | ||
"metadata": {}, | ||
"output_type": "display_data" | ||
}, | ||
{ | ||
"data": { | ||
"text/html": [ | ||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n", | ||
"</pre>\n" | ||
], | ||
"text/plain": [ | ||
"\n" | ||
] | ||
}, | ||
"metadata": {}, | ||
"output_type": "display_data" | ||
} | ||
], | ||
"source": [ | ||
"import json\n", | ||
"from pyeed.core import ProteinRecord\n", | ||
"\n", | ||
"\n", | ||
"# load accession ids from json file\n", | ||
"with open(\"ids.json\", \"r\") as f:\n", | ||
" ids = json.load(f)\n", | ||
"\n", | ||
"sequences = ProteinRecord.get_ids(ids)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Multiple Sequence Alignment\n", | ||
"\n", | ||
"A multiple sequence alignment (MSA) is calculated by creating an `MSA` object from a list of `ProteinRecord` objects and calling its `clustalo` method. The `clustalo` method requires the PyEED Docker Service to be running. It returns an `AlignmentResult` containing all input `sequences` and the corresponding `aligned_sequences`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"application/vnd.jupyter.widget-view+json": { | ||
"model_id": "99f840a5cc0a4441a7d936127815ab36", | ||
"version_major": 2, | ||
"version_minor": 0 | ||
}, | ||
"text/plain": [ | ||
"Output()" | ||
] | ||
}, | ||
"metadata": {}, | ||
"output_type": "display_data" | ||
}, | ||
{ | ||
"data": { | ||
"text/html": [ | ||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n" | ||
], | ||
"text/plain": [] | ||
}, | ||
"metadata": {}, | ||
"output_type": "display_data" | ||
}, | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"✅ Alignment completed\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"from pyeed.align import MSA\n", | ||
"\n", | ||
"alignment = MSA(sequences=sequences).clustalo()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Create an HMM profile\n", | ||
"\n", | ||
"To create a hidden Markov model (HMM) profile, use the `HMM` class. It is constructed from a `name` and an `alignment`, here the result of the `clustalo` call on the `MSA` object. To check whether a sequence belongs to the profile, call the `search` method. It takes a `ProteinRecord` object and returns an `HMMResult` object containing the `sequence` and its `score` against the profile." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from pyeed.align import HMM\n", | ||
"\n", | ||
"model = HMM(name=\"random profile\", alignment=alignment)\n", | ||
"hits = model.search(sequence=sequences[0])" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "pye", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.5" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
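Review note on the alignment notebook: once aligned sequences are available, a quick sanity check can confirm the result before building a profile from it. The sketch below works on plain strings only. The helper names `check_equal_length` and `pairwise_identity` and the toy rows are hypothetical and are not part of `pyeed`; how the aligned strings are pulled out of an `AlignmentResult` is left open here.

```python
from itertools import combinations


def check_equal_length(aligned: list[str]) -> int:
    """Return the shared alignment length, or raise if the rows differ."""
    lengths = {len(row) for row in aligned}
    if len(lengths) != 1:
        raise ValueError(f"aligned rows differ in length: {sorted(lengths)}")
    return lengths.pop()


def pairwise_identity(a: str, b: str, gap: str = "-") -> float:
    """Fraction of identical residues over columns where neither row has a gap."""
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


# Toy rows standing in for aligned sequences (hypothetical, not pyeed output).
toy_rows = ["MKV-LA", "MKVQLA", "MRV-LG"]

alignment_length = check_equal_length(toy_rows)

# Identity for every unordered pair of rows, keyed by row indices.
identities = {
    (i, j): pairwise_identity(toy_rows[i], toy_rows[j])
    for i, j in combinations(range(len(toy_rows)), 2)
}
```

Identity is computed over gap-free columns only. This is one of several common conventions, so the values may differ from those reported by other tools.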
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,153 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Get rich sequence information\n", | ||
"\n", | ||
"## Acquire sequence information based on accession ID(s)\n", | ||
"\n", | ||
"**Single accession ID**\n", | ||
"\n", | ||
"A single sequence can be retrieved with the `get_id` function, which takes an accession ID as input and returns the sequence as a `ProteinRecord` object.\n", | ||
"The `ProteinRecord` object contains the sequence as a string, along with additional information such as the `Organism`, `Region`, or `Site` annotations of the sequence.\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from pyeed.core import ProteinRecord\n", | ||
"\n", | ||
"matHM = ProteinRecord.get_id(\"MBP1912539.1\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"**Multiple accession IDs**\n", | ||
"\n", | ||
"To load multiple sequences at once, the `get_ids` function can be used. The function takes a list of accession IDs as input and returns a list of `ProteinRecord` objects." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import json\n", | ||
"\n", | ||
"# Load the saved ids from json\n", | ||
"with open(\"ids.json\", \"r\") as f:\n", | ||
" ids = json.load(f)\n", | ||
"\n", | ||
"# Get the protein info for each id\n", | ||
"proteins = ProteinRecord.get_ids(ids)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Search for similar sequences with BLAST\n", | ||
"\n", | ||
"The `ncbi_blast` method performs a BLAST search on the NCBI server. It is called on a `ProteinRecord` object and returns a list of `ProteinRecord` objects that represent the hits of the BLAST search.\n", | ||
"The search can be customized with the parameters `n_hits` (number of hits), `e_value` (E-value), `db` (database to query), `matrix` (substitution matrix), and `identity` (identity required to accept a hit).\n", | ||
"\n", | ||
"<div class=\"admonition warning\">\n", | ||
" <p class=\"admonition-title\">NCBI BLAST service might be slow</p>\n", | ||
" <p>Due to the way NCBI handles requests to its BLAST API, the service is quite slow. During peak working hours, a single search might take more than 15 minutes.</p>\n", | ||
"</div>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"blast_results = matHM.ncbi_blast(\n", | ||
" n_hits=100,\n", | ||
" e_value=0.05,\n", | ||
" db=\"swissprot\",\n", | ||
" matrix=\"BLOSUM62\",\n", | ||
" identity=0.5,\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Inspect objects\n", | ||
"\n", | ||
"Each `pyeed` object has a rich printed representation: passing it to `print` displays all the information available for the object as a tree. This is useful for inspecting the object and its attributes." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"\u001b[4mProteinRecord\u001b[0m\n", | ||
"├── \u001b[94mid\u001b[0m = MBP1912539.1\n", | ||
"├── \u001b[94mname\u001b[0m = S-adenosylmethionine synthetase\n", | ||
"├── \u001b[94morganism\u001b[0m\n", | ||
"│ └── \u001b[4mOrganism\u001b[0m\n", | ||
"│ ├── \u001b[94mid\u001b[0m = ec01bd4b-490f-4908-aa3c-f8435295e9ef\n", | ||
"│ ├── \u001b[94mtaxonomy_id\u001b[0m = 49900\n", | ||
"│ ├── \u001b[94mname\u001b[0m = Thermococcus stetteri\n", | ||
"│ ├── \u001b[94mdomain\u001b[0m = Archaea\n", | ||
"│ ├── \u001b[94mphylum\u001b[0m = Euryarchaeota\n", | ||
"│ ├── \u001b[94mtax_class\u001b[0m = Thermococci\n", | ||
"│ ├── \u001b[94morder\u001b[0m = Thermococcales\n", | ||
"│ ├── \u001b[94mfamily\u001b[0m = Thermococcaceae\n", | ||
"│ └── \u001b[94mgenus\u001b[0m = Thermococcus\n", | ||
"├── \u001b[94msequence\u001b[0m = MLMAEKIRNIVVEEMVRTPVEMQQVELVERKGIGHPDSIADGIAEAVSRALSREYMKRYGIILHHNTDQVEVVGGRAYPQFGGGEVIKPIYILLSGRAVEMVDREFFPVHEVAIKAAKDYLKKAVRHLDIENHVVIDSRIGQGSVDLVGVFNKAKKNPIPLANDTSFGVGYAPLSETERIVLETEKYLNSDEFKKKWPAVGEDIKVMGLRKGDEIDLTIAAAIVDSEVDNPDDYMAVKEAIYEAAKEIVESHTQRPTNIYVNTADDPKEGIYYITVTGTSAEAGDDGSVGRGNRVNGLITPNRHMSMEAAAGKNPVSHVGKIYNILSMLIANDIAEQIEGVEEVYVRILSQIGKPIDEPLVASVQIIPKKGYSIDVLQKPAYEIADEWLANITKIQKMILEDKINVF\n", | ||
"├── \u001b[94mcoding_sequence\u001b[0m\n", | ||
"│ └── 0\n", | ||
"│ └── \u001b[4mRegion\u001b[0m\n", | ||
"│ ├── \u001b[94mid\u001b[0m = JAGGKB010000004.1\n", | ||
"│ ├── \u001b[94mstart\u001b[0m = 39572\n", | ||
"│ └── \u001b[94mend\u001b[0m = 40795\n", | ||
"└── \u001b[94mec_number\u001b[0m = 2.5.1.6\n", | ||
"\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"print(matHM)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.5" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 4 | ||
} |
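Review note on the printed output: the stored `stdout` text contains ANSI escape sequences (`\u001b[4m`, `\u001b[94m`, `\u001b[0m`) used for underlining and color. These render in a terminal or notebook but clutter log files. A minimal sketch for stripping them is shown below. The helper `strip_ansi` is hypothetical and not part of `pyeed`, and the sample string is copied from the first two lines of the recorded output.

```python
import re

# Matches SGR (color/style) escape sequences such as "\x1b[94m" and "\x1b[0m".
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text: str) -> str:
    """Remove ANSI color/style escape sequences from text."""
    return ANSI_SGR.sub("", text)


# Sample taken from the recorded notebook output.
colored = "\x1b[4mProteinRecord\x1b[0m\n├── \x1b[94mid\x1b[0m = MBP1912539.1\n"
plain = strip_ansi(colored)
```

Assuming `str(matHM)` yields the same text that `print(matHM)` shows, `strip_ansi(str(matHM))` would give a plain-text tree suitable for writing to a file.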
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
[ | ||
"A1RSD7", | ||
"A8MD44", | ||
"P0CW62", | ||
"A6VHQ4", | ||
"A9A923", | ||
"B6YUL1", | ||
"Q5V2S5", | ||
"Q4JAL1", | ||
"B1YC36", | ||
"P0CW63", | ||
"Q980S9", | ||
"Q3IQF5", | ||
"Q9V1P7", | ||
"Q8PWS4", | ||
"Q5JF22", | ||
"Q8TU57", | ||
"C5A4B7", | ||
"B0R5A8", | ||
"P26498", | ||
"O67275", | ||
"A7I771", | ||
"Q976F3", | ||
"A3MY01", | ||
"Q58605" | ||
] |
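Review note on `ids.json`: both notebooks load this list and pass it straight to `ProteinRecord.get_ids`. The entries appear to be UniProt-style accessions, so a cheap pre-flight check can catch duplicates and malformed entries before any network request is made. The sketch below is an assumption-laden illustration: `load_ids` and `split_valid` are hypothetical helpers, not `pyeed` functions, and the pattern is the accession format published by UniProt.

```python
import json
import re
import tempfile
from pathlib import Path

# Accession format as documented by UniProt (6 or 10 characters).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def load_ids(path) -> list[str]:
    """Load a JSON file that is expected to hold a flat list of strings."""
    with open(path, "r") as f:
        ids = json.load(f)
    if not isinstance(ids, list) or not all(isinstance(i, str) for i in ids):
        raise ValueError("expected a JSON list of accession strings")
    return ids


def split_valid(ids: list[str]) -> tuple[list[str], list[str]]:
    """Split IDs into (valid, rejected); duplicates and malformed IDs are rejected."""
    seen: set[str] = set()
    valid: list[str] = []
    rejected: list[str] = []
    for acc in ids:
        if acc in seen or not UNIPROT_ACCESSION.fullmatch(acc):
            rejected.append(acc)
        else:
            seen.add(acc)
            valid.append(acc)
    return valid, rejected


# Demo on a temporary file so the sketch does not touch the real ids.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "ids.json"
    demo_path.write_text(json.dumps(["A1RSD7", "P0CW62", "A1RSD7", "not-an-id"]))
    valid, rejected = split_valid(load_ids(demo_path))
```

Note that NCBI accessions such as `MBP1912539.1`, used in the other notebook, do not match this pattern, so the check only suits lists that are meant to hold UniProt accessions.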