
DomFico/AlphaFold_Scraper


AlphaFold Distance Calculator

Overview

The AlphaFold Distance Calculator is a Python script designed to process AlphaFold prediction directories. It reads atom specifications from an input file and calculates distances between specified atoms across multiple .cif files generated by AlphaFold. The script identifies the structures with the shortest distances and copies those .cif files into a designated directory for further analysis.

Features

  • Batch Processing: Recursively searches through directories to find all .cif files.
  • Flexible Atom Specification: Supports single and combined atom distance calculations using a simple specification format.
  • Detailed Output: Generates timestamped result files containing all distance calculations.
  • Lowest Distance Tracking: Identifies and copies .cif files with the lowest distances for each specification into a separate directory.
  • Error Handling: Provides informative messages for missing atoms or processing errors.

Requirements

  • Python 3.x
  • Gemmi Library: For reading and processing .cif files.
    pip install gemmi
  • NumPy: For numerical calculations.
    pip install numpy

Installation

  1. Clone the Repository
    git clone https://github.com/DomFico/AlphaFold_Scraper.git
  2. Navigate to the Directory
    cd AlphaFold_Scraper
  3. Install Dependencies
    pip install -r requirements.txt

Usage

1. Prepare Atom Specifications

Create an input.txt file in the script's directory containing your atom specifications. Each line should follow the format:

CHAIN1 RES1 NUM1 ATOM1 to CHAIN2 RES2 NUM2 ATOM2
  • Multiple Specifications: To calculate a combined distance (the total of the individual pair distances), join specifications with a + sign.
  • Comments: Use # at the beginning of a line to add comments.

Example input.txt:

# Distance between Chain A residue 100 CA and Chain B residue 200 CA
A ALA 100 CA to B GLY 200 CA

# Combined distance between two pairs of atoms
A THR 50 OG1 to A SER 75 OG + B LYS 120 NZ to B ASP 150 OD1
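The rules above (four whitespace-separated fields, the word "to", four more fields, + between pairs, # for comments) can be checked with a small parser. This is a minimal sketch, not the script's own parse_atom_specification; the names AtomRef, parse_spec_line, and read_specifications are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class AtomRef(NamedTuple):
    """One atom address, e.g. chain 'A', residue 'THR', number 50, atom 'OG1'."""
    chain: str
    residue: str
    number: int
    atom: str


# A pair is (first atom, second atom); one input line yields a list of pairs.
Pair = Tuple[AtomRef, AtomRef]


def parse_spec_line(line: str) -> List[Pair]:
    """Parse 'CHAIN RES NUM ATOM to CHAIN RES NUM ATOM [+ ...]' into pairs."""
    pairs: List[Pair] = []
    for chunk in line.split("+"):
        tokens = chunk.split()
        # Each pair must be exactly 4 fields, the word 'to', then 4 fields.
        if len(tokens) != 9 or tokens[4] != "to":
            raise ValueError(f"malformed specification: {chunk.strip()!r}")
        first = AtomRef(tokens[0], tokens[1], int(tokens[2]), tokens[3])
        second = AtomRef(tokens[5], tokens[6], int(tokens[7]), tokens[8])
        pairs.append((first, second))
    return pairs


def read_specifications(lines: Iterable[str]) -> List[List[Pair]]:
    """Parse every non-blank, non-comment line of an input.txt-style file."""
    specs: List[List[Pair]] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and '#' comments are ignored
        specs.append(parse_spec_line(line))
    return specs
```

Passing an open file works because iterating a file yields its lines: `with open("input.txt") as handle: specs = read_specifications(handle)`. A non-integer residue number also raises ValueError, via int().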

2. Run the Script

By default, the script processes the current directory and looks for input.txt.

python alpha_fold_distance_calculator.py

3. Customize Directory and Specification File (Optional)

Modify the main() function in the script to specify custom directories or input files:

def main():
    directory = '/path/to/alphafold/predictions'  # Specify your directory
    spec_file = 'my_input.txt'  # Specify your input file
    process_directory(directory, spec_file)
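Instead of editing main() for every run, the directory and specification file could be taken from the command line. This is a hypothetical sketch using the standard-library argparse; the positional directory argument and the -s/--spec-file flag are not part of the current script. The defaults mirror the behaviour described above (current directory, input.txt).

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Read the prediction directory and spec file from the command line."""
    parser = argparse.ArgumentParser(
        description="Measure atom distances across AlphaFold .cif files."
    )
    # Optional positional: falls back to the current directory.
    parser.add_argument("directory", nargs="?", default=".",
                        help="directory searched recursively for .cif files")
    # '--spec-file' is stored as args.spec_file.
    parser.add_argument("-s", "--spec-file", default="input.txt",
                        help="atom specification file")
    return parser.parse_args(argv)
```

Inside main(), the call would become `args = parse_args()` followed by `process_directory(args.directory, args.spec_file)`, so a run looks like `python alpha_fold_distance_calculator.py /path/to/alphafold/predictions -s my_input.txt`.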

4. View Results

  • Results File: A timestamped file distance_results_YYYYMMDD_HHMMSS.txt containing the calculated distances.
  • Lowest Distances Directory: A directory lowest_distances_YYYYMMDD_HHMMSS containing .cif files with the lowest distances for each specification.
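The YYYYMMDD_HHMMSS pattern in these names, and the YYYY-MM-DD HH:MM:SS stamp in the results header, correspond to the strftime codes below. This sketch only illustrates the naming; the helper output_names is hypothetical and the script may build the names differently.

```python
from datetime import datetime
from typing import Tuple


def output_names(now: datetime) -> Tuple[str, str, str]:
    """Return (results file name, lowest-distance directory name, header line)."""
    stamp = now.strftime("%Y%m%d_%H%M%S")  # e.g. 20240102_030405
    header = f"# Distance calculations performed on {now:%Y-%m-%d %H:%M:%S}"
    return f"distance_results_{stamp}.txt", f"lowest_distances_{stamp}", header
```

For a real run, `output_names(datetime.now())` gives names stamped with the local time; because both names share one stamp, the results file and its directory are easy to pair up.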

Input Format Details

  • Atom Specification Format:

    CHAIN1 RES1 NUM1 ATOM1 to CHAIN2 RES2 NUM2 ATOM2
    
    • CHAIN: Single-letter chain identifier (e.g., A, B).
    • RES: Three-letter residue name (e.g., ARG, GLY).
    • NUM: Residue sequence number (integer).
    • ATOM: Atom name (e.g., CA, CB, OG1).
  • Combined Specifications:

    Use + to connect multiple specifications for combined distance calculations.

    Spec1 + Spec2 + Spec3
    

    Example:

    A THR 30 OG1 to A SER 58 OG + B LYS 76 NZ to B ASP 100 OD1
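Written out, and assuming the "total distance" computed by calculate_combined_distance is a plain sum of the individual pair distances, a specification with n pairs "a_k to b_k" joined by + gives the value below, where r denotes an atom's Cartesian coordinates in Å:

```latex
D = \sum_{k=1}^{n} \left\lVert \mathbf{r}_{a_k} - \mathbf{r}_{b_k} \right\rVert_2
  = \sum_{k=1}^{n} \sqrt{(x_{a_k} - x_{b_k})^2 + (y_{a_k} - y_{b_k})^2 + (z_{a_k} - z_{b_k})^2}
```

For the example above, n = 2: the chain A OG1 to OG distance plus the chain B NZ to OD1 distance.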
    

Output Explanation

  • Results File Format:

    # Distance calculations performed on YYYY-MM-DD HH:MM:SS
    # Format: Structure_name | Specification | Distance(Å)
    --------------------------------------------------------------------------------
    structure1.cif | A ALA 100 CA to B GLY 200 CA | 5.67
    structure2.cif | A THR 50 OG1 to A SER 75 OG + B LYS 120 NZ to B ASP 150 OD1 | 12.34
    
  • Lowest Distances Directory:

    Contains copies of .cif files that have the lowest calculated distance for each specification, renamed for clarity.
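The lowest-distance selection can also be reproduced from an existing results file. This is a minimal sketch assuming the "name | specification | distance" layout shown above and that specifications never contain "|"; the function lowest_per_spec is hypothetical, not part of the script.

```python
from typing import Dict, Iterable, Tuple


def lowest_per_spec(lines: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Map each specification to the (structure name, distance) with the smallest distance."""
    best: Dict[str, Tuple[str, float]] = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, '#' header lines, and the dashed separator.
        if not line or line.startswith("#") or set(line) == {"-"}:
            continue
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 3:
            continue  # not a 'name | spec | distance' row
        name, spec, value = parts
        try:
            distance = float(value)
        except ValueError:
            continue  # non-numeric distance field
        # Strict '<' keeps the first structure seen when distances tie.
        if spec not in best or distance < best[spec][1]:
            best[spec] = (name, distance)
    return best
```

Usage: `with open("distance_results_YYYYMMDD_HHMMSS.txt", encoding="utf-8") as fh: best = lowest_per_spec(fh)`, substituting the actual timestamp. Opening with UTF-8 matters because the header contains "Å".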

Functions Overview

  • parse_atom_specification(spec): Parses atom specification strings into structured tuples.
  • get_atom_position(structure, chain_name, residue_name, residue_number, atom_name): Retrieves the 3D coordinates of the specified atom.
  • calculate_distance(structure, atom_specification): Calculates the Euclidean distance between two atoms.
  • calculate_combined_distance(structure, specifications): Computes the total distance for combined atom specifications.
  • find_cif_files(directory): Recursively searches for all .cif files within the specified directory.
  • process_directory(directory, specification_file): Main function to process all structures and generate results.
  • main(): Entry point of the script, where you can specify directories and files.
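The lookup and distance steps can be sketched against gemmi's object hierarchy. The functions below (find_atom_coords, pair_distance, summed_distance) are hypothetical stand-ins, not the script's code. They assume gemmi-style iteration (a Model yields Chains with .name, a Chain yields Residues with .name and .seqid.num, a Residue yields Atoms with .name and .pos.x/.y/.z) and that combined specifications are summed. Because they only use attribute access, they can be exercised without gemmi installed.

```python
import math
from typing import Iterable, Optional, Tuple

Coord = Tuple[float, float, float]
# (chain, residue name, residue number, atom name), e.g. ("A", "ALA", 100, "CA")
AtomKey = Tuple[str, str, int, str]


def find_atom_coords(model, chain_name: str, residue_name: str,
                     residue_number: int, atom_name: str) -> Optional[Coord]:
    """Return (x, y, z) of the first matching atom, or None if it is absent."""
    for chain in model:
        if chain.name != chain_name:
            continue
        for residue in chain:
            if residue.name == residue_name and residue.seqid.num == residue_number:
                for atom in residue:
                    if atom.name == atom_name:
                        return (atom.pos.x, atom.pos.y, atom.pos.z)
    return None


def pair_distance(model, first: AtomKey, second: AtomKey) -> Optional[float]:
    """Euclidean distance in Å between two atoms, or None if either is missing."""
    a = find_atom_coords(model, *first)
    b = find_atom_coords(model, *second)
    if a is None or b is None:
        return None
    return math.dist(a, b)  # requires Python 3.8+


def summed_distance(model, pairs: Iterable[Tuple[AtomKey, AtomKey]]) -> Optional[float]:
    """Total of the pair distances ('+' specifications); None if any atom is missing."""
    total = 0.0
    for first, second in pairs:
        d = pair_distance(model, first, second)
        if d is None:
            return None
        total += d
    return total
```

With gemmi installed, `model = gemmi.read_structure(path)[0]` supplies the first model of a .cif file. Returning None rather than raising lets the caller report which atom was missing and move on, in line with the error handling described below.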

Error Handling

  • Missing Atoms: If an atom is not found, the script skips the calculation and informs you which atoms are missing.
  • Invalid Specifications: Provides error messages for incorrectly formatted specifications.
  • File Processing Errors: Catches and logs exceptions during file reading and processing.
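A minimal sketch of the recursive search and the keep-going-on-error loop follows. collect_cif_files and measure_all are hypothetical names, not the script's find_cif_files or process_directory. The caller-supplied measure function is assumed to return None when an atom is missing and to raise on unreadable files.

```python
import os
from typing import Callable, Dict, List, Optional, Tuple


def collect_cif_files(directory: str) -> List[str]:
    """Recursively gather .cif paths under `directory`, sorted for stable output."""
    paths: List[str] = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.lower().endswith(".cif"):
                paths.append(os.path.join(root, name))
    return sorted(paths)


def measure_all(paths: List[str],
                measure: Callable[[str], Optional[float]]
                ) -> Tuple[Dict[str, float], List[Tuple[str, str]]]:
    """Run `measure` on each file, recording failures instead of stopping."""
    results: Dict[str, float] = {}
    problems: List[Tuple[str, str]] = []
    for path in paths:
        try:
            value = measure(path)
        except Exception as exc:  # unreadable or malformed file: log and move on
            problems.append((path, f"error: {exc}"))
            continue
        if value is None:  # `measure` signals a missing atom by returning None
            problems.append((path, "missing atom(s); skipped"))
        else:
            results[path] = value
    return results, problems
```

Collecting problems in a list, rather than only printing them, makes it straightforward to write them to the results file for later review.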

Examples

Sample Atom Specification

A ARG 52 CA to B GLY 104 CA

Calculates the distance between the CA atom of ARG residue number 52 in chain A and the CA atom of GLY residue number 104 in chain B.

Running the Script with a Custom Directory

def main():
    directory = '/home/user/alphafold_predictions'
    spec_file = 'input.txt'
    process_directory(directory, spec_file)

Tips

  • Ensure that the residue numbers and chain identifiers match those in your .cif files.
  • Use the correct atom names as per the standard PDB nomenclature.
  • Review the generated results file for any skipped calculations due to missing atoms.
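When a calculation is skipped, it helps to list the atoms that actually exist at that chain and residue number. For example, glycine has no CB atom, so any specification asking for GLY CB will always be reported as missing. The helper residue_atom_names below is hypothetical. It assumes gemmi-style iteration (Model yields Chains with .name, Chain yields Residues with .name and .seqid.num, Residue yields Atoms with .name).

```python
from typing import Dict, List


def residue_atom_names(model, chain_name: str, residue_number: int) -> Dict[str, List[str]]:
    """Map residue name -> atom names for residues at `residue_number` in `chain_name`.

    An empty dict means no residue with that number exists in that chain.
    """
    found: Dict[str, List[str]] = {}
    for chain in model:
        if chain.name != chain_name:
            continue
        for residue in chain:
            if residue.seqid.num == residue_number:
                found[residue.name] = [atom.name for atom in residue]
    return found
```

With gemmi installed, `residue_atom_names(gemmi.read_structure(path)[0], "B", 104)` shows both the residue name and its atom names, so a mismatched residue name or atom name in input.txt becomes visible at once.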

Contributing

Contributions are welcome! Feel free to submit a pull request or open an issue for suggestions and improvements.

License

This project is licensed under the MIT License. See the LICENSE file for details.


Disclaimer: This script is provided "as is" without warranty of any kind. Use it at your own risk.
