The AlphaFold Distance Calculator is a Python script for processing AlphaFold prediction directories. It reads atom specifications from an input file, calculates the distances between the specified atoms in every .cif file generated by AlphaFold, and copies the structures with the shortest distance for each specification into a separate directory for further analysis.
- Batch Processing: Recursively searches directories for all .cif files.
- Flexible Atom Specification: Supports single and combined atom distance calculations using a simple specification format.
- Detailed Output: Generates timestamped result files containing all distance calculations.
- Lowest Distance Tracking: Identifies the .cif files with the lowest distance for each specification and copies them into a separate directory.
- Error Handling: Provides informative messages for missing atoms and processing errors.
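The recursive search behind Batch Processing can be sketched with pathlib. This is an illustrative version of a function like the script's find_cif_files, not its actual code; the script may use os.walk or glob instead.

```python
from pathlib import Path
from typing import List


def find_cif_files(directory: str) -> List[str]:
    """Sketch only: return every .cif file below directory (recursively), sorted by path."""
    root = Path(directory)
    # rglob walks all subdirectories; is_file() skips directories whose names end in .cif
    return sorted(str(path) for path in root.rglob("*.cif") if path.is_file())
```

Sorting the paths keeps the processing order, and therefore the results file, reproducible between runs. Calling find_cif_files("/path/to/alphafold/predictions") returns the full paths of all predicted structures below that directory.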
- Python 3.x
- Gemmi library: for reading and processing .cif files.
  pip install gemmi
- NumPy: for numerical calculations.
  pip install numpy
- Clone the repository:
  git clone https://github.com/your-username/AlphaFold_Scraper.git
- Navigate to the directory:
  cd AlphaFold_Scraper
- Install the dependencies:
  pip install -r requirements.txt
Create an input.txt file in the script's directory containing your atom specifications. Each line should follow the format:
CHAIN1 RES1 NUM1 ATOM1 to CHAIN2 RES2 NUM2 ATOM2
- Multiple Specifications: To calculate a combined distance, connect specifications with a + sign.
- Comments: Use # at the beginning of a line to add a comment.
Example input.txt:
# Distance between chain A residue 100 CA and chain B residue 200 CA (glycine has no CB atom)
A ALA 100 CA to B GLY 200 CA
# Combined distance between two pairs of atoms
A THR 50 OG1 to A SER 75 OG + B LYS 120 NZ to B ASP 150 OD1
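A minimal sketch of how input.txt could be parsed under the rules above, assuming whitespace-separated fields and the lowercase keyword to. The names AtomRef, parse_pair and read_specifications are hypothetical; the script's own parse_atom_specification may return different tuples and report errors differently.

```python
from typing import List, NamedTuple, Tuple

# Hypothetical parser illustrating the input format; not the script's actual API.


class AtomRef(NamedTuple):
    """One atom as written in a specification: CHAIN RES NUM ATOM."""
    chain: str
    residue_name: str
    residue_number: int
    atom_name: str


AtomPair = Tuple[AtomRef, AtomRef]


def parse_pair(spec: str) -> AtomPair:
    """Parse 'CHAIN1 RES1 NUM1 ATOM1 to CHAIN2 RES2 NUM2 ATOM2'."""
    tokens = spec.split()
    # Four tokens per atom, separated by the keyword 'to'
    if len(tokens) != 9 or tokens[4] != "to":
        raise ValueError(f"Invalid specification: {spec!r}")
    try:
        first = AtomRef(tokens[0], tokens[1], int(tokens[2]), tokens[3])
        second = AtomRef(tokens[5], tokens[6], int(tokens[7]), tokens[8])
    except ValueError:
        raise ValueError(f"Residue number is not an integer: {spec!r}") from None
    return first, second


def read_specifications(text: str) -> List[List[AtomPair]]:
    """Return one list of atom pairs per specification line.

    Blank lines and lines starting with '#' are skipped; a line joined
    with '+' becomes a combined specification containing several pairs.
    """
    specifications = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = [part.strip() for part in line.split("+") if part.strip()]
        specifications.append([parse_pair(part) for part in parts])
    return specifications


example = """# Combined distance between two pairs of atoms
A THR 50 OG1 to A SER 75 OG + B LYS 120 NZ to B ASP 150 OD1
"""
for pairs in read_specifications(example):
    print(len(pairs), pairs[0][0])
    # 2 AtomRef(chain='A', residue_name='THR', residue_number=50, atom_name='OG1')
```

Parsing every line up front means a malformed specification is reported before any structure is opened.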
By default, the script processes the current directory and looks for input.txt there.
python alpha_fold_distance_calculator.py
Modify the main() function in the script to specify a custom directory or input file:
def main():
    directory = '/path/to/alphafold/predictions'  # Specify your directory
    spec_file = 'my_input.txt'  # Specify your input file
    process_directory(directory, spec_file)
- Results File: A timestamped file, distance_results_YYYYMMDD_HHMMSS.txt, containing the calculated distances.
- Lowest Distances Directory: A directory, lowest_distances_YYYYMMDD_HHMMSS, containing the .cif files with the lowest distances for each specification.
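The YYYYMMDD_HHMMSS pattern corresponds to the strftime format %Y%m%d_%H%M%S. The sketch below builds both output names from a single timestamp and formats results lines in the layout described under Results File Format. The helper names, the 80-character separator, and the two-decimal rounding are assumptions inferred from the examples, not taken from the script.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

# Hypothetical helpers; the script may format its output differently.


def output_names(run_time: datetime) -> Tuple[str, str]:
    """Results-file and directory names sharing one YYYYMMDD_HHMMSS stamp."""
    stamp = run_time.strftime("%Y%m%d_%H%M%S")
    return f"distance_results_{stamp}.txt", f"lowest_distances_{stamp}"


def format_results(rows: Iterable[Tuple[str, str, float]], run_time: datetime) -> List[str]:
    """Lines of a results file: header, separator, one row per calculation."""
    lines = [
        f"# Distance calculations performed on {run_time:%Y-%m-%d %H:%M:%S}",
        "# Format: Structure_name | Specification | Distance(Å)",
        "-" * 80,  # separator width is an assumption
    ]
    for structure_name, specification, distance in rows:
        # Two decimal places, matching the example values
        lines.append(f"{structure_name} | {specification} | {distance:.2f}")
    return lines


run_time = datetime(2024, 5, 1, 14, 30, 5)
print(output_names(run_time))
# ('distance_results_20240501_143005.txt', 'lowest_distances_20240501_143005')
print(format_results([("structure1.cif", "A THR 50 OG1 to A SER 75 OG", 5.6712)], run_time)[-1])
# structure1.cif | A THR 50 OG1 to A SER 75 OG | 5.67
```

Taking the timestamp once and reusing it keeps the results file and the lowest-distances directory of the same run paired by name.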
- Atom Specification Format:
  CHAIN1 RES1 NUM1 ATOM1 to CHAIN2 RES2 NUM2 ATOM2
  - CHAIN: Single-letter chain identifier (e.g., A, B).
  - RES: Three-letter residue name (e.g., ARG, GLY).
  - NUM: Residue sequence number (integer).
  - ATOM: Atom name (e.g., CA, CB, OG1).
- Combined Specifications:
  Use + to connect multiple specifications for a combined distance calculation:
  Spec1 + Spec2 + Spec3
  Example:
  A THR 30 OG1 to A SER 58 OG + B LYS 76 NZ to B ASP 100 OD1
- Results File Format:
  # Distance calculations performed on YYYY-MM-DD HH:MM:SS
  # Format: Structure_name | Specification | Distance(Å)
  --------------------------------------------------------------------------------
  structure1.cif | A ALA 100 CA to B GLY 200 CA | 5.67
  structure2.cif | A THR 50 OG1 to A SER 75 OG + B LYS 120 NZ to B ASP 150 OD1 | 12.34
- Lowest Distances Directory:
  Contains copies of the .cif files with the lowest calculated distance for each specification, renamed for clarity.
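Tracking the lowest distance amounts to keeping one running minimum per specification. A minimal sketch, assuming results arrive as (cif_path, specification, distance) rows with None for skipped calculations; the helper names and the renaming scheme in copy_lowest are hypothetical, since the README only says the copies are "renamed for clarity".

```python
import shutil
from pathlib import Path
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical helpers illustrating the idea; not the script's own code.


def lowest_per_specification(
    results: Iterable[Tuple[str, str, Optional[float]]],
) -> Dict[str, Tuple[str, float]]:
    """Map each specification to the (cif_path, distance) with the smallest distance.

    Rows whose distance is None (e.g. missing atoms) are ignored.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for cif_path, specification, distance in results:
        if distance is None:
            continue
        if specification not in best or distance < best[specification][1]:
            best[specification] = (cif_path, distance)
    return best


def copy_lowest(best: Dict[str, Tuple[str, float]], out_dir: str) -> None:
    """Copy each winning .cif file into out_dir under an assumed naming scheme."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    # Number the specifications in sorted order so names are reproducible
    for index, (_spec, (cif_path, distance)) in enumerate(sorted(best.items()), start=1):
        new_name = f"spec{index}_{Path(cif_path).stem}_{distance:.2f}A.cif"
        shutil.copy2(cif_path, target / new_name)
```

Because only a strictly smaller distance replaces the current best, ties keep the first structure encountered.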
- parse_atom_specification(spec): Parses atom specification strings into structured tuples.
- get_atom_position(structure, chain_name, residue_name, residue_number, atom_name): Retrieves the 3D coordinates of the specified atom.
- calculate_distance(structure, atom_specification): Calculates the Euclidean distance between two atoms.
- calculate_combined_distance(structure, specifications): Computes the total distance for a combined atom specification.
- find_cif_files(directory): Recursively searches for all .cif files within the specified directory.
- process_directory(directory, specification_file): Main function that processes all structures and generates the results.
- main(): Entry point of the script, where you can specify the directory and input file.
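The distance step can be sketched on coordinates that have already been looked up. This assumes the "total distance" of a combined specification is the plain sum of its pair distances; pair_distance and summed_distance are hypothetical names and, unlike the script's functions, take coordinates rather than a structure.

```python
import math
from typing import Optional, Sequence, Tuple

# Hypothetical helpers operating on coordinates that were already looked up.
Point = Tuple[float, float, float]


def pair_distance(a: Optional[Point], b: Optional[Point]) -> Optional[float]:
    """Euclidean distance between two atom positions, or None if either is missing."""
    if a is None or b is None:
        return None
    return math.dist(a, b)  # math.dist requires Python 3.8+


def summed_distance(pairs: Sequence[Tuple[Optional[Point], Optional[Point]]]) -> Optional[float]:
    """Combined distance, assumed here to be the plain sum of the pair distances."""
    total = 0.0
    for a, b in pairs:
        distance = pair_distance(a, b)
        if distance is None:
            return None  # one missing atom invalidates the whole combined specification
        total += distance
    return total


pairs = [((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)),  # 5.0 Å apart
         ((1.0, 1.0, 1.0), (1.0, 1.0, 3.0))]  # 2.0 Å apart
print(summed_distance(pairs))  # 7.0
```

Returning None rather than a partial sum prevents a combined specification with a missing atom from looking artificially short and being picked as the lowest.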
- Missing Atoms: If an atom is not found, the script skips the calculation and informs you which atoms are missing.
- Invalid Specifications: Provides error messages for incorrectly formatted specifications.
- File Processing Errors: Catches and logs exceptions during file reading and processing.
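Missing atoms can be handled by having the lookup return None instead of raising, so the caller can name the missing atoms and skip the calculation. The sketch below assumes a Gemmi structure (Structure, then Model, Chain, Residue, Atom) and relies only on iteration plus the name, seqid.num and pos attributes; it reads the first model only, on the assumption that each AlphaFold .cif file holds a single model. find_missing is a hypothetical helper, and this is not the script's own get_atom_position.

```python
from typing import Iterable, List, Optional, Tuple

# Sketch assuming Gemmi's object hierarchy; find_missing is hypothetical.
Point = Tuple[float, float, float]


def get_atom_position(structure, chain_name: str, residue_name: str,
                      residue_number: int, atom_name: str) -> Optional[Point]:
    """Return (x, y, z) of the matching atom in the first model, or None if absent."""
    model = structure[0]  # assumes one model per AlphaFold .cif file
    for chain in model:
        if chain.name != chain_name:
            continue
        for residue in chain:
            if residue.name == residue_name and residue.seqid.num == residue_number:
                for atom in residue:
                    if atom.name == atom_name:
                        return (atom.pos.x, atom.pos.y, atom.pos.z)
    return None


def find_missing(structure, atoms: Iterable[Tuple[str, str, int, str]]) -> List[str]:
    """List 'CHAIN RES NUM ATOM' labels for atoms that cannot be found."""
    missing = []
    for chain, residue, number, name in atoms:
        if get_atom_position(structure, chain, residue, number, name) is None:
            missing.append(f"{chain} {residue} {number} {name}")
    return missing
```

With Gemmi installed, the structure argument would be the object returned by gemmi.read_structure(path); printing the list returned by find_missing tells the user exactly which atoms caused a calculation to be skipped.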
A ARG 52 CA to B GLY 104 CA
Calculates the distance between the CA atom of residue ARG 52 in chain A and the CA atom of residue GLY 104 in chain B. Glycine has no CB atom, so a CB specification for a glycine would be reported as missing.
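For reference, the value reported for a single specification is the Euclidean distance between the two atoms' Cartesian coordinates, which .cif files store in ångströms:

```latex
d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}
```

For example, atoms at (0, 0, 0) and (3, 4, 0) are \(\sqrt{9 + 16 + 0} = 5\) Å apart.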
def main():
    directory = '/home/user/alphafold_predictions'
    spec_file = 'input.txt'
    process_directory(directory, spec_file)
- Ensure that the residue numbers and chain identifiers match those in your .cif files.
- Use the correct atom names according to standard PDB nomenclature.
- Review the generated results file for any calculations skipped because of missing atoms.
Contributions are welcome! Feel free to submit a pull request or open an issue for suggestions and improvements.
This project is licensed under the MIT License. See the LICENSE file for details.
Disclaimer: This script is provided "as is" without warranty of any kind. Use it at your own risk.