forked from openmm/pdbfixer
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathManual.html
314 lines (250 loc) · 21.3 KB
/
Manual.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
<html>
<head>
<title>PDBFixer Manual</title>
</head>
<body>
<h1 style="text-align:center">PDBFixer</h1>
<div style="text-align:center">Copyright 2013-2023 by Peter Eastman and Stanford University</div>
<h1>1. Introduction</h1>
Protein Data Bank (PDB or PDBx/mmCIF) files often have a number of problems that must be fixed before they can be used in a molecular dynamics simulation. The details vary depending on how the file was generated. Here are some of the most common ones:
<ol>
<li>If the structure was generated by X-ray crystallography, most or all of the hydrogen atoms will usually be missing.</li>
<li>There may also be missing heavy atoms in flexible regions that could not be clearly resolved from the electron density. This may include anything from a few atoms at the end of a sidechain to entire loops.</li>
<li>Many PDB files are also missing terminal atoms that should be present at the ends of chains.</li>
<li>The file may include nonstandard residues that were added for crystallography purposes, but are not present in the naturally occurring molecule you want to simulate.</li>
<li>The file may include more than what you want to simulate. For example, there may be salts, ligands, or other molecules that were added for experimental purposes. Or the crystallographic unit cell may contain multiple copies of a protein, but you only want to simulate a single copy.</li>
<li>There may be multiple locations listed for some atoms.</li>
<li>If you want to simulate the structure in explicit solvent, you will need to add a water box surrounding it.</li>
<li>For membrane proteins, you may also need to add a lipid membrane.</li>
</ol>
PDBFixer can fix all of these problems for you in a fully automated way. You simply select a file, tell it which problems to fix, and it does everything else.
<p>
PDBFixer can be used in three different ways: as a desktop application with a graphical user interface; as a command line application; or as a Python API. This allows you to use it in whatever way best matches your own needs for flexibility, ease of use, and scriptability. The following sections describe how to use it in each of these ways.
<h1>2. Installation</h2>
The easiest way to install PDBFixer is with conda or mamba.
<p>
<tt>conda install -c conda-forge pdbfixer</tt>
</p>
Alternatively you can install it from source. Download the repository, navigate to the root directory, and type
<p>
<tt>python setup.py install</tt>
<p>
This will install the PDBFixer python package as well as the command line program <tt>pdbfixer</tt>.
<p>
Before running PDBFixer, you must first install <a href="https://openmm.org">OpenMM</a>. Follow the installation instructions in the OpenMM manual. It is also recommended that you install CUDA or OpenCL, since the performance will usually be faster than when running on the CPU platform. PDBFixer requires that <a href="http://www.numpy.org">NumPy</a> be installed. All of these dependencies are included automatically when installing with conda.
<h1>3. PDBFixer as a Desktop Application</h1>
To run PDBFixer as a desktop application, type
<p>
<tt>pdbfixer</tt>
<p>
on the command line. PDBFixer displays its user interface through a web browser, but it is still a single user desktop application. It should automatically launch a web browser and open a new window displaying the user interface. If for any reason this does not happen, you can launch a web browser yourself and point it to <a href="http://localhost:8000">http://localhost:8000</a>.
<p>
The user interface consists of a series of pages for selecting a PDB or PDBx/mmCIF file and choosing what changes to make to it. Depending on the details of a particular file, some of these pages may be skipped.
<h3>Load File</h3>
On this page you select the PDB file to process. You can load a file from your local disk, or enter the identifier of a PDB structure to download from <a href="http://www.rcsb.org">RCSB</a>.
<h3>Select Chains</h3>
This is the first place where you can choose to remove parts of the structure. It lists all chains contained in the file. If you want to remove some of them, just uncheck them before clicking "Continue".
<p>
This page also gives options for removing heterogens. A heterogen is any residue other than a standard amino acid or nucleotide. The possible options are to keep all heterogens; keep only water while deleting all other heterogens; or delete all heterogens.
<h3>Add Residues</h3>
If the file has missing resides (based on the sequence given in the SEQRES records), this page will list them. Select which blocks of missing residues to add, then click "Continue".
<h3>Convert Residues</h3>
If the file contains nonstandard residues, this page will allow you to replace them with standard ones. It suggests a reasonable replacement for each residue, but you can choose a different one if you prefer.
<h3>Add Heavy Atoms</h3>
This page lists all heavy atoms that are missing from the file. They will be added automatically.
<h3>Add Hydrogens, Water, and Membrane</h3>
This page gives you the chance to make other optional changes:
<ul>
<li><b>Add Missing Hydrogens</b> If hydrogen atoms are missing from the file, PDBFixer can add them for you. Some residues can exist in multiple protonation states. To select which one to use, you can specify a pH, and the most common form of each residue at that pH will be used.</li>
<li><b>Add Water</b> Add a water box surrounding the system. In addition to water, counterions will be added to neutalize the system. You also may choose to add more ions based on a desired total ionic strength. Select the ionic strength and the types of ions to use.</li>
<li><b>Add Membrane</b> Add a lipid membrane. POPC and POPE lipids are supported.</li>
</ul>
<h3>Save File</h3>
You're all done! Click "Save File" to save the processed PDB file to disk. Then click "Process Another File" if you have more files to process, or "Quit" (in the top right corner of the page) if you are finished.
<h1>4. PDBFixer as a Command Line Application</h1>
PDBFixer provides a simple command line interface that is especially useful if you want to script it to process many files at once. This interface is significantly less flexible than either the desktop interface or the Python API, but it is still powerful enough for many purposes.
<p>
To get usage instructions for the command line interface, type
<p>
<tt>pdbfixer --help</tt>
<p>
This displays the following information:
<tt><pre>
Usage: pdbfixer
pdbfixer filename [options]
When run with no arguments, it launches the user interface. If any arguments are specified, it runs in command line mode.
Options:
-h, --help show this help message and exit
--pdbid=PDBID PDB id to retrieve from RCSB [default: None]
--url=URL URL to retrieve PDB from [default: None]
--output=FILENAME output pdb file [default: output.pdb]
--add-atoms=ATOMS which missing atoms to add: all, heavy, hydrogen, or
none [default: all]
--keep-heterogens=OPTION
which heterogens to keep: all, water, or none
[default: all]
--replace-nonstandard
replace nonstandard residues with standard equivalents
--add-residues add missing residues
--water-box=X Y Z add a water box. The value is the box dimensions in nm
[example: --water-box=2.5 2.4 3.0]
--ph=PH the pH to use for adding missing hydrogens [default:
7.0]
--positive-ion=ION positive ion to include in the water box: Cs+, K+,
Li+, Na+, or Rb+ [default: Na+]
--negative-ion=ION negative ion to include in the water box: Cl-, Br-,
F-, or I- [default: Cl-]
--ionic-strength=STRENGTH
molar concentration of ions to add to the water box
[default: 0.0]
--verbose Print verbose output
</pre></tt>
For example, consider the following command line:
<p>
<tt>pdbfixer --keep-heterogens=water --replace-nonstandard --water-box=4.0 4.0 3.0 myfile.pdb</tt>
<p>
This will load the file "myfile.pdb". It will add any missing atoms to existing residues, but not add any missing residues (because we did not specify <tt>--add-residues</tt>). Hydrogens will be added that are appropriate at pH 7.0 (the default). If the file contains any nonstandard amino acids or nucleotides, they will be replaced with the closest equivalent standard ones. Any water molecules present in the file will be kept, but all other heterogens will be deleted. Finally a water box of size 4 by 4 by 3 nanometers will be added surrounding the structure. If necessary, counterions will be added to neutralize it (Na+ or Cl-), but no other ions will be added (because we accepted the default ionic strength of 0.0). After making all these changes, the result will be written to a file called "output.pdb".
<h1>5. PDBFixer as a Python API</h1>
This is the most powerful way to use PDBFixer. It allows you to script the processing of PDB files while maintaining precise programmatic control of every part of the process.
<p>
PDBFixer is based on OpenMM, and to use its Python API you should first be familiar with the OpenMM API. Consult the OpenMM documentation for details. In everything that follows, I will assume you are already familiar with OpenMM.
<p>
To use PDBFixer create a <tt>PDBFixer</tt> object, passing to its constructor the file to process. You then call a series of methods on it to perform various transformations. When all the transformations are done, you can get the new structure from its <tt>topology</tt> and <tt>positions</tt> fields. The overall outline of your code will look something like this:
<tt><pre>
fixer = PDBFixer(filename='myfile.pdb')
# ...
# Call various methods on the PDBFixer
# ...
PDBFile.writeFile(fixer.topology, fixer.positions, open('output.pdb', 'w'))
</pre></tt>
There are many different methods you might call in the middle, depending on what you want to do. These methods are described below. Almost all of them are optional, but they <i>must</i> be called in <i>exactly</i> the order listed. You may choose not to call some of the optional methods, but you may not call them out of order.
<h3>Remove Chains</h3>
To remove unwanted chains from the structure, call
<p>
<tt>fixer.removeChains(indices)</tt>
<p>
where <tt>indices</tt> is a list of the indices of the chains to remove.
<h3>Identify Missing Residues</h3>
To identify missing residues, call
<p>
<tt>fixer.findMissingResidues()</tt>
<p>
This method identifies any residues in the SEQRES records for which no atom data is present. The residues are not actually added yet: it just stores the results into the <tt>missingResidues</tt> field. This is a dictionary. Each key is a tuple consisting of the index of a chain, and the residue index within that chain at which new residues should be inserted. The corresponding value is a list of the names of residues to insert there. After calling this method, you can modify the content of that dictionary to specify what residues should actually be inserted. For example, you can remove an entry to prevent that set of residues from being inserted, or change the identities of the residues to insert. If you do not want any residues at all to be added, you can just write
<p>
<tt>fixer.missingResidues = {}</tt>
<p>
The residues actually get added when you call <tt>addMissingAtoms()</tt>, described below.
<h3>Replace Nonstandard Residues</h3>
If you want to replace nonstandard residues with their standard versions, call
<tt><pre>
fixer.findNonstandardResidues()
fixer.replaceNonstandardResidues()
</pre></tt>
<tt>findNonstandardResidues()</tt> stores the results into the <tt>nonstandardResidues</tt> field, which is a list. Each entry is a tuple whose first element is a <tt>Residue</tt> object and whose second element is the name of the suggested replacement residue. Before calling <tt>replaceNonstandardResidues()</tt> you can modify the contents of this list. For example, you can remove an entry to prevent a particular residue from being replaced, or you can change what it will be replaced with. You can even add new entries if you want to mutate other residues.
<h3>Remove Heterogens</h3>
If you want to remove heterogens, call
<p>
<tt>fixer.removeHeterogens(False)</tt>
<p>
The argument specifies whether to keep water molecules. <tt>False</tt> removes all heterogens including water. <tt>True</tt> keeps water molecules while removing all other heterogens.
<h3>Add Missing Heavy Atoms</h3>
To add missing heavy atoms, call
<tt><pre>
fixer.findMissingAtoms()
fixer.addMissingAtoms()
</pre></tt>
<tt>findMissingAtoms()</tt> identifies all missing heavy atoms and stores them into two fields called <tt>missingAtoms</tt> and <tt>missingTerminals</tt>. Each of these is a dictionary whose keys are <tt>Residue</tt> objects and whose values are lists of atom names. <tt>missingAtoms</tt> contains standard atoms that should be present in any residue of that type, while <tt>missingTerminals</tt> contains missing terminal atoms that should be present at the start or end of a chain. You are free to remove atoms from these dictionaries before continuing, if you want to prevent certain atoms from being added.
<p>
<tt>addMissingAtoms()</tt> is the point at which all heavy atoms get added. This includes the ones identified by <tt>findMissingAtoms()</tt> as well as the missing residues identified by <tt>findMissingResidues()</tt>. Also, if you used <tt>replaceNonstandardResidues()</tt> to modify any residues, that will have removed any atoms that do not belong in the replacement residue, but it will <i>not</i> have added ones that are missing from the original residue. <tt>addMissingAtoms()</tt> is the point when those get added.
<h3>Add Missing Hydrogens</h3>
If you want to add missing hydrogen atoms, call
<p>
<tt>fixer.addMissingHydrogens(7.0)</tt>
<p>
The argument is the pH based on which to select protonation states.
<h3>Add Water</h3>
If you want to add a water box, call <tt>addSolvent()</tt>. This method has several optional arguments. Its full definition is
<p>
<tt>addSolvent(self, boxSize, positiveIon='Na+', negativeIon='Cl-', ionicStrength=0*molar)</tt>
<p>
<tt>boxSize</tt> is a <tt>Vec3</tt> object specifying the dimensions of the water box to add. The other options specify the types of ions to add and the desired ionic strength. Allowed values for <tt>positiveIon</tt> are Cs+, K+, Li+, Na+, and Rb+. Allowed values for <tt>negativeIon</tt> are Cl-, Br-, F-, and I-. For example, the following line builds a 5 nm cube of 0.1 molar potassium chloride:
<p>
<tt>fixer.addSolvent(Vec3(5, 5, 5)*nanometer, positiveIon='K+', ionicStrength=0.1*molar)</tt>
<h3>Add Membrane</h3>
If you want to add a lipid membrane, call <tt>addMembrane()</tt>. This method also adds water, so you should not also call <tt>addSolvent()</tt> when using it. There are several optional arguments. The full definition is
<p>
<tt>addMembrane(self, lipidType='POPC', membraneCenterZ=0*nanometer, minimumPadding=1*nanometer, positiveIon='Na+', negativeIon='Cl-', ionicStrength=0*molar)</tt>
<p>
<tt>lipidType</tt> should be either <tt>'POPC'</tt> or <tt>'POPE'</tt>. <tt>membraneCenterZ</tt> is the position along the Z axis at which the center of the membrane will be located. <tt>minimumPadding</tt> is the minimum acceptable distance between the protein and the edges of the periodic box. All other arguments are the same as for <tt>addSolvent()</tt>. For example, the following line adds a POPE membrane centered at Z=0 with at least 1 nm of padding on all sides:
<p>
<tt>fixer.addMembrane('POPE')</tt>
<h3>Examples</h3>
Here is a complete example that ties this together. It adds all missing atoms including heavy atoms, missing residues, and hydrogens. It replaces all nonstandard residues by their standard equivalents then deletes all remaining heterogens except water. Finally, it fills in all gaps with water; that is, it adds a water box whose dimensions match the crystallographic unit cell. It saves the result to a file called output.pdb.
<tt><pre>
from pdbfixer import PDBFixer
from openmm.app import PDBFile
fixer = PDBFixer(filename='myfile.pdb')
fixer.findMissingResidues()
fixer.findNonstandardResidues()
fixer.replaceNonstandardResidues()
fixer.removeHeterogens(True)
fixer.findMissingAtoms()
fixer.addMissingAtoms()
fixer.addMissingHydrogens(7.0)
fixer.addSolvent(fixer.topology.getUnitCellDimensions())
PDBFile.writeFile(fixer.topology, fixer.positions, open('output.pdb', 'w'))
</pre></tt>
Suppose you want to keep only the first chain in the file and remove all the others. To do this, call <tt>removeChains()</tt>:
<tt><pre>
<span style="color:grey">fixer = PDBFixer(filename='myfile.pdb')</span>
numChains = len(list(fixer.topology.chains()))
fixer.removeChains(range(1, numChains))
<span style="color:grey">fixer.findMissingResidues()</span>
</pre></tt>
Suppose you only want to add missing residues in the <i>middle</i> of a chain, not ones at the start or end of the chain. This can be done by modifying the <tt>missingResidues</tt> field immediately after calling <tt>findMissingResidues()</tt>:
<tt><pre>
<span style="color:grey">fixer.findMissingResidues()</span>
chains = list(fixer.topology.chains())
keys = fixer.missingResidues.keys()
for key in keys:
chain = chains[key[0]]
if key[1] == 0 or key[1] == len(list(chain.residues())):
del fixer.missingResidues[key]
<span style="color:grey">fixer.findNonstandardResidues()</span>
</pre></tt>
Suppose that instead of matching the size of the crystallographic unit cell, you want to add a cubic water box large enough to hold the entire system. That is, you want to compute the range of atom positions along each axis, then set the size of the water box to match the largest axis. This can be done as follows:
<tt><pre>
<span style="color:grey">fixer.addMissingHydrogens(7.0)</span>
maxSize = max(max((pos[i] for pos in fixer.positions))-min((pos[i] for pos in fixer.positions)) for i in range(3))
boxSize = maxSize*Vec3(1, 1, 1)
fixer.addSolvent(boxSize)
<span style="color:grey">PDBFile.writeFile(fixer.topology, fixer.positions, open('output.pdb', 'w'))</span>
</pre></tt>
Suppose you want to replace all nonstandard residues by alanine. <tt>findNonstandardResidues()</tt> selects a suggested replacement for each residue, but you can modify that before calling <tt>replaceNonstandardResidues()</tt>:
<tt><pre>
<span style="color:grey">fixer.findNonstandardResidues()</span>
fixer.nonstandardResidues = [(residue, 'ALA') for residue, replacement in fixer.nonstandardResidues]
<span style="color:grey">fixer.replaceNonstandardResidues()</span>
</pre></tt>
<h3>Nonstandard Residues</h3>
PDBFixer knows how to build all the standard amino acids and nucleotides. If you want to apply it to other residues, you need to tell it how. This is done by defining templates.
A template is a description of a residue: what atoms it contains, how they are bonded to each other, and what conformation they take on. For residues defined in the PDB Chemical Component Dictionary, you can simply tell PDBFixer to download the definition of whatever residue you need. For example, if you want to model a protein containing a selenocysteine (PDB code SEC), call
<tt><pre>
fixer.downloadTemplate('SEC')
</pre></tt>
Now you can call other methods such as <tt>findMissingAtoms()</tt>, <tt>addMissingAtoms()</tt>, and <tt>addMissingHydrogens()</tt> in the usual way. PDBFixer will fix problems in the SEC residue exactly as it does for standard ones.
<p/>
If your structure contains molecules or residues that are not present in the PDB Chemical Component Dictionary, you can still add templates for them, but you need to define them yourself. This is done by calling <tt>registerTemplate()</tt>. It takes two required arguments: an OpenMM Topology object describing the structure of the residue, and a list of positions describing a typical conformation of the residue. These can be easily obtained from, for example, a PDB or PDBx/mmCIF file containing just the single residue.
<tt><pre>
pdbx = PDBxFile('MOL.cif')
fixer.registerTemplate(pdbx.topology, pdbx.positions)
</pre></tt>
For residues that appear as part of larger molecules rather than in isolation, you need to provide one more piece of information: which atoms should only be present in terminal residues. For example, a C-terminal amino acid is capped with OXT and HXT atoms, but these atoms are omitted when the residue appears in the middle of a chain and is bonded to another residue. Likewise, the nitrogen in an N-terminal amino acid has two hydrogens attached to it (H and H2), but when it appears in the middle of a chain and the nitrogen is bonded to another residue, the H2 hydrogen is omitted. You can pass a third argument containing a list of boolean flags for each atom, specifying which ones should be omitted from non-terminal residues.
<tt><pre>
pdbx = PDBxFile('RES.cif')
terminal = [atom.name in ('H2', 'OXT', 'HXT') for atom in pdbx.topology.atoms()]
fixer.registerTemplate(pdbx.topology, pdbx.positions, terminal)
</pre></tt>
Strictly speaking, it is the bonds that matter, not the position in the chain. For example, the sulfur in a CYS residue has a hydrogen that must be omitted when it forms a disulfide bond to another residue. What matters is whether the sulfur has a bond connecting it to another residue, not the position of the CYS residue in its chain.
</body>
</html>