Created User Manual

openmm · Oct 4, 2013 · 2e3bc65
1 parent 9bb7859
commit 2e3bc65
Showing 3 changed files with 197 additions and 4 deletions.
diff --git a/Manual.html b/Manual.html
@@ -0,0 +1,191 @@
+<html>
+    <head>
+        <title>PDBFixer Manual</title>
+    </head>
+<body>
+<h1 style="text-align:center">PDBFixer</h1>
+<div style="text-align:center">Copyright 2013 by Peter Eastman and Stanford University</div>
+
+<h1>1. Introduction</h1>
+
+Protein Data Bank (PDB) files often have a number of problems that must be fixed before they can be used in a molecular dynamics simulation.  The details vary depending on how the file was generated.  Here are some of the most common ones:
+
+<ol>
+    <li>If the structure was generated by X-ray crystallography, most or all of the hydrogen atoms will usually be missing.</li>
+    <li>There may also be missing heavy atoms in flexible regions that could not be clearly resolved from the electron density.  This may include anything from a few atoms at the end of a sidechain to entire loops.</li>
+    <li>Many PDB files are also missing terminal atoms that should be present at the ends of chains.</li>
+    <li>The file may include nonstandard residues that were added for crystallography purposes, but are not present in the naturally occurring molecule you want to simulate.</li>
+    <li>The file may include more than what you want to simulate.  For example, there may be salts, ligands, or other molecules that were added for experimental purposes.  Or the crystallographic unit cell may contain multiple copies of a protein, but you only want to simulate a single copy.</li>
+    <li>There may be multiple locations listed for some atoms.</li>
+    <li>If you want to simulate the structure in explicit solvent, you will need to add a water box surrounding it.</li>
+</ol>
+
+PDBFixer can fix all of these problems for you in a fully automated way.  You simply select a file, tell it which problems to fix, and it does everything else.
+<p>
+PDBFixer can be used in three different ways: as a desktop application with a graphical user interface; as a command line application; or as a Python API.  This allows you to use it in whatever way best matches your own needs for flexibility, ease of use, and scriptability.  The following sections describe how to use it in each of these ways.
+<p>
+Before running PDBFixer, you must first install <a href="https://simtk.org/home/openmm">OpenMM</a> 5.2 or later.  Follow the installation instructions in the OpenMM manual.  It is also highly recommended that you install CUDA or OpenCL.  In principle PDBFixer can use the OpenMM reference platform, but it will be prohibitively slow.  Finally, PDBFixer requires that <a href="http://www.numpy.org">NumPy</a> be installed.
+
+<h1>2. PDBFixer as a Desktop Application</h1>
+
+To run PDBFixer as a desktop application, type
+<p>
+<tt>python pdbfixer.py</tt>
+<p>
+on the command line.  PDBFixer displays its user interface through a web browser, but it is still a single-user desktop application.  It should automatically launch a web browser and open a new window displaying the user interface.  If for any reason this does not happen, you can launch a web browser yourself and point it to <a href="http://localhost:8000">http://localhost:8000</a>.
+<p>
+The user interface consists of a series of pages for selecting a PDB file and choosing what changes to make to it.  Depending on the details of a particular file, some of these pages may be skipped.
+
+<h3>Load File</h3>
+
+On this page you select the PDB file to process.  You can load a file from your local disk, or enter the identifier of a PDB structure to download from <a href="http://www.rcsb.org">RCSB</a>.
+
+<h3>Select Chains</h3>
+
+This is the first place where you can choose to remove parts of the structure.  It lists all chains contained in the file.  If you want to remove some of them, just uncheck them before clicking "Continue".
+
+<h3>Add Residues</h3>
+
+If the file has missing residues (based on the sequence given in the SEQRES records), this page will list them.  Select which blocks of missing residues to add, then click "Continue".
+
+<h3>Convert Residues</h3>
+
+If the file contains nonstandard residues, this page will allow you to replace them with standard ones.  It suggests a reasonable replacement for each residue, but you can choose a different one if you prefer.
+
+<h3>Add Heavy Atoms</h3>
+
+This page lists all heavy atoms that are missing from the file.  They will be added automatically.
+
+<h3>Remove Heterogens/Add Hydrogens/Add Water</h3>
+
+This page gives you the chance to make several other optional changes:
+
+<ul>
+    <li><b>Remove Heterogens</b> A heterogen is any residue other than a standard amino acid or nucleotide.  PDBFixer can automatically remove them for you.  The possible options are to keep all heterogens; keep only water while deleting all other heterogens; or delete all heterogens.</li>
+    <li><b>Add Missing Hydrogens</b> If hydrogen atoms are missing from the file, PDBFixer can add them for you.  Some residues can exist in multiple protonation states.  To select which one to use, you can specify a pH, and the most common form of each residue at that pH will be used.</li>
+    <li><b>Add Water</b> Add a water box surrounding the system.  In addition to water, counterions will be added to neutralize the system.  You may also choose to add more ions based on a desired total ionic strength.  Select the ionic strength and the types of ions to use.</li>
+</ul>
+
+<h3>Save File</h3>
+
+You're all done!  Click "Save File" to save the processed PDB file to disk.  Then click "Process Another File" if you have more files to process, or "Quit" (in the top right corner of the page) if you are finished.
+
+<h1>3. PDBFixer as a Command Line Application</h1>
+
+PDBFixer provides a simple command line interface that is especially useful if you want to script it to process many files at once.  This interface is significantly less flexible than either the desktop interface or the Python API, but it is still powerful enough for many purposes.
+<p>
+To get usage instructions for the command line interface, type
+<p>
+<tt>python pdbfixer.py --help</tt>
+<p>
+This displays the following information:
+<tt><pre>
+Usage: pdbfixer.py
+       pdbfixer.py [options] filename
+
+When run with no arguments, it launches the user interface.  If any arguments are specified, it runs in command line mode.
+
+Options:
+  -h, --help            show this help message and exit
+  --output=FILENAME     output pdb file [default: output.pdb]
+  --add-atoms=ATOMS     which missing atoms to add: all, heavy, hydrogen, or
+                        none [default: all]
+  --keep-heterogens=OPTION
+                        which heterogens to keep: all, water, or none
+                        [default: all]
+  --replace-nonstandard
+                        replace nonstandard residues with standard equivalents
+  --add-residues        add missing residues
+  --water-box=X Y Z     add a water box. The value is the box dimensions in nm
+                        [example: --water-box=2.5 2.4 3.0]
+  --ph=PH               the pH to use for adding missing hydrogens [default:
+                        7.0]
+  --positive-ion=ION    positive ion to include in the water box: Cs+, K+,
+                        Li+, Na+, or Rb+ [default: Na+]
+  --negative-ion=ION    negative ion to include in the water box: Cl-, Br-,
+                        F-, or I- [default: Cl-]
+  --ionic-strength=STRENGTH
+                        molar concentration of ions to add to the water box
+                        [default: 0.0]
+</pre></tt>
+For example, consider the following command line:
+<p>
+<tt>python pdbfixer.py --keep-heterogens=water --replace-nonstandard --water-box=4.0 4.0 3.0 myfile.pdb</tt>
+<p>
+This will load the file "myfile.pdb".  It will add any missing atoms to existing residues, but not add any missing residues (because we did not specify <tt>--add-residues</tt>).  Hydrogens will be added that are appropriate at pH 7.0 (the default).  If the file contains any nonstandard amino acids or nucleotides, they will be replaced with the closest equivalent standard ones.  Any water molecules present in the file will be kept, but all other heterogens will be deleted.  Finally a water box of size 4 by 4 by 3 nanometers will be added surrounding the structure.  If necessary, counterions will be added to neutralize it (Na+ or Cl-), but no other ions will be added (because we accepted the default ionic strength of 0.0).  After making all these changes, the result will be written to a file called "output.pdb".
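The way this command line is split into options and a filename can be sketched with Python's standard `optparse` module, which the script's own `parser.add_option` calls use. This is a minimal sketch that rebuilds only a subset of the options from the help text. The `dest` names `output`, `heterogens`, `nonstandard`, and `residues` and the `store_true` actions are assumptions made for illustration; only `box`, `ph`, and `ionic` are taken from the script.

```python
from optparse import OptionParser


def build_parser():
    """Rebuild a subset of the options listed in the help text."""
    parser = OptionParser(usage='%prog [options] filename')
    # The dest names on the next four options are assumptions for this sketch.
    parser.add_option('--output', default='output.pdb', dest='output',
                      metavar='FILENAME')
    parser.add_option('--keep-heterogens', default='all', dest='heterogens',
                      choices=('all', 'water', 'none'), metavar='OPTION')
    parser.add_option('--replace-nonstandard', action='store_true',
                      default=False, dest='nonstandard')
    parser.add_option('--add-residues', action='store_true',
                      default=False, dest='residues')
    # These three mirror the add_option calls in pdbfixer.py.
    parser.add_option('--water-box', dest='box', type='float', nargs=3,
                      metavar='X Y Z')
    parser.add_option('--ph', type='float', default=7.0, dest='ph')
    parser.add_option('--ionic-strength', type='float', default=0.0,
                      dest='ionic')
    return parser


# The shell splits "--water-box=4.0 4.0 3.0" into three separate tokens.
argv = ['--keep-heterogens=water', '--replace-nonstandard',
        '--water-box=4.0', '4.0', '3.0', 'myfile.pdb']
options, args = build_parser().parse_args(argv)

print(options.box)   # the three box dimensions as a tuple of floats
print(args)          # the remaining positional argument: the input file
```

Because `--water-box` is declared with `nargs=3`, the two values after `--water-box=4.0` are consumed as part of the option, which is why the filename has to come after them.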
+
+<h1>4. PDBFixer as a Python API</h1>
+
+This is the most powerful way to use PDBFixer.  It allows you to script the processing of PDB files while maintaining precise programmatic control of every part of the process.
+<p>
+PDBFixer is based on OpenMM, and to use its Python API you should first be familiar with the OpenMM API.  Consult the OpenMM documentation for details.  In everything that follows, I will assume you are already familiar with OpenMM.
+<p>
+To use PDBFixer, create a <tt>PDBFixer</tt> object, passing to its constructor a <tt>PdbStructure</tt> object containing the structure to process.  You then call a series of methods on it to perform various transformations.  When all the transformations are done, you can get the new structure from its <tt>topology</tt> and <tt>positions</tt> fields.  The overall outline of your code will look something like this:
+<tt><pre>
+from pdbfixer import PDBFixer
+from simtk.openmm.app import PDBFile
+from simtk.openmm.app.internal.pdbstructure import PdbStructure
+fixer = PDBFixer(PdbStructure(open('myfile.pdb')))
+# ...
+# Call various methods on the PDBFixer, in the order described below
+PDBFile.writeFile(fixer.topology, fixer.positions, open('output.pdb', 'w'))
+</pre></tt>
+There are many different methods you might call in the middle, depending on what you want to do.  These methods are described below.  Almost all of them are optional, but they <i>must</i> be called in <i>exactly</i> the order listed.  You may choose not to call some of the optional methods, but you may not call them out of order.
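One way to guard against calling the methods out of order is a small wrapper that validates a planned sequence before running it. The sketch below is not part of PDBFixer: `CALL_ORDER`, `check_order`, and `apply_steps` are hypothetical helpers, and the order is simply the order in which the methods are introduced in this chapter.

```python
# Method names in the order this chapter presents them.
CALL_ORDER = [
    'findMissingResidues',
    'findNonstandardResidues',
    'replaceNonstandardResidues',
    'findMissingAtoms',
    'addMissingAtoms',
    'removeHeterogens',
    'addMissingHydrogens',
    'addSolvent',
]


def check_order(names):
    """Return True if names follow the documented order with no repeats."""
    # list.index raises ValueError for a name that is not in CALL_ORDER.
    positions = [CALL_ORDER.index(name) for name in names]
    return all(a < b for a, b in zip(positions, positions[1:]))


def apply_steps(fixer, steps):
    """Call each (method name, args) pair on fixer after validating the order."""
    if not check_order([name for name, _ in steps]):
        raise ValueError('PDBFixer methods must be called in the documented order')
    for name, args in steps:
        getattr(fixer, name)(*args)


# A plan that skips the optional nonstandard-residue and solvent steps.
steps = [
    ('findMissingResidues', ()),
    ('findMissingAtoms', ()),
    ('addMissingAtoms', ()),
    ('removeHeterogens', (False,)),
    ('addMissingHydrogens', (7.0,)),
]
# apply_steps(fixer, steps)  # fixer would be a PDBFixer instance
```

Skipping steps is allowed by the check, but reordering them is not, which matches the rule stated above.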
+
+<h3>Identify Missing Residues</h3>
+
+To identify missing residues, call
+<p>
+<tt>fixer.findMissingResidues()</tt>
+<p>
+This method identifies any residues in the SEQRES records for which no atom data is present.  The residues are not actually added yet: it just stores the results into the <tt>missingResidues</tt> field.  This is a dictionary.  Each key is a tuple consisting of the index of a chain, and the residue index within that chain at which new residues should be inserted.  The corresponding value is a list of the names of residues to insert there.  After calling this method, you can modify the content of that dictionary to specify what residues should actually be inserted.  For example, you can remove an entry to prevent that set of residues from being inserted, or change the identities of the residues to insert.  If you do not want any residues at all to be added, you can just write
+<p>
+<tt>fixer.missingResidues = {}</tt>
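Since <tt>missingResidues</tt> is an ordinary dictionary, it can be edited with plain Python before any atoms are added. The sketch below works on a made-up dictionary with the layout described above; the chain indices, residue indices, and residue names are hypothetical, and `keep_chains` and `drop_long_gaps` are illustrative helpers rather than PDBFixer functions.

```python
# Hypothetical contents of fixer.missingResidues after findMissingResidues().
# Keys are (chain index, residue index); values are lists of residue names.
missing_residues = {
    (0, 0): ['MET', 'SER'],          # residues to insert at index 0 of chain 0
    (0, 57): ['GLY', 'ASP', 'GLY'],  # a three-residue gap inside chain 0
    (1, 120): ['LYS'],               # a single residue in chain 1
}


def keep_chains(missing, chain_indices):
    """Keep only entries whose chain index is in chain_indices."""
    return {key: names for key, names in missing.items()
            if key[0] in chain_indices}


def drop_long_gaps(missing, max_length):
    """Drop entries that would insert more than max_length residues."""
    return {key: names for key, names in missing.items()
            if len(names) <= max_length}


# Only rebuild short gaps in chain 0.
filtered = drop_long_gaps(keep_chains(missing_residues, {0}), max_length=2)
print(filtered)
# With a real PDBFixer object, the result would be assigned back:
# fixer.missingResidues = filtered
```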
+
+<h3>Replace Nonstandard Residues</h3>
+
+If you want to replace nonstandard residues with their standard versions, call
+<tt><pre>
+fixer.findNonstandardResidues()
+fixer.replaceNonstandardResidues()
+</pre></tt>
+<tt>findNonstandardResidues()</tt> stores the results into the <tt>nonstandardResidues</tt> field.  This is a dictionary whose keys are <tt>Residue</tt> objects and whose values are the names of the suggested replacement residues.  Before calling <tt>replaceNonstandardResidues()</tt> you can modify the contents of this dictionary.  For example, you can remove an entry to prevent a particular residue from being replaced, or you can change what it will be replaced with.
+
+<h3>Add Missing Heavy Atoms</h3>
+
+To add missing heavy atoms, call
+<tt><pre>
+fixer.findMissingAtoms()
+fixer.addMissingAtoms()
+</pre></tt>
+<tt>findMissingAtoms()</tt> identifies all missing heavy atoms and stores them into two fields called <tt>missingAtoms</tt> and <tt>missingTerminals</tt>.  Each of these is a dictionary whose keys are <tt>Residue</tt> objects and whose values are lists of atom names.  <tt>missingAtoms</tt> contains standard atoms that should be present in any residue of that type, while <tt>missingTerminals</tt> contains missing terminal atoms that should be present at the start or end of a chain.  You are free to remove atoms from these dictionaries before continuing, if you want to prevent certain atoms from being added.
+<p>
+<tt>addMissingAtoms()</tt> is the point at which all heavy atoms get added.  This includes the ones identified by <tt>findMissingAtoms()</tt> as well as the missing residues identified by <tt>findMissingResidues()</tt>.  Also, if you used <tt>replaceNonstandardResidues()</tt> to modify any residues, that will have removed any atoms that do not belong in the replacement residue, but it will <i>not</i> have added ones that are missing from the original residue.  <tt>addMissingAtoms()</tt> is the point when those get added.
+
+<h3>Remove Heterogens</h3>
+
+If you want to remove heterogens, call
+<p>
+<tt>fixer.removeHeterogens(False)</tt>
+<p>
+The argument specifies whether to keep water molecules.  <tt>False</tt> removes all heterogens including water.  <tt>True</tt> keeps water molecules while removing all other heterogens.
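The three values accepted by the command line option <tt>--keep-heterogens</tt> can be related to this method as follows. This is only a sketch of the correspondence implied by the two descriptions, not the script's actual code; `heterogen_action` and `remove_heterogens` are hypothetical names.

```python
def heterogen_action(option):
    """Translate a --keep-heterogens value into a removeHeterogens() argument.

    Returns None when nothing should be removed, otherwise the boolean
    to pass to removeHeterogens().
    """
    if option == 'all':
        return None      # keep everything: do not call removeHeterogens()
    if option == 'water':
        return True      # removeHeterogens(True) keeps water only
    if option == 'none':
        return False     # removeHeterogens(False) removes water as well
    raise ValueError('unknown --keep-heterogens value: %r' % option)


def remove_heterogens(fixer, option):
    """Apply the chosen option to a PDBFixer-like object."""
    keep_water = heterogen_action(option)
    if keep_water is not None:
        fixer.removeHeterogens(keep_water)
```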
+
+<h3>Add Missing Hydrogens</h3>
+
+If you want to add missing hydrogen atoms, call
+<p>
+<tt>fixer.addMissingHydrogens(7.0)</tt>
+<p>
+The argument is the pH used to select protonation states.
+
+<h3>Add Water</h3>
+
+If you want to add a water box, call <tt>addSolvent()</tt>.  This method has several optional arguments.  Its full definition is
+<p>
+<tt>addSolvent(self, boxSize, positiveIon='Na+', negativeIon='Cl-', ionicStrength=0*molar)</tt>
+<p>
+<tt>boxSize</tt> is a <tt>Vec3</tt> object specifying the dimensions of the water box to add.  The other options specify the types of ions to add and the desired ionic strength.  Allowed values for <tt>positiveIon</tt> are Cs+, K+, Li+, Na+, and Rb+.  Allowed values for <tt>negativeIon</tt> are Cl-, Br-, F-, and I-.  For example, assuming <tt>Vec3</tt> has been imported from <tt>simtk.openmm</tt> and <tt>nanometer</tt> and <tt>molar</tt> from <tt>simtk.unit</tt>, the following line builds a cubic water box 5 nm on each side containing 0.1 molar potassium chloride:
+<p>
+<tt>fixer.addSolvent(Vec3(5, 5, 5)*nanometer, positiveIon='K+', ionicStrength=0.1*molar)</tt>
+</body>
+</html>
diff --git a/README.md b/README.md
@@ -6,6 +6,8 @@ PDBFixer is an easy to use application for fixing problems in Protein Data Bank
 - Add missing heavy atoms.
 - Add missing hydrogen atoms.
 - Build missing loops.
-- Convert non-standard amino acids to their standard equivalents.
+- Convert non-standard residues to their standard equivalents.
 - Select a single position for atoms with multiple alternate positions listed.
-- Delete unwanted chains from the model.
+- Delete unwanted chains from the model.
+- Delete unwanted heterogens.
+- Build a water box for explicit solvent simulations.
diff --git a/pdbfixer.py b/pdbfixer.py
@@ -386,7 +386,7 @@ def replaceNonstandardResidues(self):
     def findMissingAtoms(self):
         """Find heavy atoms that are missing from the structure.
         
-        The results are stored into two field: missingAtoms and missingTerminals.  Each of these is a dict whose keys
+        The results are stored into two fields: missingAtoms and missingTerminals.  Each of these is a dict whose keys
         are Residue objects and whose values are lists of atom names.  missingAtoms contains standard atoms that should
         be present in any residue of that type.  missingTerminals contains terminal atoms that should be present at the
         start or end of a chain.
@@ -645,7 +645,7 @@ def _findNearestDistance(self, context, topology, newAtoms):
         parser.add_option('--water-box', dest='box', type='float', nargs=3, metavar='X Y Z', help='add a water box. The value is the box dimensions in nm [example: --water-box=2.5 2.4 3.0]')
         parser.add_option('--ph', type='float', default=7.0, dest='ph', help='the pH to use for adding missing hydrogens [default: 7.0]')
         parser.add_option('--positive-ion', default='Na+', dest='positiveIon', choices=('Cs+', 'K+', 'Li+', 'Na+', 'Rb+'), metavar='ION', help='positive ion to include in the water box: Cs+, K+, Li+, Na+, or Rb+ [default: Na+]')
-        parser.add_option('--negative-ion', default='Cl-', dest='negativeIon', choices=('Cl-', 'Br-', 'F-', 'I-'), metavar='ION', help='positive ion to include in the water box: Cl-, Br-, F-, or I- [default: Cl-]')
+        parser.add_option('--negative-ion', default='Cl-', dest='negativeIon', choices=('Cl-', 'Br-', 'F-', 'I-'), metavar='ION', help='negative ion to include in the water box: Cl-, Br-, F-, or I- [default: Cl-]')
         parser.add_option('--ionic-strength', type='float', default=0.0, dest='ionic', metavar='STRENGTH', help='molar concentration of ions to add to the water box [default: 0.0]')
         (options, args) = parser.parse_args()
         if len(args) == 0: