GOLD
The main GOLD binaries are in files gold_ARCH, where ARCH is R4000, R5000, R8000 or R10000, for SGI R4000, R5000, R8000 and R10000 computers respectively. The gold_ARCH command takes a configuration file as an argument. This is a text file which specifies the calculation to be run, including details of the ligand, the protein binding site and the GA parameters. Although the file can be written with a standard text editor, the easiest way to create it is to use the front end. The configuration file can have any name, but by convention it is called gold.conf.
Once a configuration file has been created (e.g. as gold.conf), GOLD can be run in the background, for example on an SGI R4000, with the command:
% gold_R4000 gold.conf &
Solutions can be ranked using fit_analy.
Handling Interrupts
If GOLD receives an interrupt (signal 2), the program will exit cleanly: the current top solution will be output in file gold_soln_ligand.mol2 and the PID file (see below) will be removed.
Sending GOLD the signal USR1 (using the command kill -USR1 <GOLD process id>) will result in GOLD outputting the current top solution to the file gold_soln_current_ligand.mol2, then continuing execution.
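The two signals described above can also be sent from a script rather than from the shell. The following is a minimal Python sketch and not part of GOLD: the function names are hypothetical, and the process ID is assumed to be known to the caller already (for example from the shell's job listing).

```python
import os
import signal


def request_current_solution(pid: int) -> None:
    """Ask a running GOLD job to write out its current top solution and carry on.

    Equivalent to the shell command `kill -USR1 <pid>`.
    """
    os.kill(pid, signal.SIGUSR1)


def stop_cleanly(pid: int) -> None:
    """Send an interrupt (signal 2, SIGINT) so that the job exits cleanly."""
    os.kill(pid, signal.SIGINT)
```

Both helpers only deliver the signal; what is written to disk afterwards is decided by GOLD as described in the text.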
The front end is a Tcl/Tk script which provides the user with an easy-to-use graphical interface to the program. The script is contained in the file gold_front. It is normally invoked by the command gold, which is a simple shell script that determines the computer architecture and calls gold_front.
The progress of a GOLD batch job is recorded in the file gold.log. This file contains information about the initialisation of the protein and ligand, the progress of each docking run and the solutions generated by the software.
The file gold.log is line buffered, so you can see how the algorithm is progressing even when GOLD is run in the background. The output window in the front end simply shows the current contents of gold.log.
Initially, the log file details the parameterisation. The protein is then loaded and the binding site determined. The (first) ligand is then loaded and the GA runs commence. The progress of each GA run is listed:
** Docking run number 1 **
Rank fitness base is 5000.00 increment is 11.2233
Initialising population(s) ...
Pop: 1 size: 100 #dups: 0 #niche: 0 #errs: 1
Pop: 2 size: 100 #dups: 0 #niche: 1 #errs: 2
Pop: 3 size: 100 #dups: 0 #niche: 0
Pop: 4 size: 100 #dups: 0 #niche: 1
Pop: 1 size: 100 #dups: 0 #niche: 0 #errs: 1
Pop: 2 size: 100 #dups: 0 #niche: 1 #errs: 2
Pop: 3 size: 100 #dups: 0 #niche: 0
Pop: 4 size: 100 #dups: 0 #niche: 1
Pop: 5 size: 100 #dups: 0 #niche: 0
Doing GA no population(s) 5 size 100 selection pressure 1.100000
VDW energies ligand h-bonds
Pop Opcnt Fitness Internal External Internal External
2 0 49.12 -18.31 38.07 0.00 0.43
5 270 50.20 -26.26 40.19 0.00 0.19
5 975 50.49 -26.26 40.56 0.00 0.08
3 1293 53.16 -15.63 41.04 0.00 0.00
2 2817 52.37 -18.77 40.95 0.00 0.25
1 2871 52.68 -15.58 40.16 0.00 0.93
5 3190 55.02 -18.08 42.74 0.00 0.37
4 3719 56.23 -16.85 43.54 0.00 0.28
2 4017 56.23 -18.31 43.72 0.00 0.46
4 4454 57.48 -16.85 44.33 0.00 0.53
4 5279 57.09 -13.75 43.61 0.00 0.53
4 6079 56.40 -13.75 43.07 0.00 0.74
2 7887 55.87 -18.31 43.81 0.00 0.68
2 8357 58.00 -14.99 44.94 0.00 0.44
4 8419 58.10 -17.92 45.34 0.00 0.80
1 9056 58.69 -15.13 45.43 0.00 0.68
1 9981 58.84 -15.16 45.61 0.00 0.68
5 12760 58.06 -15.38 45.81 0.00 0.34
5 12770 58.11 -14.65 45.42 0.00 0.67
4 13814 58.07 -12.62 45.06 0.00 0.63
3 14653 58.01 -12.63 45.23 0.00 0.52
4 14829 58.03 -15.65 45.97 0.00 0.66
4 14989 58.03 -12.63 45.25 0.00 0.52
4 16039 57.50 -12.63 45.09 0.00 0.52
3 16288 57.51 -12.71 45.13 0.00 0.50
3 16883 57.19 -12.71 44.99 0.00 0.49
5 17495 57.36 -14.65 45.82 0.00 0.43
5 17835 57.30 -14.65 45.72 0.00 0.65
4 17979 57.34 -14.65 45.75 0.00 0.65
Reordering...
3 21000 55.54 -12.63 44.57 0.00 0.48
3 21563 55.50 -12.93 44.76 0.00 0.47
3 21903 55.56 -12.93 44.81 0.00 0.47
4 23654 55.00 -12.93 44.82 0.00 0.46
3 23913 55.01 -12.81 44.78 0.00 0.46
......................
2 48142 49.81 -12.31 44.84 0.00 0.46
5 49320 49.81 -12.31 44.84 0.00 0.46
#ops: 50000 #mutates: 23758 #cross: 23754 #migrate: 2488
#dups: 18541 #niche: 35159 #errors: 71
Ligand is in gold_soln_ligand_1.mol2
Fitness 49.81
Energies 1.68 (int VDW) -13.99 (int tor) 44.84 (ext VDW) H bond 0.00 (int) 0.46 (ext)
Ligand acceptor 3 Protein hydrogen 1229 bond energy 2.91 wt 0.14
Ligand acceptor 5
Ligand acceptor 8
Ligand acceptor 10
Ligand acceptor 22
Ligand acceptor 24 Protein hydrogen 1022 bond energy 0.40 wt 0.11
Current Ranking 1
The best ("most fit") individual at any time is listed. The total fitness and steric and hydrogen bond components are also displayed. The internal VDW energy includes a torsional component. The external VDW energy is normally scaled by a factor of 1.375 and summed with the other components to give the total fitness.
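As a worked check of the scaling described above, the components in the end-of-run summary of the example log can be recombined by hand. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the "Internal" VDW column is the sum of the internal VDW and torsional terms, which is what the example figures suggest (1.68 + -13.99 = -12.31).

```python
EXTERNAL_VDW_SCALE = 1.375  # scaling factor quoted in the text


def total_fitness(int_vdw, int_torsion, ext_vdw, hbond_int, hbond_ext,
                  scale=EXTERNAL_VDW_SCALE):
    """Recombine the logged components into a total fitness score."""
    internal = int_vdw + int_torsion  # "Internal" VDW column of the progress table
    return internal + scale * ext_vdw + hbond_int + hbond_ext


# Components taken from the end-of-run summary in the example log.
score = total_fitness(int_vdw=1.68, int_torsion=-13.99, ext_vdw=44.84,
                      hbond_int=0.00, hbond_ext=0.46)
print(f"{score:.3f}")  # close to the logged fitness of 49.81
```

Rows printed early in a run do not generally satisfy this sum, presumably because some of the scoring parameters are still being annealed at that stage.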
Some parameters involved in determining the fitness score are annealed as the algorithm progresses (the rationale is that bad contacts and poor hydrogen bonds are allowed at the beginning of a GA run in the hope that they will evolve into good solutions). Thus, the total fitness score of the best individual may decrease over the course of a run. The "Reordering..." message refers to a re-ranking of the GA populations caused by the process of annealing parameters.
At the end of the GA run, the solution is output and summarised.
Following all GA runs, the solutions from different runs are compared:
Final Ranking 4 2 5 1 3
_______________________________
RMSD Matrix of RANKED solutions
        2     3     4     5
1 :   4.8   4.7   5.1  10.1
2 :         4.0   3.1  10.9
3 :               4.1  10.4
4 :                    11.0
Clustering using complete linkage.
Structure ids are RANKING
Dist Clusters...
3.14 | 4 2 | 3 | 5 | 1 |
4.06 | 4 2 3 | 1 | 5 |
5.07 | 4 2 3 1 | 5 |
10.95 | 4 2 3 1 5 |
Finished Docking Ligand ligand.mol
In this case, solution number 4 had the largest fitness score (this solution will be in gold_soln_ligand_4.mol2, which will be symbolically linked to ranked_ligand_1.mol2), while solution number 3 had the worst fitness. The RMSD matrix shows how different the various solutions are. This matrix uses rankings rather than run numbers. For example, ranked structures 2 and 4 have an RMSD of 3.1 Angstrom: these structures are in the files ranked_ligand_2.mol2 and ranked_ligand_4.mol2 (or, equivalently, gold_soln_ligand_2.mol2 and gold_soln_ligand_1.mol2). The RMSD algorithm takes account of symmetry effects, using a graph isomorphism algorithm.
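Because rankings and run numbers are easy to confuse, it can help to write the correspondence out explicitly. The following Python sketch (a hypothetical helper, not a GOLD utility) takes the "Final Ranking" line from the example log and lists which solution file each ranked file refers to, assuming the ligand file name stem is "ligand" as in the example.

```python
def ranking_to_files(final_ranking_line, ligand="ligand"):
    """Map each ranked file name to the solution file it refers to."""
    # Skip the two leading words "Final" and "Ranking"; the rest are run numbers.
    run_numbers = [int(token) for token in final_ranking_line.split()[2:]]
    return {
        f"ranked_{ligand}_{rank}.mol2": f"gold_soln_{ligand}_{run}.mol2"
        for rank, run in enumerate(run_numbers, start=1)
    }


mapping = ranking_to_files("Final Ranking 4 2 5 1 3")
for ranked, solution in mapping.items():
    print(ranked, "->", solution)
```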
Finally, the RMSD values are used to perform a hierarchical cluster analysis, using the complete linkage algorithm. Each line shows one iteration of the clustering algorithm, the minimum distance between clusters at that step, and the resulting clusters. Clusters are separated by the '|' symbol and rankings are used rather than run numbers.
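The clustering step can be illustrated with the five-solution RMSD matrix from the example log. The code below is a small Python sketch of complete linkage clustering, not the GOLD implementation; the function name and data layout are assumptions made for illustration. Because it works from the rounded matrix values, the merge distances agree with the log only to one decimal place, and the order in which clusters are printed may differ.

```python
def complete_linkage(dist, ids):
    """Agglomerative clustering; returns [(merge_distance, clusters), ...]."""
    def d(i, j):
        return dist[(min(i, j), max(i, j))]

    clusters = [[i] for i in ids]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: the distance between two clusters is the
                # largest pairwise distance between their members.
                link = max(d(i, j) for i in clusters[a] for j in clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        link, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        history.append((link, [list(c) for c in clusters]))
    return history


# Upper triangle of the RMSD matrix in the example log (ids are rankings).
rmsd = {
    (1, 2): 4.8, (1, 3): 4.7, (1, 4): 5.1, (1, 5): 10.1,
    (2, 3): 4.0, (2, 4): 3.1, (2, 5): 10.9,
    (3, 4): 4.1, (3, 5): 10.4,
    (4, 5): 11.0,
}

history = complete_linkage(rmsd, ids=[1, 2, 3, 4, 5])
for link, clusters in history:
    print(f"{link:5.2f} | " + " | ".join(" ".join(map(str, c)) for c in clusters) + " |")
```

At each iteration the two clusters with the smallest linkage distance are merged, which is the distance reported at the start of each line in the log.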
The file gold.err contains any errors found by the program. These are generally fatal and cause the program to stop. It is a good idea to check gold.err if something appears to go wrong.
Errors found by the atom-type checker are written to gold.err. If you are unsure about your atom typing you should therefore check gold.err. For example:
check_aromatic_bonds_and_atoms: WARNING atom 6 in ligand.mol is type C.2 but should be C.ar
check_aromatic_bonds_and_atoms: WARNING bond 8 in ligand.mol is type 2 but should be ar
check_aromatic_bonds_and_atoms: WARNING atom 11 in ligand.mol is type C.2 but should be C.ar
check_aromatic_bonds_and_atoms: WARNING bond 17 in ligand.mol is type 1 but should be ar
gold.err is line buffered so errors are logged immediately. If you are running GOLD via the front end, the contents of gold.err will appear in a separate window.
Process File
The file gold.pid records the user, host and process number of the GOLD job. It is deleted when GOLD exits. Its purpose is to stop the user running two GOLD jobs in the same directory. If the machine goes down, or GOLD crashes (very unlikely!) or is killed with signal 9, you will need to remove gold.pid before you can run another GOLD job in the same directory.
If the input protein and ligand files are in SYBYL MOL2 format, GOLD will write out a number of new MOL2 files:
gold_ligand.mol2 is the original ligand datafile with lone pairs added and the sets donor_hydrogens and lone_pairs defined.
gold_protein.mol2 is the original protein datafile with lone pairs added to binding site atoms and the sets donor_hydrogens and lone_pairs defined. The binding site is defined in the set active_atoms.
If GOLD is interrupted (see above), the files gold_soln_ligand.mol2 and gold_soln_current_ligand.mol2 may be created.
The GOLD predictions for ligand binding modes are written out in MOL2 format. For example, suppose that structure.mol2 is the name of the original ligand data file. As the GOLD batch job progresses, each GA run solution is written out as gold_soln_structure_n.mol2, where n is the solution number (1, 2, 3, etc.). Note that the file gold_soln_structure_1.mol2 is not necessarily the best GOLD prediction; it is simply the first solution produced by GOLD.
As GOLD runs, symbolic links are created: ranked_structure_1.mol2 will always point to the current top-ranked solution, ranked_structure_2.mol2 will point to the second-best solution, and so on. The current ranking can also be obtained using the command fit_analy.
For example, suppose the current ranking is solution numbers 4, 1, 2 and 3, then:
-rw------- 1 gareth daemon 4766 Oct 25 10:41 gold_soln_structure_1.mol2
-rw------- 1 gareth daemon 4766 Oct 25 10:52 gold_soln_structure_2.mol2
-rw------- 1 gareth daemon 4766 Oct 25 11:04 gold_soln_structure_3.mol2
-rw------- 1 gareth daemon 4766 Oct 25 11:16 gold_soln_structure_4.mol2
lrwx--x--x 1 gareth daemon 25 Oct 25 11:16 ranked_structure_1.mol2 -> ./gold_soln_structure_4.mol2
lrwx--x--x 1 gareth daemon 25 Oct 25 11:16 ranked_structure_2.mol2 -> ./gold_soln_structure_1.mol2
lrwx--x--x 1 gareth daemon 25 Oct 25 11:16 ranked_structure_3.mol2 -> ./gold_soln_structure_2.mol2
lrwx--x--x 1 gareth daemon 25 Oct 25 11:16 ranked_structure_4.mol2 -> ./gold_soln_structure_3.mol2
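The same information can be read back programmatically by following the symbolic links. This is a minimal Python sketch under the naming scheme shown in the listing; the function name is hypothetical and the ligand stem "structure" is taken from the example.

```python
from pathlib import Path


def current_ranking(directory=".", ligand="structure"):
    """Return solution file names in rank order by following the ranked_* links."""
    links = {}
    for link in Path(directory).glob(f"ranked_{ligand}_*.mol2"):
        rank = int(link.stem.rsplit("_", 1)[1])  # trailing number of the link name
        links[rank] = link.resolve().name        # file the symbolic link points to
    return [links[rank] for rank in sorted(links)]
```

For the listing above, the first element returned would be gold_soln_structure_4.mol2, the current top-ranked solution.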
If the input file format is PDB, GOLD predictions will be written out in the PDB file format. Similarly, if the input file format is MACCS SD, GOLD predictions will be output in that format.
The script fit_analy reads gold.log and ranks solutions by their fitness scores. The various contributions to the total fitness for each solution are shown, together with a summary.
For example:
Gareth [62] [test1cps] [13:34] > fit_analy
Rank  Mol  Fitness  VDW int  Tors Int  VDW ext  H int    ext
   1    4    62.31     3.31     -3.33    23.56   0.00  32.88
   2    1    62.01     2.44     -3.32    24.19   0.00  32.65
   3    2    60.65     2.80     -3.35    23.28   0.00  32.10
   4    3    56.91     3.29     -6.97    29.10   0.00  24.22
Av fit = 60.47
VDW int = 2.96 Tors Int = -4.2425 VDW ext = 25.0325
H Bond int = 0 ext = 30.4625
Thus, solution number 4 is the best GOLD prediction. If structure.mol2 is the name of the original ligand data file, then gold_soln_structure_4.mol2 will contain the predicted binding mode for this GOLD job.
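The ranking and the summary line in this example can be reproduced from the per-solution figures. The sketch below is an illustration in Python, not the fit_analy script itself; it simply sorts the four solutions by fitness and averages each column.

```python
# (solution number, fitness, VDW int, Tors int, VDW ext, H int, H ext)
rows = [
    (4, 62.31, 3.31, -3.33, 23.56, 0.00, 32.88),
    (1, 62.01, 2.44, -3.32, 24.19, 0.00, 32.65),
    (2, 60.65, 2.80, -3.35, 23.28, 0.00, 32.10),
    (3, 56.91, 3.29, -6.97, 29.10, 0.00, 24.22),
]

# Highest fitness first; the first entry is the best prediction.
ranked = sorted(rows, key=lambda row: row[1], reverse=True)
best_solution = ranked[0][0]

# Column averages, skipping the solution number.
averages = [sum(column) / len(rows) for column in list(zip(*rows))[1:]]

print("best solution:", best_solution)
print("averages:", [round(value, 4) for value in averages])
```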
fit_analy is used by the front end to rank solutions when the output window is open.
If more than one ligand has been docked into the protein, the command fit_analy.batch should be used:
$ fit_analy.batch
Molecule Internal External
Rank Name No Fitness VDW Torsion Hbond VDW Hbond
1 sam-MFCD0017875 2 51.99 2.55 -7.83 0.00 36.84 6.61
2 sam-MFCD0017875 3 51.59 1.56 -6.37 0.00 38.64 3.27
3 sam-MFCD0017875 1 50.09 1.48 -5.97 0.00 38.11 2.17
4 sam-MFCD0012325 1 49.24 1.37 -7.23 0.00 39.99 0.11
5 sam-MFCD0012325 3 48.10 2.24 -8.94 0.00 39.41 0.61
6 sam-MFCD0005477 3 47.98 4.20 -6.46 0.00 36.54 0.00
7 sam-MFCD0005271 1 47.82 2.62 -9.56 0.00 39.83 0.00
.......
534 sam-MFCD0002487 1 -1.32 -28.12 -13.76 0.00 29.48 0.04
535 sam-MFCD0003774 1 -66.66 -113.29 -5.19 0.00 37.69 0.00
536 sam-MFCD0003774 2 -67.74 -113.31 -5.16 0.00 36.89 0.00
537 sam-MFCD0000398 3 -72.83 -105.60 -7.23 0.00 29.09 0.00
538 sam-MFCD0000398 2 -74.73 -105.50 -7.36 0.00 27.73 0.00
539 sam-MFCD0000398 1 -79.75 -108.21 -6.92 0.00 25.74 0.00
540 sam-MFCD0005913 1 -89.92 -128.95 -9.59 0.00 35.36 0.00
541 sam-MFCD0005913 2 -92.39 -127.69 -9.84 0.00 32.83 0.00
542 sam-MFCD0005913 3 -96.20 -127.50 -8.87 0.00 29.21 0.00
Av fit = 32.17
VDW int = -1.42 Tors Int = -5.56 VDW ext = 26.57
H Bond int = 0.00 ext = 2.62
grommitt is a simple molecular viewer for examining binding modes. It is used by the front end, when the display variable is set, for showing the current top solution in a GA run.
It can also be used from the command line to display overlays of SYBYL MOL2 files.
Usage: grommitt [-chp] files
The flags are:
files is a list of SYBYL MOL2 (suffix .mol2) and/or Brookhaven PDB files (suffix .pdb or .ent).
grommitt is useful for visualising a set of GOLD solutions: it is possible to see at a glance if all GOLD solutions are identical or whether there are several different binding modes.
For example:
% grommitt -h gold_soln*
displays the solutions overlaid in a single viewer window.
smart_rms calculates the rms difference between two conformations of the same structure, while taking account of symmetry effects (such as ring flipping).
Using a graph isomorphism algorithm, an RMS score is calculated for each way of mapping the molecule onto itself.
Usage: smart_rms [-hv] conformation_1 conformation_2
The flags are:
conformation_1 and conformation_2 are MOL2 files containing the two conformations. They should only differ in terms of coordinates.
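The idea of scoring every self-mapping of the molecule can be sketched in a few lines of Python. This is not the smart_rms implementation: the automorphisms are found by brute force over all atom permutations (practical only for very small molecules), the two conformations are compared without any superposition, and reporting the smallest value is an assumption. All function names are hypothetical.

```python
import itertools
import math


def automorphisms(elements, bonds):
    """Yield atom mappings that preserve element types and the bond list."""
    n = len(elements)
    bond_set = {frozenset(bond) for bond in bonds}
    for perm in itertools.permutations(range(n)):
        if any(elements[i] != elements[perm[i]] for i in range(n)):
            continue
        if all(frozenset((perm[i], perm[j])) in bond_set for i, j in bonds):
            yield perm


def rms(coords_a, coords_b, mapping):
    """RMS distance between atom i of A and atom mapping[i] of B."""
    total = sum(math.dist(coords_a[i], coords_b[mapping[i]]) ** 2
                for i in range(len(coords_a)))
    return math.sqrt(total / len(coords_a))


def symmetry_aware_rms(elements, bonds, coords_a, coords_b):
    """Smallest RMS over all ways of mapping the molecule onto itself."""
    return min(rms(coords_a, coords_b, m) for m in automorphisms(elements, bonds))


# Toy example: a water-like molecule whose two hydrogens are listed in swapped order.
elements = ["O", "H", "H"]
bonds = [(0, 1), (0, 2)]
conf_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
conf_2 = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]

naive = rms(conf_1, conf_2, mapping=(0, 1, 2))
aware = symmetry_aware_rms(elements, bonds, conf_1, conf_2)
print(naive, aware)
```

In the toy example the naive atom-by-atom comparison reports a non-zero difference even though the two conformations are physically identical, while the symmetry-aware value is zero.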
rms_analysis calculates an RMSD matrix for a set of structures (in MOL2 files) and performs hierarchical cluster analysis. A graph isomorphism algorithm is used to determine optimal RMS values.
Usage: rms_analysis [-c n] file1.mol2 file2.mol2 ....
where n is 1 for single linkage cluster analysis, 2 for complete linkage and 3 for group average.
For example:
$ rms_analysis ranked_ligand_?.mol2
________________________
RMSD Matrix of molecules
        2    3    4    5    6    7    8    9
1 :   0.8  1.1  1.0  1.0  1.4  2.3  5.0  4.6
2 :        0.9  1.1  1.1  1.2  2.3  5.2  4.6
3 :             0.4  0.8  0.9  2.3  5.0  4.5
4 :                  0.6  1.1  2.3  4.9  4.5
5 :                       1.3  2.0  4.9  4.5
6 :                            1.8  5.1  4.4
7 :                                 5.3  4.5
8 :                                      2.4
Clustering using complete linkage.
Dist Clusters...
0.40 | 1 | 2 | 3 4 | 9 | 5 | 6 | 7 | 8 |
0.84 | 1 | 2 | 3 4 5 | 9 | 8 | 6 | 7 |
0.84 | 1 2 | 7 | 3 4 5 | 9 | 8 | 6 |
1.13 | 1 2 3 4 5 | 7 | 6 | 9 | 8 |
1.42 | 1 2 3 4 5 6 | 7 | 8 | 9 |
2.35 | 1 2 3 4 5 6 7 | 9 | 8 |
2.38 | 1 2 3 4 5 6 7 | 9 8 |
5.28 | 1 2 3 4 5 6 7 9 8 |
The following environment variables are required:
| GOLD_DIR | The directory where GOLD is installed. |
| GROMMITT_DIR | The directory where grommitt is installed. |
The environment variables TK_LIBRARY and TCL_LIBRARY may also be required by Tcl/Tk.
These variables should be set in the file CSHRC or BASHRC in the installation directory, so that the user need only source this file to run GOLD.
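A quick way to confirm that the environment has been set up before launching a job is to check for the two required variables. The following Python sketch is a hypothetical helper, not something shipped with GOLD; it only reports which of the required variables are unset or empty.

```python
import os

REQUIRED = ("GOLD_DIR", "GROMMITT_DIR")  # variables listed as required above


def missing_variables(environ=os.environ):
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("GOLD environment variables are set.")
```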