These pages describe how to get started with GOLD and contain introductory notes on using the program to predict a ligand binding mode.
In order to use the program you need to set some environment variables. To do this, source the file DIR/CSHRC, where DIR is the directory in which GOLD is installed.
% source DIR/CSHRC
The GOLD commands can now be accessed from the command line. GOLD can be run either from the command line or via the graphical front end; the easiest way to get started is to use the front end.
Amongst other formats, the software can read and write SYBYL MOL2 files. Two files are required: one specifying the ligand and the other defining the protein binding site. You can also specify the problem using a PDB file for the protein (see Brookhaven PDB files) and either a PDB or MDL SD file for the ligand.
GOLD uses an all-atom model, so the input files must have all hydrogens added. Additionally, the protein and ligand should be in reasonably low-energy conformations, since GOLD will not alter bond lengths or angles, or rotate "rigid" bonds such as amide linkages, double bonds and certain bonds to trigonal nitrogens. GOLD deduces hydrogen-bonding abilities and the ionisation states of the protein and the ligand from the presence or absence of hydrogens, together with the bond and atom typing of the input structures. If the wrong ionisation state is input to the program, it is highly unlikely that GOLD will be able to predict the correct protein-ligand binding mode. Similarly, if the wrong atom or bond types are specified, GOLD will probably give the wrong answer. Therefore, you should read the page on:
You do not have to input the whole protein. However, if you input only the region of interest around the binding site, you should make sure that all the residues you include are complete.
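GOLD infers hydrogen-bonding abilities and ionisation states from the hydrogens present, so a quick pre-flight check on the input can save a wasted run. The sketch below is not part of GOLD. It assumes a single-molecule SYBYL MOL2 file and simply counts atoms in the @<TRIPOS>ATOM record whose SYBYL type is a hydrogen type; the function names are hypothetical.

```python
def mol2_atom_types(text: str) -> list[str]:
    """Return the SYBYL atom type (6th column) of each atom in the
    @<TRIPOS>ATOM record of a single-molecule MOL2 string."""
    types = []
    in_atoms = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            # A new record starts; collect only inside the ATOM record.
            in_atoms = stripped == "@<TRIPOS>ATOM"
            continue
        if in_atoms and stripped:
            # Columns: atom_id atom_name x y z atom_type [subst_id ...]
            types.append(stripped.split()[5])
    return types


def count_hydrogens(text: str) -> int:
    """Count atoms typed as hydrogen (SYBYL types H, H.spc, H.t3p)."""
    return sum(1 for t in mol2_atom_types(text) if t.split(".")[0] == "H")


METHANE = """\
@<TRIPOS>MOLECULE
methane
 5 4 0 0 0
SMALL
NO_CHARGES

@<TRIPOS>ATOM
 1 C1  0.000  0.000  0.000 C.3 1 MOL 0.0
 2 H1  0.629  0.629  0.629 H   1 MOL 0.0
 3 H2 -0.629 -0.629  0.629 H   1 MOL 0.0
 4 H3 -0.629  0.629 -0.629 H   1 MOL 0.0
 5 H4  0.629 -0.629 -0.629 H   1 MOL 0.0
@<TRIPOS>BOND
 1 1 2 1
 2 1 3 1
 3 1 4 1
 4 1 5 1
"""

if __name__ == "__main__":
    n_h = count_hydrogens(METHANE)
    if n_h == 0:
        print("warning: no hydrogens found; add them before running GOLD")
    else:
        print(f"{n_h} hydrogens found")
```

A count of zero for a protein or a drug-like ligand almost certainly means hydrogens were never added. A non-zero count does not prove that every hydrogen is present, nor that the ionisation state is the one you intended.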
Additionally, the approximate location and size of the protein binding site are required. These are specified by a reference position at approximately the centre of the binding-site cavity, together with a radius around that position which should be large enough to include the whole binding site. (However, the problem may become intractable if the defined site is too large.) The reference position can be identified either by explicitly giving the coordinates of a point in the cavity, or by specifying a solvent-exposed protein atom.
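The centre-plus-radius idea can be illustrated with a purely geometric cut-off. This is a minimal sketch, assuming protein coordinates are held as plain (x, y, z) tuples; it does not reproduce the cavity-detection step GOLD applies on top of the sphere, and the function name is hypothetical.

```python
import math
from typing import Sequence

Point = tuple[float, float, float]


def atoms_within_radius(coords: Sequence[Point], centre: Point, radius: float) -> list[int]:
    """Return the 0-based indices of atoms lying within `radius` of `centre`."""
    return [i for i, xyz in enumerate(coords) if math.dist(xyz, centre) <= radius]


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
    # Reference position given as an explicit point in the cavity.
    print(atoms_within_radius(coords, (0.0, 0.0, 0.0), 10.0))  # [0, 1]
    # Reference position given as an atom: use that atom's coordinates.
    print(atoms_within_radius(coords, coords[2], 10.0))  # [1, 2]
```

Both ways of giving the reference position reduce to the same test; specifying an atom just means taking its coordinates as the centre.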
A cavity detection algorithm (Delaney, J. Mol. Graph., 10 (1992) 174-177 or Hendlich, Rippmann and Barnickel, LIGSITE: Automatic and efficient detection of potential small molecule binding sites in proteins, Merck technical report, 1997) is used to restrict the region of interest to concave solvent-accessible surfaces.
Because GOLD opens output log files (such as gold.log and gold.err), each GOLD run should be performed in a separate directory so that files from different runs do not overwrite one another. Create a directory in which to run GOLD and copy the protein and ligand files into it.
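If you prepare several runs, the one-directory-per-run rule is easy to automate. This is a minimal sketch using only the Python standard library; the function name, directory name and file names are hypothetical placeholders, not anything GOLD requires.

```python
import shutil
from pathlib import Path


def prepare_run_dir(run_dir, input_files):
    """Create a fresh directory for one GOLD run and copy the inputs into it.

    mkdir(exist_ok=False) refuses to reuse an existing directory, so log
    files left by an earlier run cannot be overwritten or mixed up.
    """
    run_path = Path(run_dir)
    run_path.mkdir(parents=True, exist_ok=False)  # FileExistsError if present
    for f in input_files:
        shutil.copy2(f, run_path / Path(f).name)
    return run_path
```

For example, `prepare_run_dir("run_01", ["protein.mol2", "ligand.mol2"])` creates run_01 with copies of both files; you would then move into run_01 and start the front end there.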
The problem details are passed to GOLD using a configuration file. This file can be created manually, but it is usually easier and preferable to create it with the GOLD graphical front end. To start the front end, move to the directory containing the input protein and ligand files and type the command gold.
The program uses a large number of user-configurable parameters, but most have default values which have proved to be robust in tests on many different complexes. The front end is initialised with these defaults. There are other sets of default parameterisations which can be used: these make the program run faster, but may result in poorer quality results. A menu of defaults can be accessed through the Default button.
Now enter the problem details in the interface. First, specify the Protein and Ligand datafiles.
Adding a Ligand.
To add a ligand datafile, click on the Add/Delete Ligand button. The Ligand Editor window will appear.
Click on the Add button, and the Add Ligand dialog box will appear.
Hit the Filename button, then use the file-selection window to choose the ligand data file. Once you have finished, the selected filename will appear in the entry box in the Add Ligand window next to the Filename button.
The slider in the Add Ligand window is used to select the number of times the ligand is to be docked. GOLD uses a non-deterministic algorithm to predict binding modes (i.e. it is likely to produce a number of different answers when it is run several times on the same problem). It is therefore recommended that several dockings are performed. The results are ranked using the GOLD fitness score (solutions of higher fitness are considered superior to those of lower fitness). For most ligands, 10 dockings should be sufficient. However, if a ligand is particularly large or flexible, 25 dockings may be more appropriate.
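Ranking the dockings amounts to sorting on fitness, highest first. The sketch below shows this with made-up fitness values for ten dockings (illustrative numbers, not real GOLD output); the function name is hypothetical.

```python
def rank_solutions(fitness_by_solution: dict[int, float]) -> list[tuple[int, float]]:
    """Return (solution number, fitness) pairs, highest fitness first."""
    return sorted(fitness_by_solution.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up fitness values for ten dockings of one ligand (illustration only).
    fitness = {1: 41.2, 2: 38.7, 3: 45.9, 4: 40.1, 5: 52.3,
               6: 44.0, 7: 51.8, 8: 39.5, 9: 47.2, 10: 50.6}
    for rank, (soln, score) in enumerate(rank_solutions(fitness), start=1):
        print(f"rank {rank:2d}: solution {soln:2d}  fitness {score:.1f}")
```

Because the algorithm is non-deterministic, the solution numbers (the order in which dockings finished) carry no meaning on their own; only the ranking by fitness does.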
In the Add Ligand window, click on the Add button to complete the selection of the ligand. The ligand will now appear in the Ligand Editor window.
You can now repeat the procedure in the Add Ligand window if you want to select further ligands for docking into the protein; otherwise click on the End button to finish input. In the Ligand Editor window, you can click on OK if you are satisfied with the selected ligands, otherwise you can select a ligand with the mouse and delete it, or add further ligands. When you finish, the count of ligands will be updated in the front end.
Picking the Protein Datafile.
Click on the protein button to pick the protein input file. The file selection window will appear.
Use the file selection window to choose the protein data file. When you have finished, the filename will appear in the entry box next to the protein button in the GOLD front end.
Defining the Binding Site.
Now you need to select a method for defining the extent of the binding site. The binding site is normally defined by its centre and its radius. The centre can be specified using a solvent accessible point inside the binding-site cavity or a solvent accessible atom. Alternatively, the atoms comprising the binding site can be given explicitly in a file.
Use the Point, Atom and File radio buttons to select your method.
If you chose Point, enter the coordinates of the point in the coordinate boxes.
If you chose Atom, enter the atom number or PDB entry number of the atom.
If you chose File, enter the name of a file which contains a list of atom numbers, each on a separate line. These atoms should all be solvent accessible and they explicitly define the binding site (therefore, all acceptors and donor hydrogens available to the ligand are taken from the list).
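A binding-site file for the File option is simply atom numbers, one per line. The sketch below writes such a file and reads it back with basic validation. It assumes the numbers are positive integers matching the numbering of your protein input file; skipping blank lines is our own assumption, since GOLD's reader may be stricter. The function names are hypothetical.

```python
from pathlib import Path


def write_site_file(path, atom_numbers):
    """Write atom numbers one per line, the layout described for the File option."""
    Path(path).write_text("".join(f"{n}\n" for n in atom_numbers))


def read_site_file(path):
    """Read atom numbers back, skipping blank lines and rejecting anything else."""
    numbers = []
    for lineno, line in enumerate(Path(path).read_text().splitlines(), start=1):
        text = line.strip()
        if not text:
            continue
        if not text.isdigit() or int(text) == 0:
            raise ValueError(f"line {lineno}: expected a positive atom number, got {text!r}")
        numbers.append(int(text))
    return numbers
```

Reading the file back before a run catches stray atom names or coordinates pasted in by mistake; it cannot check that the listed atoms are actually solvent accessible.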
Enter the approximate size (radius) of the binding site. This should be large enough to contain any possible binding mode of the ligand.
The problem definition is now complete. Hit the Run button to start the program.
A second window will now be created which contains output from the program. If any errors are encountered, a third window will be opened giving details of the error.
The program takes approximately 10-15 minutes (on an SGI R4400) to dock a typical ligand, so a batch of 10 dockings can take a couple of hours. A GOLD job which may take hours of CPU time is better run in the background. You can do this by using the Submit&Exit button rather than the Run button. You can also configure GOLD to terminate early if the top-ranked solutions are very similar to one another. (Experience shows that, in this situation, the program has probably already found the correct answer.)
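The exact similarity test GOLD uses for early termination is not described here. The sketch below only illustrates the general idea under stated assumptions: poses of the same ligand are compared by RMSD without superposition (same atom order, same protein frame), and the run is considered converged when every pair among the top three poses is within 1.5 of each other. The values n_top=3 and cutoff=1.5, and the function names, are hypothetical choices rather than GOLD defaults.

```python
import math
from itertools import combinations
from typing import Sequence

Pose = Sequence[tuple[float, float, float]]


def rmsd(a: Pose, b: Pose) -> float:
    """RMSD between two poses of the same ligand, without superposition.

    Assumes identical atom ordering and a shared coordinate frame (both poses
    sit in the same protein site). Symmetry-equivalent atoms are not handled.
    """
    if len(a) != len(b) or not a:
        raise ValueError("poses must be non-empty and have the same atom count")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))


def top_solutions_agree(ranked: list[Pose], n_top: int = 3, cutoff: float = 1.5) -> bool:
    """True if every pair among the n_top best-ranked poses lies within `cutoff` RMSD.

    n_top=3 and cutoff=1.5 are illustrative assumptions, not GOLD defaults.
    """
    if len(ranked) < n_top:
        return False
    return all(rmsd(p, q) <= cutoff for p, q in combinations(ranked[:n_top], 2))


if __name__ == "__main__":
    pose_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    pose_b = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]  # shifted 0.5 along x
    pose_c = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]  # shifted 3.0 along x
    print(rmsd(pose_a, pose_b))                           # 0.5
    print(top_solutions_agree([pose_a, pose_b, pose_b]))  # True
    print(top_solutions_agree([pose_a, pose_b, pose_c]))  # False
```

No superposition is done because all poses are docked into the same fixed protein, so a rigid-body fit would hide genuine differences in binding position.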
Alternative parameterisations are available which will make GOLD run faster: click on the Default button to get a menu of alternative parameterisations. Although these other settings make GOLD run faster, they will reduce the quality of the predicted dockings.
Once the program has finished, the different dockings can be ranked according to their fitness scores using the Rank Solutions button on the output window.
The ligand conformations corresponding to the genetic algorithm (GA) solutions are written as SYBYL MOL2 files to the output directory (if the input ligand file is in SD format, the output files will also be in SD format). If the ligand is in the file ligand.mol2, GOLD solutions are output as gold_soln_ligand_1.mol2, gold_soln_ligand_2.mol2, gold_soln_ligand_3.mol2, and so on. If solution number 5 is ranked top, the predicted binding mode will be found in the file gold_soln_ligand_5.mol2.
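The naming rule can be written as a small helper, which is handy when scripting post-processing. This is a minimal sketch assuming the pattern gold_soln_ + input file stem + _ + solution number; keeping the input file's extension for SD input is our assumption, based on the rule that SD input gives SD output. The function name is hypothetical.

```python
from pathlib import Path


def solution_filename(ligand_file: str, solution_number: int) -> str:
    """Output name for one GA solution: gold_soln_<stem>_<n><ext>.

    Keeping the input extension (so SD input gives SD output) is an
    assumption based on the description above.
    """
    p = Path(ligand_file)
    return f"gold_soln_{p.stem}_{solution_number}{p.suffix}"


if __name__ == "__main__":
    print(solution_filename("ligand.mol2", 5))  # gold_soln_ligand_5.mol2
    print([solution_filename("ligand.mol2", n) for n in range(1, 4)])
```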
As GOLD runs, it creates symbolic links of the form ranked_ligand_NN.mol2. The link ranked_ligand_1.mol2 points to the current top solution, the link ranked_ligand_2.mol2 points to the current second-best solution, and so on. For example, if solution number 5 is the best, then ranked_ligand_1.mol2 will be a symbolic link to gold_soln_ligand_5.mol2. As GOLD outputs more solutions and the ranking changes, these links are updated. At the end of a GOLD run, the predicted binding mode will therefore be found in ranked_ligand_1.mol2.
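GOLD maintains these links itself. The sketch below merely mimics the behaviour just described so that the mechanism is clear: re-sort by fitness, then point ranked_<stem>_<rank> at the matching solution file. It assumes a POSIX file system with symbolic links and unpadded rank numbers (as in the ranked_ligand_1.mol2 example); the function name and the fitness values you pass in are hypothetical.

```python
import os
from pathlib import Path


def update_rank_links(run_dir, ligand_stem, fitness_by_solution, suffix=".mol2"):
    """(Re)create ranked_<stem>_<rank> links to gold_soln_<stem>_<n> files,
    best fitness first. Mimics the described behaviour; not GOLD code."""
    run_dir = Path(run_dir)
    ranked = sorted(fitness_by_solution, key=fitness_by_solution.get, reverse=True)
    for rank, soln in enumerate(ranked, start=1):
        link = run_dir / f"ranked_{ligand_stem}_{rank}{suffix}"
        target = f"gold_soln_{ligand_stem}_{soln}{suffix}"  # relative to run_dir
        if link.is_symlink() or link.exists():
            link.unlink()  # drop the stale link before repointing it
        os.symlink(target, link)
    return ranked
```

Calling it again after a new, fitter solution appears moves ranked_ligand_1.mol2 to that solution, which is why the top-ranked link always holds the current best answer.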
The commands fit_analy, grommitt and smart_rms are useful for comparing the different solutions generated by a single GOLD run.
If something has gone wrong, the first thing to do is check for error messages: these are logged in the file gold.err. Also, look at the log file (gold.log) for any unusual messages. Both these files appear in separate windows when running the program from the front end.
Also check that the protein binding site has been properly defined. The file gold_protein.mol2 is a SYBYL file with the binding site defined as a SYBYL set and coloured green. Check that the binding site is correct and not unnecessarily large.
Finally, check that the protein and ligand atom typing is correct.