GOLD Validation: Results on Initial Test Set of 100 Protein-Ligand Complexes

These results are described in G. Jones, P. Willett, R. C. Glen, A. R. Leach & R. Taylor, J. Mol. Biol. 267 (1997) 727-748 and were first presented at Molecular Interactions, the 15th Annual Conference and AGM of the Molecular Graphics and Modelling Society, University of York, UK, 16-19 April 1996.


To reduce the size of the GOLD distribution directories, we have not included details of the individual predictions. However, these can be viewed in 3D on the Web pages of the Cambridge Crystallographic Data Centre. To view them in 3D, you will need to download the Chime plug-in from the US MDL Web site or the UK site. Each GOLD prediction is shown together with the crystallographically observed position and conformation, both encoded in MDL MOL format. Coordinates of the test systems are available for downloading, so other groups are free to test their docking software on the same complexes. The downloadable files are all in SYBYL MOL2 format. Thanks are due to the Brookhaven PDB Group for allowing distribution of these data.


The Data Set

100 PDB protein-ligand complexes.

Initially, GOLD failed on a number of the test complexes because, for example, the ligand had insufficient hydrogen-bonding atoms or the protein included a metal ion. These problems were solved, and GOLD eventually produced an answer for 99 of the test complexes, failing only on 1ACL, where the ligand has no H-bond donors or acceptors at all.


Results of GOLD's Predictions

GOLD achieved a 71% rate of successful predictions: 71 of the 100 test complexes were classified as good or close.

In summarising the results, the GOLD prediction for each complex is defined as the best of the 20 GA dockings according to the GOLD fitness score, not the docking that is closest to the experimental result.
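The selection rule above can be sketched in a few lines. This is only an illustration, not GOLD's own code: the `Docking` record, its field names and the sample numbers are hypothetical, and the sketch assumes that a higher fitness score is better.

```python
from dataclasses import dataclass


@dataclass
class Docking:
    run: int                # index of the GA run (hypothetical field)
    fitness: float          # fitness score; assumed here that higher is better
    rmsd_to_crystal: float  # only known in a validation setting, never used for ranking


def select_prediction(dockings):
    """Return the docking with the best fitness score."""
    return max(dockings, key=lambda d: d.fitness)


# Hypothetical numbers, for illustration only.
dockings = [
    Docking(run=1, fitness=52.3, rmsd_to_crystal=0.8),
    Docking(run=2, fitness=61.7, rmsd_to_crystal=3.4),
    Docking(run=3, fitness=58.9, rmsd_to_crystal=0.6),
]

prediction = select_prediction(dockings)
closest = min(dockings, key=lambda d: d.rmsd_to_crystal)
print("reported prediction: run", prediction.run)
print("closest to experiment: run", closest.run)
```

In this made-up case the reported prediction (run 2) is not the docking closest to the crystal structure (run 3). Ranking by fitness alone is the stricter test, because the experimental answer is unknown in real use.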

Position of top-ranked molecule (for those complexes where the GA produced an answer)

Each GOLD prediction was assigned to one of four subjective categories: good, close, errors (partially correct but with significant errors) or wrong.

Remember, details of individual predictions are available on CCDC's Web page.

First set of validation tests: Subjective classification of results
Subjective Result                          No.  PDB Codes
Good                                       41   1ABE 1ACM 1ACO 1CBX 1COY 1CPS 1DBB 1DBJ 1FKG 1FKI 1HDY 1HEF 1HYT 1LST 1MDR 1MRK 1PBD 1PHD 1POC 1SRJ 1STP 1TPP 1ULB 1XIE 2ADA 2CGR 2CHT 2CTC 2PHH 2SIM 3AAH 3PTB 3TPI 4DFR 4PHV 7TIM 8GCH 1AEC 1AHA 1ASE 1HSL
Close                                      30   1BLH 1DIE 1DR1 1DWD 1EPB 1GHB 1GLQ 1IDA 1IVE 1LDM 1PHA 1PHG 1RNE 1SLT 1TKA 1TMN 1XID 2DBL 2PK4 2YHX 3CPA 3GCH 3HVT 4CTS 5P2P 6ABP 6RNT 1APT 1AZM 4EST
Partially correct but significant errors   9    1BAF 1EAP 1ETR 1HDC 1LIC 1RDS 1ROB 6RSA 1ACK
Wrong                                      19   1AAQ 1ACJ 1DID 1EED 1ETA 1HRI 1ICN 1IGJ 1MCR 1MUP 2R07 1NIS 1TDB 2AK3 2MTH 2PLV 3CLA 4FAB 2MCP
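The headline success rate can be checked against these counts. The sketch below assumes, as the counts suggest, that a "successful" prediction means one classified as good or close. The variable names are illustrative.

```python
# Category counts taken from the table above.
counts = {"good": 41, "close": 30, "errors": 9, "wrong": 19}

test_set_size = 100              # complexes in the test set
answered = sum(counts.values())  # complexes for which GOLD produced an answer

# Assumption: "successful" means good or close.
successes = counts["good"] + counts["close"]

rate_all = successes / test_set_size
rate_answered = successes / answered

print(f"answered: {answered} of {test_set_size}")
print(f"success rate over all complexes:      {rate_all:.1%}")
print(f"success rate over answered complexes: {rate_answered:.1%}")
```

The two denominators give slightly different rates because one complex (1ACL) produced no answer. The quoted 71% corresponds to counting that failure against GOLD.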

Results by RMSD

The table below shows the relationship between the subjective classification used above and a more objective measure: the RMSD between the GOLD prediction and the crystallographically determined coordinates.

First set of validation tests: Summary of RMSD results
RMSD (Å)       #Total  #Good  #Close  #Errors  #Wrong
<=0.5          8       8      0       0        0
>0.5, <=1.0    27      24     3       0        0
>1.0, <=1.5    20      7      13      0        0
>1.5, <=2.0    11      2      9       0        0
>2.0, <=2.5    2       0      2       0        0
>2.5, <=3.0    3       0      2       1        0
>3.0           28      0      1       8        19
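For readers who want to score their own dockings against the same bins, here is a minimal sketch of the RMSD calculation and the binning. It makes several assumptions that this page does not confirm: atoms are matched one-to-one in file order, both poses are already in the same (protein) coordinate frame so no superposition is applied, and no correction is made for symmetry-equivalent atoms. The function names are illustrative.

```python
import math


def rmsd(predicted, observed):
    """Root-mean-square deviation between two lists of (x, y, z) tuples.

    Atoms are matched by position in the list; no superposition is done.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(predicted, observed):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(predicted))


# Upper bounds of the bins used in the table above.
BIN_UPPER_BOUNDS = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]


def rmsd_bin(value):
    """Return the label of the table bin that an RMSD value falls into."""
    for upper in BIN_UPPER_BOUNDS:
        if value <= upper:
            return f"<={upper}"
    return ">3.0"


# Toy two-atom example: every atom is displaced by exactly 1 unit along z.
observed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(predicted, observed)
print(value, rmsd_bin(value))
```

A production comparison would normally restrict the calculation to heavy atoms and handle symmetric groups such as carboxylates or phenyl rings, so values from this sketch may be higher than those from a symmetry-aware tool.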

Here is the full table of RMSD results

Results by ligand composition

This table shows how the subjective result varies with the number of ligand atoms, the percentage of polar atoms in the ligand, and the number of ligand rotatable bonds and free corners.

First set of validation tests: Ligand characterisation
                   Number of        % of heavy atoms which     Number of
                   heavy atoms      can form hydrogen bonds    torsions
Subjective Result  Max  Avg   Min   Max   Avg   Min            Max  Avg   Min
Good and close     52   20.4  6     66.7  31.9  8.7            28   7.9   0
Errors and wrong   55   24.3  9     53.9  25.1  4.8            40   11.4  0

Results by resolution of protein structure

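This table shows that GOLD is more likely to fail if the protein structure is of poor resolution: the proportion of good or close predictions falls from 77% (34 of 44) for structures between 1.5 and 2.0 Å to 55% (11 of 20) for structures between 2.5 and 3.0 Å. This is an interesting result, since it suggests that some of the discrepancies between the GOLD predictions and the observed ligand positions are due to experimental errors in the PDB.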

First set of validation tests: Resolution of PDB complexes
Resolution (Å)  #Total  #Good + #Close  #Errors + #Wrong
>1.0, <=1.5     2       2               0
>1.5, <=2.0     44      34              10
>2.0, <=2.5     32      24              8
>2.5, <=3.0     20      11              9
>3.0            1       0               1
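The per-bin success rates can be recomputed directly from the table. The sketch below is illustrative: the row tuples are copied from the table and the function name is hypothetical.

```python
# (resolution bin, total, good + close, errors + wrong), copied from the table above.
rows = [
    (">1.0, <=1.5", 2, 2, 0),
    (">1.5, <=2.0", 44, 34, 10),
    (">2.0, <=2.5", 32, 24, 8),
    (">2.5, <=3.0", 20, 11, 9),
    (">3.0", 1, 0, 1),
]


def success_rates(rows):
    """Map each resolution bin to its fraction of good or close predictions."""
    rates = {}
    for label, total, succeeded, failed in rows:
        if succeeded + failed != total:
            raise ValueError(f"row {label!r} does not add up")
        rates[label] = succeeded / total
    return rates


rates = success_rates(rows)
for label, total, _, _ in rows:
    print(f"{label:<12} n={total:<3} success rate={rates[label]:.0%}")
```

The first and last bins contain only two complexes and one complex respectively, so their rates carry little weight. The trend rests on the three middle bins.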

Problems with the GOLD Algorithm

The above tests highlighted a number of problems, most of which have now been solved.

One ongoing problem is speed: GOLD is not the fastest docking algorithm available. However, there is, as always, a trade-off between speed and reliability. A number of faster GA parameter settings have been developed for those who wish to sacrifice some reliability for increased speed.
