GOLD: Correlation of Fitness Score with Activity Data

This page summarises research into the correlation between GOLD fitness scores and measured binding affinities, based on a series of molecules prepared and tested at Glaxo Wellcome and two series from the literature.


Influenza A Neuraminidase

Crystal structure studies are available for the N2 and N9 subtypes.

GOLD was run 25 times on each complex and the fitness score of the best prediction recorded.

Ligand      IC50 (uM)  Fitness  PDB Id   RMS   Notes
gr215787    0.002      82.02    -        -     4
gr217029    0.002      87.39    -        -     4
gr209375    0.004      76.99    gs00023  1.13  1
gr121167    0.005      92.96    gs00155  0.38  1
5c          0.005      81.96    -        -     4,7
gr245554    0.012      75.52    -        -     4
5e          0.014      85.34    -        -     4,7
gr209376    0.18       68.93    -        -     4
gr121158    0.32       91.59    gs00154  0.32  1
5b          0.32       77.78    -        -     4,7
gr207471    0.5        74.40    -        -     4
DANA        8.6        74.73    1nnb     1.13  1
DANA        8.6        75.20    1ivf     1.18  2
gr220020    12         76.24    -        -     4
EPANA       ~10        78.98    1iny     1.25  1
EPANA       ~10        77.78    1inx     1.54  2
gr195901    19         67.97    -        -     4
Kim1        20         64.14    -        -     4,6
Kim2        >100       83.10    -        -     5,6
inactive4   >130       58.86    -        -     4,8
inactive8   >130       71.94    -        -     4,8
inactive10  >180       59.31    -        -     4,8
inactive3   >210       66.34    -        -     4,8
inactive5   >270       61.80    -        -     4,8
inactive7   >390       72.91    -        -     4,8
inactive2   >600       75.18    -        -     4,8
inactive1   >640       64.80    -        -     4,8
inactive6   >880       62.74    -        -     4,8
inactive9   >900       63.13    -        -     4,8
BANA105     750        44.85    1ivd     2.55  3
APANA       ~1000      78.43    1inw     1.28  2
NANA        ~1000      80.66    2bat     1.21  2
BANA106     10000      46.39    1ivc     2.30  2,3
BANA108     >20000     41.03    1ive     2.84  2,3
  1. N9 Crystal Structure.
  2. N2 Crystal Structure.
  3. GOLD failed to predict the BANA geometries correctly because these compounds contain non-planar amide bonds. However, the positions of the benzene ring and acid group were correctly predicted.
  4. Docked into protein crystal structure of gs00023.
  5. Docked into protein crystal structure of gs00155.
  6. Structures from Williams et al. Bioorg & Med Chem Lett 1995, 5, 2551.
  7. Analogues of GG167.
  8. Glaxo Wellcome inactives from assay data. These inactives have high structural similarity to the actives.

From the graph, it can be seen that there are no compounds with low fitness and high activity and there is evidence of correlation.

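Non-parametric tests indicate that GOLD score and activity are significantly correlated (Spearman test: rs=-0.6493, p<0.001; Kendall test: Tau=-0.4830, p<0.001). The coefficients are negative because a higher fitness score goes with a lower IC50, that is, with higher activity.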

If we consider 10 uM to be a cutoff for activity (counting the two EPANA entries at ~10 uM as active), there are 15 actives and 19 inactives. The question then arises as to whether the GOLD score can be used to predict activity. A GOLD score greater than 74 is taken to indicate predicted activity:

          GOLD predicted activity  GOLD predicted inactivity
Active    14                       1
Inactive  5                        14

Thus, GOLD scores are a good indicator of activity for this series. It is most unlikely that this level of prediction could have arisen through chance (Chi2=15.27, 1 degree of freedom, p<0.001).
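The classification and the chi-squared figure can be checked with a few lines of Python. This is a minimal sketch, not the procedure originally used: the fitness scores are transcribed from the neuraminidase table, the two EPANA rows (IC50 ~10) are assumed to count as active because that is what the stated totals imply, and the function names are illustrative only. The statistic is the ordinary Pearson chi-squared for a 2x2 table without continuity correction.

```python
import math

# Fitness scores transcribed from the neuraminidase table.
# Assumption: the two EPANA rows (IC50 "~10") are counted as active,
# which is what the stated totals of 15 actives and 19 inactives imply.
ACTIVE_FITNESS = [
    82.02, 87.39, 76.99, 92.96, 81.96, 75.52, 85.34, 68.93,
    91.59, 77.78, 74.40, 74.73, 75.20, 78.98, 77.78,
]
INACTIVE_FITNESS = [
    76.24, 67.97, 64.14, 83.10, 58.86, 71.94, 59.31, 66.34, 61.80, 72.91,
    75.18, 64.80, 62.74, 63.13, 44.85, 78.43, 80.66, 46.39, 41.03,
]
SCORE_CUTOFF = 74.0


def contingency(actives, inactives, cutoff):
    """Rows: active, inactive. Columns: predicted active, predicted inactive."""
    a = sum(score > cutoff for score in actives)
    c = sum(score > cutoff for score in inactives)
    return (a, len(actives) - a), (c, len(inactives) - c)


def chi_squared_2x2(table):
    """Pearson chi-squared for a 2x2 table, without continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Upper-tail probability of a chi-squared variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


table = contingency(ACTIVE_FITNESS, INACTIVE_FITNESS, SCORE_CUTOFF)
chi2 = chi_squared_2x2(table)
print(table, round(chi2, 2), p_value_1df(chi2))
```

With the cutoff at 74 this reproduces the counts in the table (14, 1, 5, 14) and a chi-squared value that rounds to 15.27.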


FKBP12

FKBP12 (FK506-binding protein) ligands from Holt et al., J Am Chem Soc, 1993, 115, 9925.

GOLD was run 20 times on each complex (or until the top three solutions were within 1.5 Å of each other) and the fitness score of the best prediction was recorded.

Ligand       Ki (uM)  Fitness  PDB Id  RMS   Notes
compound 1   1.6      46.93    -       -     -
compound 2   2        47.89    -       -     -
compound 3   0.6      47.63    -       -     -
compound 4   0.186    56.97    -       -     -
compound 5   0.11     55.98    -       -     -
compound 6   0.012    59.87    -       -     -
compound 7   0.25     60.43    -       -     -
compound 8   0.01     61.51    1fkg    1.43  -
compound 9   0.007    59.57    1fkh    1.52  -
compound 10  0.3-0.6  63.98    -       -     2
compound 11  0.3      53.25    -       -     3
compound 13  0.1      49.56    1fki    0.78  1
  1. GOLD is unable to perform full conformational analysis of this macrocycle, so semi-rigid docking with "corner flapping" was used. For the same reason, GOLD was unable to dock compounds 12, 14 and 15.
  2. Stereoisomer of compound 8.
  3. Stereoisomer of compound 9.

Non-parametric tests indicate that GOLD score and activity are not significantly correlated (Spearman test: rs=-.5639, p=0.056; Kendall test: Tau=-.3817, p=0.086).
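The two rank coefficients can be recomputed from the FKBP12 table. The sketch below is illustrative rather than the original analysis: it uses tie-aware Spearman (Pearson correlation of average ranks) and Kendall tau-b, and it assumes that compound 10, whose Ki is given as a range of 0.3-0.6, is entered at its lower bound of 0.3. How the range was actually handled is not stated. All function names are hypothetical.

```python
import math

# Ki (uM) and fitness transcribed from the FKBP12 table, in table order.
# Assumption: compound 10 (Ki "0.3-0.6") is entered at its lower bound, 0.3.
KI = [1.6, 2.0, 0.6, 0.186, 0.11, 0.012, 0.25, 0.01, 0.007, 0.3, 0.3, 0.1]
FITNESS = [46.93, 47.89, 47.63, 56.97, 55.98, 59.87,
           60.43, 61.51, 59.57, 63.98, 53.25, 49.56]


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation with ties handled by average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b: pair counts with a correction for ties in x and in y."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    pairs = n * (n - 1) // 2
    return (concordant - discordant) / math.sqrt((pairs - ties_x) * (pairs - ties_y))


rs = spearman(KI, FITNESS)
tau = kendall_tau_b(KI, FITNESS)
print(round(rs, 4), round(tau, 4))
```

Under the stated assumption the two values round to the coefficients quoted above (-0.5639 and -0.3817). The p-values are not recomputed here.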

Comments

Here, we were unable to show a statistically significant relationship between the GOLD score and activity. However, it is worth noting that all compounds are structurally similar and all are active.


Alpha Chymotrypsin

94 inhibitors from Stewart et al., T. C. Methods, 1990, 3, 713.

Ligand Ki(mM) Fitness
K51 60.0 57.68
K15 130.0 56.91
K21 104.0 54.53
K1 0.08 54.51
K3 13.5 54.07
K5 0.23 52.45
K19 11.0 52.38
K8 0.22 51.94
K11 185.0 51.92
K9 31.0 51.76
K23 0.25 51.57
K14 0.063 51.47
K17 4.8 50.47
K2 0.34 50.35
K7 0.13 50.30
K22 1.84 50.01
K44 9.9 49.72
K13 0.23 49.66
K6 0.22 49.55
K100 400 49.06
K10 0.7 48.94
K27 177.0 48.73
K18 250.0 48.63
K43 1.35 48.05
K64 12.0 47.96
K25 5.6 47.62
K69 11.4 47.01
K26 8.4 46.27
K52 2.7 46.21
K54 7.0 45.52
K42 3.4 45.00
K55 25.0 44.98
K46 2.3 44.68
K28 12.2 44.65
K30 1.3 44.65
K29 1.4 44.27
K75 15.4 43.87
K33 1.5 43.50
K48 0.4 43.46
K24 5.4 43.14
K36 0.25 43.06
K37 7.5 43.05
K70 4.3 43.02
K32 0.7 43.00
K45 1.1 42.98
K35 2.3 42.72
K47 0.2 42.69
K4 0.26 42.56
K31 0.87 42.45
K49 0.77 42.31
K84 10.0 41.31
K67 0.6 41.21
K59 15.0 41.06
K58 7.7 40.97
K63 0.8 40.63
K71 4.3 40.43
K68 200.0 40.40
K57 4.9 40.33
K66 5.0 39.89
K73 70.0 39.86
K56 2.4 39.61
K53 2.02 39.57
K65 0.32 39.52
K80 4.0 39.34
K61 13.0 39.18
K41 6.3 39.16
K40 6.6 39.15
K77 48.0 39.11
K62 1.42 37.49
K83 3.0 37.20
K81 0.8 37.04
K82 1.33 37.01
K74 6.6 36.80
K78 3.9 36.05
K76 150.0 35.93
K87 22.0 35.78
K79 10.0 35.67
K88 5.8 35.36
K86 8.4 34.66
K85 6.3 33.65
K72 3.4 32.52
K93 13.0 31.81
K94 6.4 31.49
K96 75.0 31.01
K95 6.6 30.62
K91 110.0 30.36
K16 41.0 29.85
K89 12.3 29.46
K90 2.9 29.35
K98 25.0 27.79
K97 28.0 26.64
K99 45.0 22.25
K102 4.67 9.11
K103 10.2 -21.92

The next graph ignores the two outliers:

Non-parametric tests give a mixed result: the correlation between GOLD score and activity is significant according to the Kendall test but not according to the Spearman test (Spearman test: rs=-0.1909, p=0.065; Kendall test: Tau=-0.1495, p=0.033).

Note that these inhibitors are all extremely hydrophobic, representing a worst case for GOLD.


Summary

There is a clear relationship between GOLD fitness scores and binding constants for the neuraminidase inhibitors, though more data are required to assess the strength of the correlations. With a fitness cutoff of 74, GOLD classified 14 of the 15 actives and 14 of the 19 inactives correctly.

No statistically significant relationship was found for the FKBP12 series. However, this is a demanding series: all the compounds are active and very similar structurally.

On the alpha-chymotrypsin test set the correlation was weak (significant by the Kendall test but not by the Spearman test), which is a reasonable result given the demanding (hydrophobic) nature of the ligands.

The current fitness function was designed to discriminate between different binding modes of the same molecule. Extra terms are probably required to compare different molecules. For example, a term is probably required to account for the entropic loss associated with "freezing" rotatable bonds when the ligand binds.

