Speeding up GOLD


Introduction

This page summarises experiments with the parameterisation of GOLD aimed at reducing run time while keeping a high level of performance in predicting binding modes.

GOLD runs for a fixed number of genetic operations (crossover and mutation). The easiest way to make GOLD go faster is therefore to reduce the number of GA operations performed in the course of a run. This is controlled by the maxops variable in the configuration file (called No Operations in the front end). However, reducing maxops is likely to change the optimum values of several other GA parameters, and in a somewhat complex way.

GOLD manipulates a pool of chromosomes. The size of this pool should be such that the optimisation converges within maxops operations. If the pool is too small for a given value of maxops, the algorithm will converge prematurely. Conversely, if the pool is too large, the run will terminate before the population has converged. In GOLD, the chromosome pool consists of n_islands populations (islands), each of size popsize, so the total pool size is n_islands * popsize. Both variables can be set in the front end, and the optimum values of both depend on maxops.
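The pool-size trade-off can be illustrated with a toy steady-state island-model GA. This is a sketch, not GOLD's code: the single real-valued gene, the blend crossover, the Gaussian mutation, the replace-the-worst rule and the migration interval (migrate_every) are all illustrative assumptions. The sketch shares two things with the description above: a fixed budget of maxops operations, and a pool made up of n_islands populations of popsize chromosomes each.

```python
import random
from typing import Callable, List


def island_ga(fitness: Callable[[float], float], n_islands: int, popsize: int,
              maxops: int, migrate_every: int = 50, seed: int = 0) -> List[List[float]]:
    """Toy steady-state island GA on one real-valued gene (not GOLD's operators)."""
    if popsize < 2:
        raise ValueError("popsize must be at least 2 for crossover")
    rng = random.Random(seed)
    # The chromosome pool is n_islands sub-populations of popsize members each.
    islands = [[rng.uniform(-10.0, 10.0) for _ in range(popsize)]
               for _ in range(n_islands)]
    for op in range(maxops):              # fixed budget of genetic operations
        pop = islands[op % n_islands]     # islands take turns
        if rng.random() < 0.5:            # crossover: blend two parents
            a, b = rng.sample(pop, 2)
            child = 0.5 * (a + b)
        else:                             # mutation: Gaussian perturbation
            child = rng.choice(pop) + rng.gauss(0.0, 1.0)
        # Steady-state replacement: the child replaces the worst member if fitter.
        worst = min(range(popsize), key=lambda i: fitness(pop[i]))
        if fitness(child) > fitness(pop[worst]):
            pop[worst] = child
        # Periodic migration: copy this island's best member over the next island's worst.
        if n_islands > 1 and (op + 1) % migrate_every == 0:
            dst = islands[(op + 1) % n_islands]
            dst_worst = min(range(popsize), key=lambda i: fitness(dst[i]))
            dst[dst_worst] = max(pop, key=fitness)
    return islands


def spread(islands: List[List[float]]) -> float:
    """Range of gene values across the whole pool: a crude convergence measure."""
    pool = [x for island in islands for x in island]
    return max(pool) - min(pool)


def toy_fitness(x: float) -> float:
    return -(x - 3.0) ** 2  # single optimum at x = 3


# Same maxops, small pool versus large pool.
for n_islands, popsize in [(1, 10), (5, 100)]:
    pool = island_ga(toy_fitness, n_islands, popsize, maxops=2000)
    print(n_islands * popsize, round(spread(pool), 3))
```

With the same maxops, you would typically expect the small pool to have collapsed around the optimum and the large pool to still be spread out when the budget runs out. These correspond to the premature-convergence and under-convergence cases described above.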

Two other important parameters are start_vdw_linear_cutoff and initial_virtual_pt_match_max. As a GOLD run proceeds, certain parameters within the fitness function are annealed. For example, at the start of a GOLD GA run, external VDW energies are cut off when Eij > start_vdw_linear_cutoff * kij, where kij is the depth of the VDW well between atoms i and j. By the end of the run, FINISH_VDW_LINEAR_CUTOFF has taken the place of start_vdw_linear_cutoff. This allows a few bad bumps to be tolerated early in the run, in the expectation that they will be removed as the optimisation proceeds. Similarly, initial_virtual_pt_match_max and FINAL_VIRTUAL_PT_MATCH_MAX set the starting and finishing values of max_distance. A hydrogen bond counts towards the fitness score only if the distance between the donor hydrogen and the fitting point is less than max_distance. This annealing allows poor hydrogen bonds early in a GA run, in the expectation that they will evolve into better ones.

Both the VDW and the H-bond annealing must be gradual, and the population must be allowed plenty of time to adapt to the changing fitness function.
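The annealing described above can be sketched as follows. This is an illustration only, and it rests on several assumptions:

- It assumes a linear schedule over the run.
- The end-of-run values FINISH_VDW_LINEAR_CUTOFF and FINAL_VIRTUAL_PT_MATCH_MAX are hypothetical placeholders, because their actual values are not given on this page.
- The starting values 6.0 and 3.0 are the ones used by several of the reduced parameterisations below.
- What GOLD does to an energy beyond the cut-off is not modelled. vdw_cut_off only tests the condition Eij > cutoff * kij.

```python
# Hypothetical end-of-run values: placeholders, not GOLD's actual settings.
FINISH_VDW_LINEAR_CUTOFF = 1.0
FINAL_VIRTUAL_PT_MATCH_MAX = 1.5


def anneal(start: float, finish: float, op: int, maxops: int) -> float:
    """Value of an annealed parameter after `op` of `maxops` operations.

    Assumes a linear schedule; the real GOLD schedule is not described here.
    """
    frac = min(max(op / maxops, 0.0), 1.0)
    return start + (finish - start) * frac


def vdw_cut_off(e_ij: float, k_ij: float, cutoff: float) -> bool:
    """True if the pair's external VDW energy Eij exceeds cutoff * kij."""
    return e_ij > cutoff * k_ij


def hbond_counts(distance: float, max_distance: float) -> bool:
    """True if the donor H to fitting point distance is short enough to score."""
    return distance < max_distance


maxops = 30000
for op in (0, 15000, 30000):
    vdw = anneal(6.0, FINISH_VDW_LINEAR_CUTOFF, op, maxops)
    dmax = anneal(3.0, FINAL_VIRTUAL_PT_MATCH_MAX, op, maxops)
    print(op, round(vdw, 2), round(dmax, 2), hbond_counts(2.4, dmax))
```

With these placeholders, a 2.4 Angstrom contact counts as a hydrogen bond at the start of the run (max_distance 3.0). Halfway through, when max_distance has fallen to 2.25, it no longer counts.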

In summary, the easiest way to make GOLD run faster is to reduce maxops in the configuration file. However, this should be accompanied by appropriate changes to n_islands, popsize, start_vdw_linear_cutoff and initial_virtual_pt_match_max. To make this straightforward, start_vdw_linear_cutoff and initial_virtual_pt_match_max are set in the configuration file rather than in the parameter file.


Comparison

The full parameterisation was compared with 14 reduced parameterisations. The following is a summary of GOLD runs over 100 test complexes using the fifteen sets of parameters:

                   CPU time (s)      Avg     Avg fitness   Cumulative count at RMSD
Parameterisation   R4000   R10000    #Docks  loss          2.0 Angstrom   3.0 Angstrom
Full               919     -         7.76    N/A           67             77
Reduced 1          506     -         8.21    1.92          70             79
Reduced 2          314     -         8.58    4.67          66             79
Reduced 3          169     -         9.84    8.36          61             73
Reduced 4          -       98        8.96    5.86          67             77
Reduced 5          -       85        9.09    8.02          67             75
Reduced 6          -       71        10.47   7.98          63             74
Reduced 7          -       60        9.89    8.57          62             73
Reduced 8          -       96        8.88    5.92          63             76
Reduced 9          -       70        9.69    8.14          67             76
Reduced 10         -       98        9.71    6.35          66             74
Reduced 11         -       71.6      9.92    8.41          66             73
Reduced 12         -       60        10.79   9.86          63             72
Reduced 13         -       98        9.28    6.36          62             71
Reduced 14         -       110       9.21    6.28          65             77

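Note: The R10000 CPU is approximately 3 times faster than the R4000, so times in the two CPU-time columns are not directly comparable. Reduced 2 and Reduced 13 (both with maxops 30000) should require about the same amount of CPU time on the same machine.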
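The timings can be put on a common scale by converting R10000 times to approximate R4000 times, using the factor of 3 given in the note. The Python sketch below assumes, as the numbers suggest, that Full and Reduced 1 to 3 were timed on the R4000 and the remaining sets on the R10000. The R10000 times are taken from the detailed results below.

```python
# Relative speed from the note: the R10000 is "approximately 3 times faster".
R10000_FACTOR = 3.0

# (seconds, CPU). The CPU assignment is an inference from the timings.
TIMINGS = {
    "Full": (919.0, "R4000"),
    "Reduced 1": (506.0, "R4000"),
    "Reduced 2": (314.0, "R4000"),
    "Reduced 3": (169.0, "R4000"),
    "Reduced 6": (70.9, "R10000"),
    "Reduced 9": (70.6, "R10000"),
    "Reduced 13": (97.97, "R10000"),
}


def r4000_seconds(seconds: float, cpu: str) -> float:
    """Express a timing in approximate R4000 seconds."""
    if cpu == "R4000":
        return seconds
    if cpu == "R10000":
        return seconds * R10000_FACTOR
    raise ValueError(f"unknown CPU: {cpu}")


def speedup(name: str) -> float:
    """Speed-up of a parameterisation relative to the full one."""
    return r4000_seconds(*TIMINGS["Full"]) / r4000_seconds(*TIMINGS[name])


for name in TIMINGS:
    if name != "Full":
        print(f"{name}: {speedup(name):.1f}x")
```

On that assumption, the speed-ups relative to the full parameterisation are about 1.8, 2.9 and 5.4 for Reduced 1 to 3, and about 4.3 for Reduced 6 and 9. These figures are consistent with those quoted in the Comments.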

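GOLD stops early if the top three solutions are all within 1.5 Angstrom RMSD of one another. The average number of docks (GA runs) per complex is therefore an indication of the reproducibility of the GOLD predictions: the fewer docks needed, the more consistently the GA finds the same binding mode.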
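The early-termination test can be sketched as below. It is a hypothetical illustration, not GOLD's implementation, and it makes three assumptions:

- "Within 1.5 Angstrom" is taken to apply to every pair among the three fittest solutions.
- RMSD is computed in place over matching atoms, without superposition.
- A pose is simply a list of (x, y, z) coordinates.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Pose = Sequence[Tuple[float, float, float]]  # hypothetical pose representation


def rmsd(a: Pose, b: Pose) -> float:
    """In-place RMSD between two poses with the same atom order (no superposition)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("poses must have the same, non-zero number of atoms")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(a, b))
    return math.sqrt(sq / len(a))


def should_stop(solutions: List[Tuple[float, Pose]], n_top: int = 3,
                tol: float = 1.5) -> bool:
    """True if the n_top fittest poses are all within tol Angstrom RMSD of each other."""
    if len(solutions) < n_top:
        return False
    top = sorted(solutions, key=lambda s: s[0], reverse=True)[:n_top]
    return all(rmsd(p, q) <= tol
               for (_, p), (_, q) in itertools.combinations(top, 2))
```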

The average fitness loss is the mean, over the test set of 100 complexes, of the difference between the fitness score obtained with the default (full) parameterisation and that obtained with the reduced parameter set. Although the table does not show it, the fitness difference is most marked for the more complex ligands.
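As a worked definition, the average fitness loss can be computed as below. The sketch assumes that each input holds the best fitness score found for each complex, keyed by a complex identifier. The identifiers and scores in the usage line are hypothetical.

```python
from statistics import mean
from typing import Dict


def average_fitness_loss(full: Dict[str, float], reduced: Dict[str, float]) -> float:
    """Mean of (full - reduced) fitness over the complexes present in both runs."""
    common = full.keys() & reduced.keys()
    if not common:
        raise ValueError("no complexes in common")
    # A positive value means the reduced parameterisation scored lower on average.
    return mean(full[c] - reduced[c] for c in common)


# Hypothetical complex identifiers and scores, for illustration only.
print(average_fitness_loss({"1abc": 60.0, "2xyz": 45.0},
                           {"1abc": 58.0, "2xyz": 41.0}))  # prints 3.0
```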

The cumulative count at RMSD 2.0 and 3.0 Angstrom is the number of test systems for which the GOLD prediction is within 2.0 or 3.0 Angstrom RMSD of the experimental result.
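The cumulative counts, and the binned RMSD tables in the detailed results that follow, can be reproduced from per-complex RMSD values along the lines of the sketch below. The sketch assumes half-open bins, so a value of exactly 2.0 Angstrom falls in the 2.0->2.5 bin and is not counted as within 2.0 Angstrom. The source does not say how boundary values were treated.

```python
from typing import Iterable, List, Tuple

# Bin edges (Angstrom) used in the detailed RMSD tables.
EDGES = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
LABELS = (["under 0.5"]
          + [f"{lo}->{hi}" for lo, hi in zip(EDGES, EDGES[1:])]
          + ["over 3.5"])


def rmsd_table(rmsds: Iterable[float]) -> List[Tuple[str, int, int]]:
    """(bin label, count, cumulative count) rows, assuming half-open bins [lo, hi)."""
    counts = [0] * len(LABELS)
    for r in rmsds:
        i = 0
        while i < len(EDGES) and r >= EDGES[i]:
            i += 1  # move past every edge that r has reached
        counts[i] += 1
    rows, running = [], 0
    for label, n in zip(LABELS, counts):
        running += n
        rows.append((label, n, running))
    return rows


def count_within(rmsds: Iterable[float], threshold: float) -> int:
    """Number of complexes whose predicted pose is below `threshold` Angstrom RMSD."""
    return sum(1 for r in rmsds if r < threshold)


example = [0.3, 0.5, 0.7, 1.2, 1.9, 2.2, 2.8, 3.2, 4.0]  # made-up RMSD values
for row in rmsd_table(example):
    print(*row)
print(count_within(example, 2.0), count_within(example, 3.0))  # prints 5 7
```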

A more detailed breakdown of the results for each parameter set follows.

Results with Full Parameterisation

Summary of GOLD run over 100 test complexes.

Average Execution Time          919s
Avg no of docks                 7.76
maxops                          100000
n_islands                       5
start_vdw_linear_cutoff         2.5
initial_virtual_pt_match_max    4.0

Summary of RMSD with crystal structure.

RMSD (Angstrom)   Count   Cumulative count
under 0.5         7       7
0.5->1.0          38      45
1.0->1.5          11      56
1.5->2.0          11      67
2.0->2.5          8       75
2.5->3.0          2       77
3.0->3.5          3       80
over 3.5          20      100

Results with Reduced Parameterisation 1

Summary of GOLD run over 100 test complexes.

Average Execution Time          506s
Avg no of docks                 8.21
Avg fitness loss                1.92
maxops                          50000
n_islands                       5
start_vdw_linear_cutoff         6.0
initial_virtual_pt_match_max    3.0

Summary of RMSD with crystal structure.

RMSD (Angstrom)   Count   Cumulative count
under 0.5         12      12
0.5->1.0          34      46
1.0->1.5          14      60
1.5->2.0          10      70
2.0->2.5          7       77
2.5->3.0          2       79
3.0->3.5          2       81
over 3.5          19      100

Results with Reduced Parameterisation 2

Summary of GOLD run over 100 test complexes.

Average Execution Time          314s
Avg no of docks                 8.58
Avg fitness loss                4.67
maxops                          30000
n_islands                       3
start_vdw_linear_cutoff         6.0
initial_virtual_pt_match_max    3.0

Summary of RMSD with crystal structure.

RMSD (Angstrom)   Count   Cumulative count
under 0.5         8       8
0.5->1.0          34      42
1.0->1.5          13      55
1.5->2.0          11      66
2.0->2.5          9       75
2.5->3.0          4       79
3.0->3.5          5       84
over 3.5          16      100

Results with Reduced Parameterisation 3

Summary of GOLD run over 100 test complexes.

Average Execution Time          169s
Avg no of docks                 9.84
Avg fitness loss                8.36
maxops                          15000
n_islands                       2
start_vdw_linear_cutoff         7.5
initial_virtual_pt_match_max    3.0

Summary of RMSD with crystal structure.

RMSD (Angstrom)   Count   Cumulative count
under 0.5         14      14
0.5->1.0          25      39
1.0->1.5          17      56
1.5->2.0          5       61
2.0->2.5          8       69
2.5->3.0          4       73
3.0->3.5          3       76
over 3.5          24      100

Results with Reduced Parameterisation 4

Summary of GOLD run over 100 test complexes.

Average Execution Time          98s
Avg no of docks                 8.96
Avg fitness loss                5.86
maxops                          30000
n_islands                       2
start_vdw_linear_cutoff         6.0
initial_virtual_pt_match_max    3.0

Summary of RMSD with crystal structure.

RMSD (Angstrom)   Count   Cumulative count
under 0.5         11      11
0.5->1.0          36      47
1.0->1.5          12      59
1.5->2.0          8       67
2.0->2.5          8       75
2.5->3.0          2       77
3.0->3.5          4       81
over 3.5          19      100

Results with Reduced Parameterisation 5

Summary of GOLD run over 100 test complexes.

Average Execution Time          85.4s
Avg no of docks                 9.09
Avg fitness loss                8.02
maxops                          30000
n_islands                       1
start_vdw_linear_cutoff         6.0
initial_virtual_pt_match_max    3.0

Summary of RMSD with crystal structure.

RMSD (Angstrom)   Count   Cumulative count
under 0.5         9       9
0.5->1.0          36      45
1.0->1.5          15      60
1.5->2.0          7       67
2.0->2.5          5       72
2.5->3.0          3       75
3.0->3.5          4       79
over 3.5          21      100

Results with Reduced Parameterisation 6

Summary of GOLD run over 100 test complexes.

Average Execution Time          70.9s
Avg no of docks                 10.47
Avg fitness loss                7.98
maxops                          20000
n_islands                       2
start_vdw_linear_cutoff         6.0
initial_virtual_pt_match_max    3.0

Summary of RMSD with crystal structure.

RMSD (Angstrom)   Count   Cumulative count
under 0.5         13      13
0.5->1.0          28      41
1.0->1.5          16      57
1.5->2.0          6       63
2.0->2.5          5       68
2.5->3.0          6       74
3.0->3.5          3       77
over 3.5          23      100

Results with Reduced Parameterisation 7

Summary of GOLD run over 100 test complexes.

Average Execution Time          60.1s
Avg no of docks                 9.89
Avg fitness loss                8.57
maxops                          20000
n_islands                       1
start_vdw_linear_cutoff         6.0
initial_virtual_pt_match_max    3.0

Summary of RMSD with crystal structure.

RMSD (Angstrom)   Count   Cumulative count
under 0.5         8       8
0.5->1.0          31      39
1.0->1.5          17      56
1.5->2.0          6       62
2.0->2.5          9       71
2.5->3.0          2       73
3.0->3.5          3       76
over 3.5          24      100

Results with Reduced Parameterisation 8

Summary of GOLD run over 100 test complexes.

Average Execution Time          96.4s
Avg no of docks                 8.88
Avg fitness loss                5.92
maxops                          30000
n_islands                       2
start_vdw_linear_cutoff         6.0
initial_virtual_pt_match_max    2.5

Summary of RMSD with crystal structure.

RMSD (Angstrom)   Count   Cumulative count
under 0.5         12      12
0.5->1.0          31      43
1.0->1.5          16      59
1.5->2.0          4       63
2.0->2.5          8       71
2.5->3.0          5       76
3.0->3.5          2       78
over 3.5          22      100

Results with Reduced Parameterisation 9

Summary of GOLD run over 100 test complexes.

Average Execution Time          70.6s
Avg no of docks                 9.69
Avg fitness loss                8.14
maxops                          20000
n_islands                       2
start_vdw_linear_cutoff         6.0
initial_virtual_pt_match_max    2.5

Summary of RMSD with crystal structure.

RMSD (Angstrom)   Count   Cumulative count
under 0.5         9       9
0.5->1.0          38      47
1.0->1.5          9       56
1.5->2.0          11      67
2.0->2.5          8       75
2.5->3.0          1       76
3.0->3.5          3       79
over 3.5          21      100

Results with Reduced Parameterisation 10

Summary of GOLD run over 100 test complexes.

Average Execution Time          98.4s
Avg no of docks                 9.71
Avg fitness loss                6.35
maxops                          30000
n_islands                       2
start_vdw_linear_cutoff         7.5
initial_virtual_pt_match_max    3.5

Summary of RMSD with crystal structure.

RMSD (Angstrom)   Count   Cumulative count
under 0.5         12      12
0.5->1.0          28      40
1.0->1.5          18      58
1.5->2.0          8       66
2.0->2.5          7       73
2.5->3.0          1       74
3.0->3.5          5       79
over 3.5          21      100

Results with Reduced Parameterisation 11

Summary of GOLD run over 100 test complexes.

Average Execution Time          71.61s
Avg no of docks                 9.92
Avg fitness loss                8.41
maxops                          20000
n_islands                       2
start_vdw_linear_cutoff         7.5
initial_virtual_pt_match_max    2.5

Summary of RMSD with crystal structure.

RMSD (Angstrom)   Count   Cumulative count
under 0.5         9       9
0.5->1.0          33      42
1.0->1.5          12      54
1.5->2.0          12      66
2.0->2.5          3       69
2.5->3.0          4       73
3.0->3.5          3       76
over 3.5          24      100

Results with Reduced Parameterisation 12

Summary of GOLD run over 100 test complexes.

Average Execution Time          60.44s
Avg no of docks                 10.79
Avg fitness loss                9.86
maxops                          20000
n_islands                       1
start_vdw_linear_cutoff         7.5
initial_virtual_pt_match_max    2.5

Summary of RMSD with crystal structure.

RMSD (Angstrom)   Count   Cumulative count
under 0.5         8       8
0.5->1.0          37      45
1.0->1.5          11      56
1.5->2.0          7       63
2.0->2.5          6       69
2.5->3.0          3       72
3.0->3.5          3       75
over 3.5          25      100

Results with Reduced Parameterisation 13

Summary of GOLD run over 100 test complexes.

Average Execution Time          97.97s
Avg no of docks                 9.28
Avg fitness loss                6.36
maxops                          30000
n_islands                       2
start_vdw_linear_cutoff         7.5
initial_virtual_pt_match_max    2.5

Summary of RMSD with crystal structure.

RMSD (Angstrom)   Count   Cumulative count
under 0.5         14      14
0.5->1.0          26      40
1.0->1.5          16      56
1.5->2.0          6       62
2.0->2.5          7       69
2.5->3.0          2       71
3.0->3.5          6       77
over 3.5          23      100

Results with Reduced Parameterisation 14

Summary of GOLD run over 100 test complexes.

Average Execution Time          109.51s
Avg no of docks                 9.21
Avg fitness loss                6.28
maxops                          30000
n_islands                       3
start_vdw_linear_cutoff         7.5
initial_virtual_pt_match_max    2.5

Summary of RMSD with crystal structure.

RMSD (Angstrom)   Count   Cumulative count
under 0.5         11      11
0.5->1.0          32      43
1.0->1.5          17      60
1.5->2.0          5       65
2.0->2.5          5       70
2.5->3.0          7       77
3.0->3.5          2       79
over 3.5          21      100

Comments

It does seem that considerable speed-up can be achieved, though always at some cost to the final fitness score and to the reproducibility of results. Reduced parameterisation 1, however, almost doubles the speed (919 s down to 506 s) with few adverse effects. If CPU usage is of overriding importance, further speed-up can be achieved at the expense of the quality of binding-mode predictions. Reduced parameterisation 2 gives a 3-fold speed-up, though the quality of GOLD results is clearly affected. Other parameterisations allow yet more speed-up with correspondingly poorer predictions. Of the parameterisations that give 4- to 5-fold improvements in speed, parameterisations 3, 6 and 9 are better than most.
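One way to read the trade-off is to keep only those parameterisations that no other set beats on both speed and accuracy (a Pareto front). The Python sketch below does this using the detailed results, and it relies on two stated assumptions:

- R10000 times are multiplied by 3 to approximate R4000 times. Full and Reduced 1 to 3 are taken to have been timed on the R4000.
- Accuracy is measured only by the number of complexes within 2.0 Angstrom.

```python
from typing import Dict, List, Tuple

R10000_FACTOR = 3.0  # "approximately 3 times faster than the R4000"

# name: (seconds, CPU, count within 2.0 Angstrom), from the detailed results.
# The CPU assignment is an inference from the timings.
RESULTS: Dict[str, Tuple[float, str, int]] = {
    "Full": (919.0, "R4000", 67),
    "Reduced 1": (506.0, "R4000", 70),
    "Reduced 2": (314.0, "R4000", 66),
    "Reduced 3": (169.0, "R4000", 61),
    "Reduced 4": (98.0, "R10000", 67),
    "Reduced 5": (85.4, "R10000", 67),
    "Reduced 6": (70.9, "R10000", 63),
    "Reduced 7": (60.1, "R10000", 62),
    "Reduced 8": (96.4, "R10000", 63),
    "Reduced 9": (70.6, "R10000", 67),
    "Reduced 10": (98.4, "R10000", 66),
    "Reduced 11": (71.61, "R10000", 66),
    "Reduced 12": (60.44, "R10000", 63),
    "Reduced 13": (97.97, "R10000", 62),
    "Reduced 14": (109.51, "R10000", 65),
}


def r4000_seconds(seconds: float, cpu: str) -> float:
    """Approximate R4000 seconds for a timing taken on either CPU."""
    return seconds * R10000_FACTOR if cpu == "R10000" else seconds


def pareto_front(results: Dict[str, Tuple[float, str, int]]) -> List[str]:
    """Names not dominated on (lower time, higher count), fastest first."""
    points = {name: (r4000_seconds(t, cpu), hits)
              for name, (t, cpu, hits) in results.items()}
    front = []
    for name, (t, hits) in points.items():
        # Dominated if another set is at least as fast and as accurate, and better on one.
        dominated = any(
            t2 <= t and h2 >= hits and (t2 < t or h2 > hits)
            for other, (t2, h2) in points.items() if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front, key=lambda n: points[n][0])


print(pareto_front(RESULTS))
```

On those two criteria alone, the front is Reduced 3, 7, 12, 9 and 1 (fastest first). The judgement above also weighs fitness loss, reproducibility and the 3.0 Angstrom counts, which this sketch ignores, so the two need not agree.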

To let the user balance CPU requirements against the quality of results, four default parameterisations are available through the front end. These correspond to the full parameterisation and to reduced parameterisations 1, 2 and 6.
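The four front-end defaults can be written out as configuration-file settings using the values listed in the detailed results. Several details of the sketch are assumptions:

- The preset names (full, reduced_1, reduced_2, reduced_6) are hypothetical.
- The plain "key = value" line syntax is an illustration, not a documented GOLD format.
- popsize is omitted because its values are not given on this page.

```python
from typing import Dict, Union

Value = Union[int, float]

# Values from the detailed results on this page; preset names are hypothetical.
PRESETS: Dict[str, Dict[str, Value]] = {
    "full": {"maxops": 100000, "n_islands": 5,
             "start_vdw_linear_cutoff": 2.5, "initial_virtual_pt_match_max": 4.0},
    "reduced_1": {"maxops": 50000, "n_islands": 5,
                  "start_vdw_linear_cutoff": 6.0, "initial_virtual_pt_match_max": 3.0},
    "reduced_2": {"maxops": 30000, "n_islands": 3,
                  "start_vdw_linear_cutoff": 6.0, "initial_virtual_pt_match_max": 3.0},
    "reduced_6": {"maxops": 20000, "n_islands": 2,
                  "start_vdw_linear_cutoff": 6.0, "initial_virtual_pt_match_max": 3.0},
}


def config_fragment(preset: str) -> str:
    """Render a preset as 'key = value' lines (assumed syntax, for illustration)."""
    return "\n".join(f"{key} = {value}" for key, value in PRESETS[preset].items())


print(config_fragment("reduced_6"))
```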

