GOLD: Files and Programs

  • GOLD executables
  • Front end files
  • Output files
  • Fit_analy
  • Grommitt
  • Smart_RMS
  • RMS_analysis
  • Environment variables

    All GOLD files are in the GOLD_DIR directory (see Environment Variables). The main GOLD executables and the GOLD front-end executable are in the directory GOLD_DIR/gold; some unsupported utilities (of which we only describe fit_analy, grommitt, smart_rms and rms_analysis) are in GOLD_DIR/utils.

    GOLD Executables

    GOLD. The main GOLD binaries are in files gold_ARCH, where ARCH is R4000, R5000, R8000 or R10000, for SGI R4000, R5000, R8000 and R10000 computers. The gold_ARCH command takes a configuration file as an argument. This is a text file which specifies the calculation that is to be run, including details of the ligand, the protein binding site and the GA parameters. Although the file can be generated with a standard text editor, the easiest way to create it is to use the front end. The configuration file can have any name, but by convention is normally called gold.conf. Once a configuration file has been created (e.g. as gold.conf), GOLD can be run in the background - for example, on an SGI R4000 - with the command:

    % gold_R4000  gold.conf & 
    

    Solutions can be ranked using fit_analy.

    Handling Interrupts

    If GOLD receives an interrupt (signal 2), the program will exit cleanly: the current top solution will be output in file gold_soln_ligand.mol2 and the PID file (see below) will be removed.

    Sending GOLD the signal USR1 (using the command kill -USR1 <GOLD process id>) will result in GOLD outputting the current top solution to the file gold_soln_current_ligand.mol2, then continuing execution.
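
    The two signals described above can also be sent from a script. The following is a minimal Python sketch, not part of GOLD: the function names are hypothetical, and the process id must be supplied by the caller (the layout of the PID file is not specified here, so the sketch does not try to parse it).

```python
import os
import signal


def request_current_solution(pid, send=os.kill):
    """Send USR1 so that GOLD writes its current top solution and carries on.

    `send` defaults to os.kill; it is a parameter only so that the call
    can be exercised without a real GOLD process.
    """
    send(pid, signal.SIGUSR1)


def interrupt_cleanly(pid, send=os.kill):
    """Send signal 2 (SIGINT) so that GOLD writes its top solution and exits."""
    send(pid, signal.SIGINT)
```

    For a GOLD job with process id 1234, request_current_solution(1234) is equivalent to the command kill -USR1 1234.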


    The Front End

    The front end is a Tcl/Tk script which provides the user with an easy-to-use graphical interface to the program. The script is contained in the file gold_front. It is normally invoked by the command gold, which is a simple shell script that determines the computer architecture and calls gold_front.


    Output Files

    Log File

    The progress of a GOLD batch job is recorded in the file gold.log. This file contains information about the initialisation of the protein and ligand, the progress of each docking run and the solutions generated by the software.

    The file gold.log is line buffered, so you can see how the algorithm is progressing even when GOLD is run in the background. The output window in the front end is simply the current contents of gold.log.

    Initially, the log file details the parameterisation. The protein is then loaded and the binding site determined. The (first) ligand is then loaded and the GA runs commence. The progress of each GA run is listed:

    ** Docking run number  1 **
    
    Rank fitness base is 5000.00 increment is 11.2233
    
    Initialising population(s) ...
    Pop:  1 size:  100 #dups:    0 #niche:     0 #errs:     1 
    Pop:  2 size:  100 #dups:    0 #niche:     1 #errs:     2 
    Pop:  3 size:  100 #dups:    0 #niche:     0 
    Pop:  4 size:  100 #dups:    0 #niche:     1 
    Pop:  1 size:  100 #dups:    0 #niche:     0 #errs:     1 
    Pop:  2 size:  100 #dups:    0 #niche:     1 #errs:     2 
    Pop:  3 size:  100 #dups:    0 #niche:     0 
    Pop:  4 size:  100 #dups:    0 #niche:     1 
    Pop:  5 size:  100 #dups:    0 #niche:     0 
    
    Doing GA no population(s) 5 size 100 selection pressure 1.100000
    
                          VDW energies     ligand h-bonds
    Pop  Opcnt Fitness Internal  External Internal External
      2      0   49.12  -18.31     38.07     0.00     0.43 
      5    270   50.20  -26.26     40.19     0.00     0.19 
      5    975   50.49  -26.26     40.56     0.00     0.08 
      3   1293   53.16  -15.63     41.04     0.00     0.00 
      2   2817   52.37  -18.77     40.95     0.00     0.25 
      1   2871   52.68  -15.58     40.16     0.00     0.93 
      5   3190   55.02  -18.08     42.74     0.00     0.37 
      4   3719   56.23  -16.85     43.54     0.00     0.28 
      2   4017   56.23  -18.31     43.72     0.00     0.46 
      4   4454   57.48  -16.85     44.33     0.00     0.53 
      4   5279   57.09  -13.75     43.61     0.00     0.53 
      4   6079   56.40  -13.75     43.07     0.00     0.74 
      2   7887   55.87  -18.31     43.81     0.00     0.68 
      2   8357   58.00  -14.99     44.94     0.00     0.44 
      4   8419   58.10  -17.92     45.34     0.00     0.80 
      1   9056   58.69  -15.13     45.43     0.00     0.68 
      1   9981   58.84  -15.16     45.61     0.00     0.68 
      5  12760   58.06  -15.38     45.81     0.00     0.34 
      5  12770   58.11  -14.65     45.42     0.00     0.67 
      4  13814   58.07  -12.62     45.06     0.00     0.63 
      3  14653   58.01  -12.63     45.23     0.00     0.52 
      4  14829   58.03  -15.65     45.97     0.00     0.66 
      4  14989   58.03  -12.63     45.25     0.00     0.52 
      4  16039   57.50  -12.63     45.09     0.00     0.52 
      3  16288   57.51  -12.71     45.13     0.00     0.50 
      3  16883   57.19  -12.71     44.99     0.00     0.49 
      5  17495   57.36  -14.65     45.82     0.00     0.43 
      5  17835   57.30  -14.65     45.72     0.00     0.65 
      4  17979   57.34  -14.65     45.75     0.00     0.65 
    Reordering...
      3  21000   55.54  -12.63     44.57     0.00     0.48 
      3  21563   55.50  -12.93     44.76     0.00     0.47 
      3  21903   55.56  -12.93     44.81     0.00     0.47 
      4  23654   55.00  -12.93     44.82     0.00     0.46 
      3  23913   55.01  -12.81     44.78     0.00     0.46 
    
    ......................
    
      2  48142   49.81  -12.31     44.84     0.00     0.46 
      5  49320   49.81  -12.31     44.84     0.00     0.46 
    
    #ops:      50000 #mutates:  23758 #cross:    23754 #migrate:   2488 
    #dups:     18541 #niche:    35159 #errors:      71 
    
    Ligand is in gold_soln_ligand_1.mol2
    Fitness  49.81
    Energies   1.68 (int VDW) -13.99 (int tor) 44.84 (ext VDW) H bond  0.00 (int)  0.46 (ext)
    
    Ligand acceptor     3 Protein hydrogen 1229 bond energy  2.91 wt 0.14
    Ligand acceptor     5 
    Ligand acceptor     8 
    Ligand acceptor    10 
    Ligand acceptor    22 
    Ligand acceptor    24 Protein hydrogen 1022 bond energy  0.40 wt 0.11
    
    Current Ranking  1
    
    

    The best ("most fit") individual at any time is listed. The total fitness and steric and hydrogen bond components are also displayed. The internal VDW energy includes a torsional component. The external VDW energy is normally scaled by a factor of 1.375 and summed with the other components to give the total fitness.
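
    As an illustration of how the components combine, the following Python sketch recomputes the total fitness from the summary lines of the docking run shown above. The function name and argument names are hypothetical; the only assumptions are the 1.375 scale factor quoted above and that the components are simply added.

```python
EXTERNAL_VDW_SCALE = 1.375  # the "normal" scale factor quoted in the text


def total_fitness(int_vdw, int_tor, ext_vdw, hb_int, hb_ext,
                  ext_vdw_scale=EXTERNAL_VDW_SCALE):
    """Combine the energy components of one solution into a fitness score."""
    # The "Internal" VDW column of the progress table includes the torsion term.
    internal = int_vdw + int_tor
    return internal + ext_vdw_scale * ext_vdw + hb_int + hb_ext


# Components taken from the "Energies" line of docking run 1 above.
fitness = total_fitness(int_vdw=1.68, int_tor=-13.99, ext_vdw=44.84,
                        hb_int=0.00, hb_ext=0.46)
print(f"recomputed fitness: {fitness:.3f}")
```

    The recomputed value agrees with the reported fitness of 49.81 to within the rounding of the printed components, and the internal term (1.68 - 13.99 = -12.31) matches the Internal VDW column of the final progress line.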

    Some parameters involved in determining the fitness score are annealed as the algorithm progresses (the rationale is that bad contacts and poor hydrogen bonds are allowed at the beginning of a GA run in the hope that they will evolve into good solutions). Thus, the total fitness score of the best individual may decrease over the course of a run. The "Reordering..." message refers to a re-ranking of the GA populations caused by the process of annealing parameters.

    At the end of the GA run, the solution is output and summarised.

    Following all GA runs, the solutions from different runs are compared:

    Final Ranking  4  2  5  1  3 
    _______________________________
    RMSD Matrix of RANKED solutions
    
    
          2    3    4    5  
    
     1 :  4.8  4.7  5.1 10.1
     2 :       4.0  3.1 10.9
     3 :            4.1 10.4
     4 :                11.0
    
    Clustering using complete linkage.
    Structure ids are RANKING
    
    Dist  Clusters...
     3.14 |  4  2 |  3 |  5 |  1 |
     4.06 |  4  2  3 |  1 |  5 |
     5.07 |  4  2  3  1 |  5 |
    10.95 |  4  2  3  1  5 |
    
    Finished Docking Ligand ligand.mol
    

    In this case, solution number 4 had the largest fitness score (this solution will be in gold_soln_ligand_4.mol2, and the symbolic link ranked_ligand_1.mol2 will point to it), while solution number 3 had the worst fitness. The RMSD matrix shows how different the various solutions are. This matrix uses rankings rather than run numbers. For example, ranked structures 2 and 4 have an RMSD of 3.1 Angstrom: these structures are in the files ranked_ligand_2.mol2 and ranked_ligand_4.mol2 (or, equivalently, gold_soln_ligand_2.mol2 and gold_soln_ligand_1.mol2). The RMSD algorithm takes account of symmetry effects, using a graph isomorphism algorithm.
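
    The correspondence between rankings and run numbers can be read directly from the "Final Ranking" line. The following Python sketch (a hypothetical helper, assuming the line has exactly the form shown above) maps each ranked file name to the solution file it refers to.

```python
def ranked_files(final_ranking_line, ligand_stem):
    """Map ranked_<stem>_<rank>.mol2 to gold_soln_<stem>_<run>.mol2.

    The k-th number on the "Final Ranking" line is the run number of the
    solution ranked k-th.
    """
    fields = final_ranking_line.split()
    if fields[:2] != ["Final", "Ranking"]:
        raise ValueError("not a 'Final Ranking' line")
    runs = [int(field) for field in fields[2:]]
    return {
        f"ranked_{ligand_stem}_{rank}.mol2": f"gold_soln_{ligand_stem}_{run}.mol2"
        for rank, run in enumerate(runs, start=1)
    }


mapping = ranked_files("Final Ranking  4  2  5  1  3 ", "ligand")
for ranked_name, solution_name in mapping.items():
    print(ranked_name, "->", solution_name)
```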

    Finally, the RMSD values are used to perform a hierarchical cluster analysis, using the complete linkage algorithm. Each line shows one iteration of the clustering algorithm, the minimum distance between clusters at that step, and the resulting clusters. Clusters are separated by the '|' symbol and rankings are used rather than run numbers.
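
    The clustering step can be sketched in Python as follows. This is a generic complete linkage implementation written for illustration, not GOLD's own code. It is applied to the RMSD matrix printed above, which is rounded to one decimal place, so the merge distances it finds (3.1, 4.1, 5.1, 11.0) match the logged values (3.14, 4.06, 5.07, 10.95) only to that precision.

```python
def complete_linkage(n, dist):
    """Hierarchically cluster items 1..n.

    dist maps (i, j) with i < j to the distance between items i and j.
    Returns one (merge_distance, clusters) pair per iteration.
    """
    def d(a, b):
        return dist[(a, b)] if a < b else dist[(b, a)]

    def cluster_distance(c1, c2):
        # Complete linkage: clusters are as far apart as their two most
        # distant members.
        return max(d(a, b) for a in c1 for b in c2)

    clusters = [[i] for i in range(1, n + 1)]
    steps = []
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest linkage distance.
        dmin, i, j = min(
            (cluster_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        steps.append((dmin, [list(c) for c in clusters]))
    return steps


# RMSD matrix of the ranked solutions, copied from the log above.
rmsd = {
    (1, 2): 4.8, (1, 3): 4.7, (1, 4): 5.1, (1, 5): 10.1,
    (2, 3): 4.0, (2, 4): 3.1, (2, 5): 10.9,
    (3, 4): 4.1, (3, 5): 10.4,
    (4, 5): 11.0,
}
steps = complete_linkage(5, rmsd)
for distance, clusters in steps:
    print(f"{distance:5.2f}", clusters)
```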

    Error File

    The file gold.err contains any errors found by the program. These are generally fatal and cause the program to stop. It is a good idea to check gold.err if something appears to go wrong.

    Errors found by the atom-type checker are written to gold.err. If you are unsure about your atom typing you should therefore check gold.err. For example:

    check_aromatic_bonds_and_atoms:
     WARNING atom    6 in ligand.mol is type   C.2 but should be  C.ar
    check_aromatic_bonds_and_atoms:
     WARNING bond    8 in ligand.mol is type   2 but should be  ar
    check_aromatic_bonds_and_atoms:
     WARNING atom   11 in ligand.mol is type   C.2 but should be  C.ar
    check_aromatic_bonds_and_atoms:
     WARNING bond   17 in ligand.mol is type   1 but should be  ar
    

    gold.err is line buffered so errors are logged immediately. If you are running GOLD via the front end, the contents of gold.err will appear in a separate window.
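
    Atom-type warnings such as those above can be collected with a short script. The Python sketch below is a hypothetical helper whose pattern is inferred only from the example lines shown; other messages in gold.err may follow a different layout and are ignored.

```python
import re

# Pattern inferred from the example warnings above (an assumption).
WARNING_RE = re.compile(
    r"WARNING\s+(atom|bond)\s+(\d+)\s+in\s+(\S+)\s+is\s+type\s+(\S+)"
    r"\s+but\s+should\s+be\s+(\S+)"
)


def atom_type_warnings(err_text):
    """Return (kind, index, filename, found_type, expected_type) tuples."""
    warnings = []
    for match in WARNING_RE.finditer(err_text):
        kind, index, filename, found, expected = match.groups()
        warnings.append((kind, int(index), filename, found, expected))
    return warnings


SAMPLE_ERR = """\
check_aromatic_bonds_and_atoms:
 WARNING atom    6 in ligand.mol is type   C.2 but should be  C.ar
check_aromatic_bonds_and_atoms:
 WARNING bond    8 in ligand.mol is type   2 but should be  ar
"""

for warning in atom_type_warnings(SAMPLE_ERR):
    print(warning)
```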

    Process File

    The file gold.pid records the user, host and process number of the GOLD job. It is deleted when GOLD exits. Its purpose is to stop the user running two GOLD jobs in the same directory. If the machine goes down, or GOLD crashes (very unlikely!) or is killed with signal 9, you will need to remove gold.pid before you can run another GOLD job in the same directory.

    SYBYL MOL2 Solution Files

    If the input protein and ligand files are in SYBYL MOL2 format, GOLD will write out a number of new MOL2 files:

    gold_ligand.mol2 is the original ligand datafile with lone pairs added and the sets donor_hydrogens and lone_pairs defined.

    gold_protein.mol2 is the original protein datafile with lone pairs added to binding site atoms and the sets donor_hydrogens and lone_pairs defined. The binding site is defined in the set active_atoms.

    If GOLD is interrupted (see above), the files gold_soln_ligand.mol2 and gold_soln_current_ligand.mol2 may be created.

    The GOLD predictions for ligand binding modes are written out in MOL2 format. For example, suppose that structure.mol2 is the name of the original ligand data file. As the GOLD batch job progresses, each GA run solution is written out as gold_soln_structure_n.mol2, where n is the solution number (1, 2, 3 and so on). Note that gold_soln_structure_1.mol2 is not necessarily the best GOLD prediction; it is simply the first solution produced by GOLD.

    As GOLD runs, symbolic links are created: ranked_structure_1.mol2 will always point to the current top-ranked solution, ranked_structure_2.mol2 will point to the second-best solution, and so on. The current ranking can also be obtained using the command fit_analy.

    For example, if the current ranking is solution numbers 4, 1, 2 and 3, a directory listing will show:

    -rw-------    1 gareth   daemon      4766 Oct 25 10:41 gold_soln_structure_1.mol2
    -rw-------    1 gareth   daemon      4766 Oct 25 10:52 gold_soln_structure_2.mol2
    -rw-------    1 gareth   daemon      4766 Oct 25 11:04 gold_soln_structure_3.mol2
    -rw-------    1 gareth   daemon      4766 Oct 25 11:16 gold_soln_structure_4.mol2
    lrwx--x--x    1 gareth   daemon        25 Oct 25 11:16 ranked_structure_1.mol2 -> ./gold_soln_structure_4.mol2
    lrwx--x--x    1 gareth   daemon        25 Oct 25 11:16 ranked_structure_2.mol2 -> ./gold_soln_structure_1.mol2
    lrwx--x--x    1 gareth   daemon        25 Oct 25 11:16 ranked_structure_3.mol2 -> ./gold_soln_structure_2.mol2
    lrwx--x--x    1 gareth   daemon        25 Oct 25 11:16 ranked_structure_4.mol2 -> ./gold_soln_structure_3.mol2
    
          

    If the input file format is PDB, GOLD predictions will be written out in the PDB file format. Similarly, if the input file format is MACCS SD, GOLD predictions will be output in that format.
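
    For MOL2 output, the current ranking can also be recovered by following the ranked_ symbolic links described above. The Python sketch below is a hypothetical helper that assumes the file naming shown in the listing; read_links gathers the links in a directory and ranking_from_links turns them into a list of solution numbers in rank order.

```python
import os
import re

RANKED_RE = re.compile(r"ranked_(.+)_(\d+)\.mol2$")
SOLUTION_RE = re.compile(r"gold_soln_(.+)_(\d+)\.mol2$")


def ranking_from_links(links):
    """links is an iterable of (link_name, target) pairs.

    Returns the solution numbers ordered from best rank to worst.
    """
    by_rank = {}
    for link_name, target in links:
        ranked = RANKED_RE.search(link_name)
        solution = SOLUTION_RE.search(target)
        if ranked and solution:
            by_rank[int(ranked.group(2))] = int(solution.group(2))
    return [by_rank[rank] for rank in sorted(by_rank)]


def read_links(directory="."):
    """Collect (name, target) for every symbolic link in a directory."""
    links = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.islink(path):
            links.append((name, os.readlink(path)))
    return links


# The links from the directory listing above.
listing = [
    ("ranked_structure_1.mol2", "./gold_soln_structure_4.mol2"),
    ("ranked_structure_2.mol2", "./gold_soln_structure_1.mol2"),
    ("ranked_structure_3.mol2", "./gold_soln_structure_2.mol2"),
    ("ranked_structure_4.mol2", "./gold_soln_structure_3.mol2"),
]
print(ranking_from_links(listing))
```

    In a GOLD working directory, ranking_from_links(read_links(".")) would give the same information for the job in progress.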


    Fit_Analy

    The script fit_analy reads gold.log and ranks solutions by their fitness scores. The various contributions to the total fitness for each solution are shown, together with a summary.

    For example:

    Gareth [62] [test1cps] [13:34] > fit_analy
    
    Rank Mol   Fitness   VDW int  Tors Int       VDW ext   H int   ext
    
      1    4     62.31      3.31     -3.33         23.56    0.00 32.88
      2    1     62.01      2.44     -3.32         24.19    0.00 32.65
      3    2     60.65      2.80     -3.35         23.28    0.00 32.10
      4    3     56.91      3.29     -6.97         29.10    0.00 24.22
    
    Av fit = 60.47
    VDW int = 2.96 Tors Int = -4.2425 VDW ext = 25.0325
    H Bond int = 0 ext = 30.4625
    

    Thus, solution number 4 is the best GOLD prediction. If structure.mol2 is the name of the original ligand data file, then gold_soln_structure_4.mol2 will contain the predicted binding mode for this GOLD job.
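
    The fit_analy table is easy to post-process. The Python sketch below is a hypothetical helper, assuming the eight-column row layout of the example above; it parses the rows, picks the fittest solution and recomputes the average fitness.

```python
from collections import namedtuple

Solution = namedtuple(
    "Solution", "rank mol fitness vdw_int tors_int vdw_ext hb_int hb_ext"
)


def parse_fit_analy(text):
    """Extract the solution rows from fit_analy output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue
        try:
            rank, mol = int(fields[0]), int(fields[1])
            values = [float(field) for field in fields[2:]]
        except ValueError:
            continue  # a header or summary line, not a solution row
        rows.append(Solution(rank, mol, *values))
    return rows


SAMPLE = """\
Rank Mol   Fitness   VDW int  Tors Int       VDW ext   H int   ext

  1    4     62.31      3.31     -3.33         23.56    0.00 32.88
  2    1     62.01      2.44     -3.32         24.19    0.00 32.65
  3    2     60.65      2.80     -3.35         23.28    0.00 32.10
  4    3     56.91      3.29     -6.97         29.10    0.00 24.22

Av fit = 60.47
H Bond int = 0 ext = 30.4625
"""

rows = parse_fit_analy(SAMPLE)
best = max(rows, key=lambda row: row.fitness)
mean_fitness = sum(row.fitness for row in rows) / len(rows)
print(f"best solution: {best.mol}, average fitness: {mean_fitness:.2f}")
```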

    fit_analy is used by the front end to rank solutions when the output window is open.

    If more than one ligand has been docked into the protein, the command fit_analy.batch should be used:

    $ fit_analy.batch 
    
                 Molecule                Internal                   External
    Rank             Name  No   Fitness       VDW   Torsion  Hbond       VDW  Hbond
    
       1  sam-MFCD0017875   2     51.99      2.55     -7.83   0.00     36.84   6.61
       2  sam-MFCD0017875   3     51.59      1.56     -6.37   0.00     38.64   3.27
       3  sam-MFCD0017875   1     50.09      1.48     -5.97   0.00     38.11   2.17
       4  sam-MFCD0012325   1     49.24      1.37     -7.23   0.00     39.99   0.11
       5  sam-MFCD0012325   3     48.10      2.24     -8.94   0.00     39.41   0.61
       6  sam-MFCD0005477   3     47.98      4.20     -6.46   0.00     36.54   0.00
       7  sam-MFCD0005271   1     47.82      2.62     -9.56   0.00     39.83   0.00
    .......
     534  sam-MFCD0002487   1     -1.32    -28.12    -13.76   0.00     29.48   0.04
     535  sam-MFCD0003774   1    -66.66   -113.29     -5.19   0.00     37.69   0.00
     536  sam-MFCD0003774   2    -67.74   -113.31     -5.16   0.00     36.89   0.00
     537  sam-MFCD0000398   3    -72.83   -105.60     -7.23   0.00     29.09   0.00
     538  sam-MFCD0000398   2    -74.73   -105.50     -7.36   0.00     27.73   0.00
     539  sam-MFCD0000398   1    -79.75   -108.21     -6.92   0.00     25.74   0.00
     540  sam-MFCD0005913   1    -89.92   -128.95     -9.59   0.00     35.36   0.00
     541  sam-MFCD0005913   2    -92.39   -127.69     -9.84   0.00     32.83   0.00
     542  sam-MFCD0005913   3    -96.20   -127.50     -8.87   0.00     29.21   0.00
    
    Av fit = 32.17
    VDW int = -1.42 Tors Int = -5.56 VDW ext = 26.57
    H Bond int =  0.00 ext =  2.62
    

    Grommitt

    grommitt is a simple molecular viewer for examining binding modes. It is used by the front end, when the display variable is set, for showing the current top solution in a GA run.

    It can also be used from the command line to display overlays of SYBYL MOL2 files.

    Usage: grommitt [-chp] files

    The flags are:

    files is a list of SYBYL MOL2 (suffix .mol2) and/or Brookhaven PDB files (suffix .pdb or .ent).

    grommitt is useful for visualising a set of GOLD solutions: it is possible to see at a glance if all GOLD solutions are identical or whether there are several different binding modes.

    For example:

    % grommitt -h gold_soln*
    

    displays the window:

    [Image: the grommitt window]


    Smart_RMS

    smart_rms calculates the RMS difference between two conformations of the same structure, while taking account of symmetry effects (such as ring flipping).

    Using a graph isomorphism algorithm, an RMS score is calculated for each way of mapping the molecule onto itself.

    Usage: smart_rms [-hv] conformation_1 conformation_2

    The flags are:

    conformation_1 and conformation_2 are MOL2 files containing the two conformations. They should only differ in terms of coordinates.
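
    The idea of scoring every mapping of the molecule onto itself can be sketched in Python as follows. This is an illustration under stated assumptions rather than the smart_rms implementation: the atom mappings are supplied by the caller (the graph isomorphism search is not shown), no superposition is performed, and the smallest RMS over all mappings is taken as the result.

```python
import math


def rms(coords_a, coords_b, mapping):
    """RMS distance when atom i of coords_a is matched to atom mapping[i]
    of coords_b."""
    total = 0.0
    for i, j in enumerate(mapping):
        total += sum((p - q) ** 2 for p, q in zip(coords_a[i], coords_b[j]))
    return math.sqrt(total / len(mapping))


def symmetry_aware_rms(coords_a, coords_b, mappings):
    """Smallest RMS over all symmetry-equivalent atom mappings (assumed)."""
    return min(rms(coords_a, coords_b, mapping) for mapping in mappings)


# A toy two-atom symmetric "molecule" whose second conformation is flipped.
conf_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conf_2 = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
identity, swapped = [0, 1], [1, 0]

print(rms(conf_1, conf_2, identity))
print(symmetry_aware_rms(conf_1, conf_2, [identity, swapped]))
```

    With atom labels taken literally the two conformations differ, but once the symmetric relabelling is allowed they are recognised as identical.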


    RMS_Analysis

    rms_analysis calculates an RMSD matrix for a set of structures (in MOL2 files) and performs hierarchical cluster analysis. A graph isomorphism algorithm is used to determine optimal RMS values.

    Usage: rms_analysis [-c n] file1.mol2 file2.mol2 ....

    where n is 1 for single linkage cluster analysis, 2 for complete linkage and 3 for group average.
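
    When rms_analysis is driven from a script, the numeric codes can be hidden behind names. The Python sketch below only builds the argument list according to the usage line above; the linkage names and the function are hypothetical, and the behaviour when -c is omitted is left to the program.

```python
# Names are illustrative; the numeric codes are those given in the text.
LINKAGE_CODES = {"single": 1, "complete": 2, "average": 3}


def rms_analysis_command(files, linkage=None):
    """Build the argument list for an rms_analysis invocation."""
    argv = ["rms_analysis"]
    if linkage is not None:
        argv += ["-c", str(LINKAGE_CODES[linkage])]
    return argv + list(files)


command = rms_analysis_command(
    ["ranked_ligand_1.mol2", "ranked_ligand_2.mol2"], linkage="complete"
)
print(" ".join(command))
```

    The resulting list can be passed to subprocess.run on a machine where rms_analysis is on the search path.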

    For example:

    $ rms_analysis ranked_ligand_?.mol2
    
    ________________________
    RMSD Matrix of molecules
    
    
          2    3    4    5    6    7    8    9  
    
     1 :  0.8  1.1  1.0  1.0  1.4  2.3  5.0  4.6
     2 :       0.9  1.1  1.1  1.2  2.3  5.2  4.6
     3 :            0.4  0.8  0.9  2.3  5.0  4.5
     4 :                 0.6  1.1  2.3  4.9  4.5
     5 :                      1.3  2.0  4.9  4.5
     6 :                           1.8  5.1  4.4
     7 :                                5.3  4.5
     8 :                                     2.4
    
    Clustering using complete linkage.
    
    Dist  Clusters...
     0.40 |  1 |  2 |  3  4 |  9 |  5 |  6 |  7 |  8 |
     0.84 |  1 |  2 |  3  4  5 |  9 |  8 |  6 |  7 |
     0.84 |  1  2 |  7 |  3  4  5 |  9 |  8 |  6 |
     1.13 |  1  2  3  4  5 |  7 |  6 |  9 |  8 |
     1.42 |  1  2  3  4  5  6 |  7 |  8 |  9 |
     2.35 |  1  2  3  4  5  6  7 |  9 |  8 |
     2.38 |  1  2  3  4  5  6  7 |  9  8 |
     5.28 |  1  2  3  4  5  6  7  9  8 |
    

    Environment Variables

    The following environment variables are required:

    GOLD_DIR The directory where GOLD is installed.
    GROMMITT_DIR The directory where grommitt is installed.

    The environment variables TK_LIBRARY and TCL_LIBRARY may also be required by Tcl/Tk.

    These variables should be set in the file CSHRC or BASHRC in the installation directory, so that the user need only source this file to run GOLD.
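
    A script that launches GOLD can check these variables first. The Python sketch below is a hypothetical helper that reports which of the required variables are unset or empty; the Tcl/Tk variables are listed separately because they are only sometimes needed.

```python
import os

REQUIRED_VARIABLES = ("GOLD_DIR", "GROMMITT_DIR")
TCL_TK_VARIABLES = ("TK_LIBRARY", "TCL_LIBRARY")  # may also be required


def missing_variables(environ=os.environ, names=REQUIRED_VARIABLES):
    """Return the names that are unset or empty in the given environment."""
    return [name for name in names if not environ.get(name)]


missing = missing_variables()
if missing:
    print("Unset GOLD variables:", ", ".join(missing))
```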

