GOLD: Using Torsional Distributions

Torsional distributions extracted from the Cambridge Structural Database (CSD) can be input to GOLD. These distributions can then be used to restrict conformational searches or, in principle, direct chromosome initialisation and GA mutation. Currently, only the first of these is implemented.

  • The Torsional Distribution File
  • Header Information
  • Specifying Torsional Distributions
  • Examples of Torsional Distributions
  • Matching Torsional Distributions
  • Extracting the Torsional Distribution from a CSD Search
  • Further Improvements

    The Torsional Distribution File

    This file consists of two sections:

    1. Header information
    2. A list of torsional distributions

    The default torsional distribution file is called gold.tordist and is located in the GOLD distribution directory.

    Two other torsional distribution files are available: gold.tordist.new contains all the torsions in gold.tordist and many more new distributions. However, many of these newer torsions have very few hits in the CSD and no significant improvement was found when using this new file in GOLD. A second file, mimumba.tordist, contains all the torsional distributions used in the MIMUMBA program (Klebe and Mietzner, JCAMD, 8 (1994) 583-606). Many thanks to Gerhard Klebe for supplying a copy of these torsional histograms, which are also deposited as supplementary material with the Journal of Computer-Aided Molecular Design. Both these files are located in the GOLD distribution directory.

    To customise torsion-distribution information, copy the default torsional distribution file and edit it. GOLD can then be run using the edited file. GOLD gets the location of the torsional distribution file from the configuration file line "tordist_file = <torsional distribution file location>". This is most easily defined using the Tordist File entry box in the front end.

    The format of entries in the torsional distribution file is quite strict: incorrect editing of the file may cause GOLD to behave in unexpected ways or even to crash.

    Header Information

    The first section of the torsional distribution file sets parameters and tells GOLD what to do with the distributions.

    N_BINS is the number of bins used in the torsional histogram.

    If torsional distributions are used, GOLD will no longer sample over 360 degrees but will constrain the torsion to values contained in the histogram. However, if a histogram contains a large number of entries, then there may be some high-energy torsions within the histogram. GOLD therefore provides a method for filtering out such high-energy torsions: set REMOVE_HIGH_ENERGY = 1 and DELTA_E to E to remove those bars in the histogram that correspond to torsions that are E kcal/mol higher in energy than the most populated state. The ground state of the torsion is assumed to correspond to the maximum peak in the torsional histogram. The energy difference between this ground state and any other peak in the torsional histogram is then assumed to be approximately given by the partition function.

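    Let $n_0$ be the number of counts in the ground state, let $N$ be the total number of counts in the histogram and let $n_i$ be the number of counts in another bar in the histogram. Assume (!!) that these counts correspond to the expected populations of the torsions in a free system. From partition theory:

    $$\frac{n_i}{N} = \frac{e^{-E_i/RT}}{q} \quad \text{and} \quad \frac{n_0}{N} = \frac{e^{-E_0/RT}}{q},$$

    where $E_i$ and $E_0$ are the molar energies of the two states, $q$ is the molecular partition function and $N$ is the total number of counts [Atkins, Physical Chemistry, Chap. 21]. Dividing the second expression by the first gives

    $$\frac{n_0}{n_i} = e^{(E_i - E_0)/RT} = e^{\Delta E/RT}, \qquad \Delta E = E_i - E_0.$$

    In Joules/mol, at $T = 298$ K, $RT = 8.314 \times 298 = 2477.572$ J/mol.

    In cal/mol, $RT = 2477.572 / 4.2 = 589.9$.

    Thus, in kcal/mol, $RT = 0.5898$,

    so $\Delta E / \ln(n_0/n_i) = 0.5898$.

    Thus, from the size of bars in the torsional histogram we can predict (!!) the energy difference between torsional states.

    Let Ratio $= n_0/n_i$; then
    Ratio $= e^{\Delta E/0.5898}$

    i.e.
    DELTA_E (kcal/mol)   Ratio
    3.0                  161
    2.5                  69
    2.0                  30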

    Thus (for example), if REMOVE_HIGH_ENERGY=1 and DELTA_E = 2.5, those bars which are 1/69th or less of the height of the largest bar will be removed from the histogram.
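
    The filtering rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not GOLD's own code: the function names are hypothetical, RT is taken as the 0.5898 kcal/mol value quoted above, and "removing" a bar is assumed to mean setting its count to zero. The histogram is the "amide nh T2" example from this page.

```python
import math

RT_KCAL = 0.5898  # kcal/mol at 298 K, the value quoted in the text


def population_ratio(delta_e):
    """Ratio n0/ni for a bar delta_e kcal/mol above the ground state."""
    return math.exp(delta_e / RT_KCAL)


def remove_high_energy(histogram, delta_e):
    """Zero every bar whose height is 1/Ratio or less of the tallest bar.

    Assumption: a removed bar is represented by a count of zero.
    """
    cutoff = max(histogram) / population_ratio(delta_e)
    return [count if count > cutoff else 0 for count in histogram]


# "amide nh T2" histogram from the examples section (36 bins).
amide_nh_t2 = [1, 1, 14, 16, 29, 25, 23, 38, 35, 50, 82, 156, 53, 6, 1, 0,
               0, 0, 0, 0, 0, 1, 1, 14, 17, 15, 4, 4, 2, 1, 2, 5, 2, 2, 0, 0]

filtered = remove_high_energy(amide_nh_t2, 2.5)
```

    With DELTA_E = 2.5 the tallest bar (156 counts) gives a cutoff of about 2.25 counts, so under these assumptions the bars holding 1 or 2 counts are dropped and all others are kept.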

    Specifying Torsional Distributions

    Each torsional distribution entry comprises three lines:

    1. The first line is the name of the torsion angle.
    2. The second line is the definition of the torsion angle.
    3. The third line is the histogram. It is a list of space-separated integers. There should be N_BINS integers. The first bin starts at -180 degrees and the last bin ends at 180.
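
    As a rough illustration of the three-line layout, the sketch below reads one entry and reports the angular range of a bin. It is not part of GOLD: the function names are hypothetical, bins are assumed to be of equal width, and N_BINS = 36 is inferred from the example entries on this page rather than stated in the text.

```python
def parse_entry(name_line, definition_line, histogram_line, n_bins):
    """Split one three-line entry into name, definition and integer counts."""
    counts = [int(token) for token in histogram_line.split()]
    if len(counts) != n_bins:
        raise ValueError(f"expected {n_bins} bins, found {len(counts)}")
    return name_line.strip(), definition_line.strip(), counts


def bin_range(index, n_bins):
    """Angular range in degrees of a bin, assuming equal-width bins.

    Bin 0 starts at -180 and the last bin ends at 180, as in the text.
    """
    width = 360.0 / n_bins
    return -180.0 + index * width, -180.0 + (index + 1) * width


# The "acid T1" entry from the examples section.
name, definition, counts = parse_entry(
    "acid T1",
    "C.2 (O.co2 O.co2) | C.3 (2H) | C.3 (2H) | C",
    "41 8 0 0 0 0 0 0 0 1 8 7 2 0 0 0 0 1 1 0 0 0 1 0 4 1 0 1 0 0 0 0 0 2 2 41",
    n_bins=36,
)
```

    Checking the bin count on input is a cheap guard against the editing mistakes the format warning above refers to.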

    Torsional distributions are defined using the following grammar. The grammar is expressed in BNF, with alternatives indicated by ||; the other characters |, (, ), [ and ] are part of the grammar:

    TORSION = NODE | NODE | NODE | NODE ||
    NODE | NODE | NODE | NODE | DIRECTIVE ||
    NODE | NODE | NODE | NODE | DIRECTIVE | DIRECTIVE
    DIRECTIVE = expand <min> <max> || period <min> <max>
    NODE = ATOM || ATOM (NEIGHBOURS)
    NEIGHBOURS = NEIGHBOUR_NODE || NEIGHBOUR_NODE NEIGHBOURS
    NEIGHBOUR_NODE = NODE || HYDROGENS
    HYDROGENS = 0H || 1H || 2H || 3H
    ATOM = ATOM_DEF || ATOM_DEF [FRAGMENT]
    FRAGMENT = ribose || adenine || uracil || benzene
    ATOM_DEF = TYPE_DEF || LINKAGE<no space>TYPE_DEF
    TYPE_DEF = SYB_TYPE || EL_TYPE
    LINKAGE = ~ || = || -
    SYB_TYPE = C.3 || C.2 || C.1 || C.ar || C.cat || N.3 || N.2 || N.1 || N.ar || N.am || N.pl3 || N.4 || O.3 || O.2 || O.co2 || S.3 || S.2 || S.o || S.o2 || P.3 || H || F || Cl || Br || I
    EL_TYPE = C || N || O || S || P

    This grammar allows torsions to be specified as four fragment nodes. Each node defines an atom type and, optionally, a set of neighbours to which the atom is connected. Each of the neighbours is a node or an exact count of the number of hydrogens to which the atom is bonded. Atom types are defined using SYBYL atom types or elemental atom types. The atom can also be required to be part of a pre-defined fragment. Bonding environments can also be specified, using the symbols ~,=,-, which indicate, respectively, that an atom forms an aromatic, double or single bond to its parent node. A node is a parent of all its neighbours and a top level node in the torsion definition is a parent of subsequent nodes in the torsion.
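
    To make the grammar concrete, here is a minimal recursive-descent sketch that turns a definition line into four nested nodes plus any directives. It is an illustration only, not the parser used by GOLD: the dictionary layout and the function names are hypothetical, and no validation of atom or fragment names against the lists above is attempted.

```python
import re

# Tokens: parentheses, [fragment], hydrogen counts such as 2H, and atoms
# with an optional linkage prefix (~, = or -).
TOKEN = re.compile(r"\(|\)|\[\w+\]|[0-3]H|[~=\-]?[A-Z][A-Za-z0-9.]*")


def parse_node(tokens, pos=0):
    """Parse one NODE starting at tokens[pos]; return (node, next position)."""
    atom = tokens[pos]
    pos += 1
    node = {"linkage": None, "type": atom, "fragment": None,
            "hydrogens": None, "neighbours": []}
    if atom[0] in "~=-":
        node["linkage"], node["type"] = atom[0], atom[1:]
    if pos < len(tokens) and tokens[pos].startswith("["):
        node["fragment"] = tokens[pos][1:-1]
        pos += 1
    if pos < len(tokens) and tokens[pos] == "(":
        pos += 1
        while tokens[pos] != ")":
            if re.fullmatch(r"[0-3]H", tokens[pos]):
                node["hydrogens"] = int(tokens[pos][0])
                pos += 1
            else:
                child, pos = parse_node(tokens, pos)
                node["neighbours"].append(child)
        pos += 1  # skip the closing parenthesis
    return node, pos


def parse_torsion(definition):
    """Split a definition on '|' into four nodes and optional directives."""
    parts = [part.strip() for part in definition.split("|")]
    nodes = [parse_node(TOKEN.findall(part))[0] for part in parts[:4]]
    directives = []
    for part in parts[4:]:
        keyword, low, high = part.split()
        directives.append((keyword, float(low), float(high)))
    return nodes, directives


nodes, directives = parse_torsion(
    "C | C.3 (2H) | C.ar (~C.ar (0H)) | ~C.ar (0H) | expand 0.0 180.0")
```

    For this "benzyl sub" definition the third node carries one neighbour with linkage ~ and a hydrogen count of 0, and the single directive is returned as ("expand", 0.0, 180.0).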

    There are currently four fragments available, one of which (the uracil fragment) matches both thymine and uracil. More fragments can easily be added. The Ullmann algorithm is used to determine if an atom belongs to a fragment. Fragments are defined through SYBYL atom types and connectivity (exact bond types are not used). Only heavy atoms are considered. Currently, fragments are precompiled, but they could be read in at run-time if required.

    Directives are allowed to take account of special circumstances. There are two directives: expand and period. The expand directive has the form "expand <min> <max>", where either <max> - <min> = 180.0 or <min> = 0. This directive is used for torsions where the CSD query has symmetry and torsions are only measured over <min> to <max> degrees. However, although the CSD query may have two-fold symmetry, often the matched structure does not. The expand directive fills out the rest of the histogram with the correct values. The period directive takes account of those torsional distributions for which the matched structure has symmetry. This directive has the form "period <pmin> <pmax>". The distribution will only be expanded between angles <pmin> and <pmax>.
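
    The text does not spell out how expand fills in the histogram. One plausible reading for the "expand 0.0 180.0" case, shown purely as an assumption, is that counts measured over 0 to 180 degrees are mirrored onto -180 to 0, so that the bin for angle -x receives the count recorded for +x. The function name is hypothetical, and an even number of equal-width bins centred on 0 degrees is assumed.

```python
def expand_0_180(histogram):
    """Mirror the 0..180 degree half of a histogram onto -180..0.

    Assumption: this is one possible interpretation of 'expand 0.0 180.0',
    not a description of GOLD's actual behaviour.
    """
    n = len(histogram)
    expanded = list(histogram)
    for i in range(n // 2):
        # Bin i (negative angles) mirrors bin n - 1 - i (positive angles).
        expanded[i] = histogram[n - 1 - i]
    return expanded


# "benzyl sub" histogram from the examples section: 23 empty bins,
# then the populated bins between 50 and 140 degrees.
benzyl_sub = [0] * 23 + [1, 9, 27, 76, 64, 15, 7, 4, 2, 0, 0, 0, 0]
expanded = expand_0_180(benzyl_sub)
```

    Under this reading the expanded "benzyl sub" histogram is symmetric about 0 degrees, with peaks near +90 and -90 degrees.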

    Examples of Torsional Distributions

    acid T1
    C.2 (O.co2 O.co2) | C.3 (2H) | C.3 (2H) | C
      41   8   0   0   0   0   0   0   0   1   8   7   2   0   0   0   0   1   1   0   0   0   1   0   4   1   0   1   0   0   0   0   0   2   2  41
          
    acid T2
    O.co2 | C.2 (O.co2) | C.3 (2H) | C.3 (2H C)
       8   5   1   3   2   1   3   2   3   2   3   3   4   0   3   2   7  11  15   9   1   4   1   0   2   1   4   4   1   3   3   6   0   3   5   7
          
    amide nh T2  
    C.2 (=O.2 N.am (1H)) | C.3 (1H C.3) | N.am (1H) | C.2 (=O.2)
       1   1  14  16  29  25  23  38  35  50  82 156  53   6   1   0   0   0   0   0   0   1   1  14  17  15   4   4   2   1   2   5   2   2   0   0
          
    uracil
    O.3 [ribose] | C.3 [ribose] | N.am [uracil] (C.2 (1H))| C.2 [uracil] (=O.2)
      24  73  85  44  59  60  40  14   8   3   2   0   0   0   0   0   0   0   0   0   0   0   0   0   7   5   3   0   0   1   4   3   3   5  10   6
          
    benzyl sub  
    C | C.3 (2H) | C.ar (~C.ar (0H)) | ~C.ar (0H) | expand 0.0 180.0
       0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   9  27  76  64  15   7   4   2   0   0   0   0
          

    Matching Torsional Distributions

    Once all rotatable bonds have been found in the ligand, they are checked against all torsional distributions using a depth-first search. Each atom in the torsional distribution matches only one atom in the ligand. In some cases, a rotatable bond may match more than one torsional distribution. If this happens, a score is calculated for each torsional distribution and the distribution with the highest score is selected. Each portion of the torsional distribution contributes to the score as follows:

    Element atom type 1.5
    SYBYL atom type 2.0
    Fragment 3.0
    Hydrogen count 2.0
    Bond linkage 0.5
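
    The weights above can be applied to a definition line as in the sketch below. It assumes that the score is a simple sum over every atom type, fragment, hydrogen count and linkage symbol appearing in the four nodes, which the text implies but does not state outright. The function names are hypothetical, and the second candidate in the usage example is a made-up, less specific definition included only for comparison.

```python
import re

WEIGHTS = {"element": 1.5, "sybyl": 2.0, "fragment": 3.0,
           "hydrogens": 2.0, "linkage": 0.5}
ELEMENT_TYPES = {"C", "N", "O", "S", "P"}  # EL_TYPE in the grammar

# Fragments, hydrogen counts and atoms; parentheses are simply skipped.
TOKEN = re.compile(r"\[\w+\]|[0-3]H|[~=\-]?[A-Z][A-Za-z0-9.]*")


def score_definition(definition):
    """Sum the weights of every feature in the four nodes of a definition."""
    nodes = " ".join(definition.split("|")[:4])  # ignore any directives
    score = 0.0
    for token in TOKEN.findall(nodes):
        if token.startswith("["):
            score += WEIGHTS["fragment"]
        elif token[0].isdigit():
            score += WEIGHTS["hydrogens"]
        else:
            if token[0] in "~=-":
                score += WEIGHTS["linkage"]
                token = token[1:]
            kind = "element" if token in ELEMENT_TYPES else "sybyl"
            score += WEIGHTS[kind]
    return score


def best_match(candidates):
    """Return the candidate definition with the highest score."""
    return max(candidates, key=score_definition)


benzyl = "C | C.3 (2H) | C.ar (~C.ar (0H)) | ~C.ar (0H) | expand 0.0 180.0"
generic = "C | C.3 | C.ar | C.ar"  # hypothetical, less specific definition
chosen = best_match([benzyl, generic])
```

    Because more specific definitions carry more scored features, a rule of this kind favours the most detailed distribution that matches a given bond.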

    Extracting the Torsional Distribution from a CSD Search

    The command process_tab will extract the torsional histogram from the tab file produced by a CSD search and reformat it so that it can be added into the GOLD torsional distribution file.

    Further Improvements

    The major problem in the current version of the torsional distribution file is the lack of appropriate distributions: only 15% of ligand torsions in the GOLD test set are matched. There will undoubtedly be a need for additional fragments and improvements to the specification.

    Dependencies between two adjacent torsions may also need to be added, as in the Ramachandran plot.

    Probability densities for mutation and initialisation also need to be derived (this is more difficult than it seems!).

