As an alternative to using SYBYL MOL2 files, both the protein and ligand can be entered in the Brookhaven (PDB) file format. However, the format needs to be extended in order to specify bond orders for the ligand (see below).
File formats can also be mixed: for example, the protein can be defined in PDB format and the ligand in a MOL2 file. GOLD identifies the file format by the suffix of the input file: ".pdb" for Brookhaven PDB files and ".mol2" for SYBYL MOL2 files.
Docked ligands are written out in the same file format as the input ligand file. For example, if the input ligand file is ligand.pdb, GOLD solutions will be written out in pdb file format as ga_soln_ligand_1.pdb, ga_soln_ligand_2.pdb, etc.
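The suffix rule and the output naming described above can be captured in a small helper, for example when scripting around GOLD runs. This is a minimal Python sketch, not part of GOLD: the function names are hypothetical, and the ga_soln_<stem>_<n> naming pattern is assumed from the single example given (ligand.pdb giving ga_soln_ligand_1.pdb, ga_soln_ligand_2.pdb, etc.).

```python
from pathlib import Path

# Suffixes exactly as listed in the text; other suffixes (or letter cases) are not covered.
GOLD_FORMATS = {".pdb": "PDB", ".mol2": "MOL2"}


def input_format(path):
    """Return the format implied by the file suffix (hypothetical helper)."""
    suffix = Path(path).suffix
    if suffix not in GOLD_FORMATS:
        raise ValueError(f"unrecognised input suffix: {suffix!r}")
    return GOLD_FORMATS[suffix]


def expected_solution_files(ligand_path, n_solutions):
    """Names of the docked-solution files, assuming the ga_soln_<stem>_<n> pattern."""
    p = Path(ligand_path)
    input_format(p)  # reject unsupported suffixes early
    # Solutions keep the input suffix, i.e. they are written in the same format as the input ligand.
    return [f"ga_soln_{p.stem}_{i}{p.suffix}" for i in range(1, n_solutions + 1)]
```

For example, expected_solution_files("ligand.pdb", 2) returns ['ga_soln_ligand_1.pdb', 'ga_soln_ligand_2.pdb'].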
Solutions in PDB format can be viewed using the grommitt utility.
Specifying the Protein as a PDB file
If the protein is specified in PDB format, all hydrogens must be added (the precise geometrical positions of serine and threonine hydroxyl hydrogens do not matter, as they will be optimised during the GOLD run). Residues should be in sequence order and correctly named. All atoms should be properly labelled (CA, CB, etc.). Any unusual bonds (disulphide bridges, etc.) should have CONECT records.
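For disulphide bridges, the CONECT records can be generated from the coordinates rather than typed by hand. The sketch below is an illustrative helper, not a GOLD tool, and its function names are hypothetical. It reads the fixed PDB columns (serial in columns 7-11, atom name 13-16, residue name 18-20, x/y/z in 31-54), treats any two CYS SG atoms closer than an assumed 2.5 Å cutoff as bridged (a typical S-S bond is roughly 2.05 Å), and writes one CONECT record for each of the two atoms.

```python
import math
from itertools import combinations


def find_disulphides(pdb_lines, max_dist=2.5):
    """Return (serial_a, serial_b) pairs of CYS SG atoms closer than max_dist (Angstrom)."""
    sg_atoms = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Residue name in columns 18-20, atom name in columns 13-16.
        if line[17:20] == "CYS" and line[12:16].strip() == "SG":
            serial = int(line[6:11])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            sg_atoms.append((serial, xyz))
    return [(a, b) for (a, pa), (b, pb) in combinations(sg_atoms, 2)
            if math.dist(pa, pb) < max_dist]


def disulphide_conect(pairs):
    """One CONECT record per bridged sulphur, in fixed-width PDB layout."""
    lines = []
    for a, b in pairs:
        lines.append(f"CONECT{a:5d}{b:5d}")
        lines.append(f"CONECT{b:5d}{a:5d}")
    return lines
```

The cutoff is a tunable assumption; residues that your preparation tool has renamed (for example to a non-standard cysteine name) would be missed, which is another reason to keep residues correctly named.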
GOLD connects atoms within residues on the basis of proximity. Double bonds are assigned as appropriate for the naturally occurring protein residues, and the protonation states of Asp, Glu and His residues are deduced from the input hydrogen-atom coordinates (i.e. the user specifies the protonation state of these residues by adding or removing the appropriate hydrogens). Atom types are then assigned using the type checker.
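To illustrate how a protonation state can be read off the hydrogen coordinates, the sketch below attaches each hydrogen of a His residue to its nearest heavy atom and reports which ring nitrogens carry one. It is not GOLD's algorithm; the 1.2 Å N-H cutoff, the function name and the dictionary input are assumptions, and the standard PDB names ND1 and NE2 are assumed for the ring nitrogens.

```python
import math

# Assumed cutoff: N-H bonds are about 1.0 Angstrom, so 1.2 leaves some tolerance.
NH_CUTOFF = 1.2


def histidine_state(residue):
    """residue: dict mapping atom name -> (x, y, z) for one HIS, hydrogens included."""
    heavy = {name: xyz for name, xyz in residue.items() if not name.startswith("H")}
    protonated = set()
    for name, xyz in residue.items():
        if not name.startswith("H"):
            continue
        # Proximity-based bonding: attach each hydrogen to its nearest heavy atom.
        nearest = min(heavy, key=lambda h: math.dist(heavy[h], xyz))
        if nearest in ("ND1", "NE2") and math.dist(heavy[nearest], xyz) < NH_CUTOFF:
            protonated.add(nearest)
    if protonated == {"ND1", "NE2"}:
        return "cationic (H on ND1 and NE2)"
    if protonated == {"ND1"}:
        return "neutral (H on ND1)"
    if protonated == {"NE2"}:
        return "neutral (H on NE2)"
    return "no ring N-H found (check for missing hydrogens)"
```

Adding or removing the hydrogen on NE2 in the input is therefore all it takes to switch between the neutral and cationic forms in this scheme.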
After a GOLD run, the solvent-accessible binding-site atoms are written out to the file active_atoms.pdb.
Specifying the Ligand as a PDB file
All hydrogens must be added to the ligand, including those required to resolve any ambiguities about ionisation or tautomeric states. All bonds should have CONECT records. Additionally, double and triple bonds must be specified, as described immediately below.
Specifying the ligand in PDB format is more complicated than specifying the protein, since bond orders need to be input (this is unnecessary for the protein, as GOLD knows the structures of the common amino acids). GOLD uses the same convention as RASMOL: a bond specified twice in a single CONECT record is assumed to be a double bond, and a bond specified three times in a single CONECT record is a triple bond. For example, the following CONECT records specify a double bond between atoms with serial numbers 25 and 26:
CONECT   25   20   26   30   26
CONECT   26   25   27   52   25
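The repeat convention can be checked mechanically before a run. This minimal Python sketch (function name hypothetical) reads CONECT records using the fixed PDB columns (atom serial in columns 7-11, up to four bonded-atom serials in columns 12-31), counts how often each partner appears within a single record, and returns the implied bond orders. As an extra consistency check that the text does not attribute to GOLD, it raises an error when two records that both list the same bond imply different orders.

```python
from collections import Counter


def conect_bond_orders(lines):
    """Return {(i, j): order}, i < j, from PDB CONECT records (RASMOL repeat convention)."""
    orders = {}
    for line in lines:
        if not line.startswith("CONECT"):
            continue
        serial = int(line[6:11])
        # Bonded-atom serials occupy four 5-character fields (columns 12-31).
        fields = (line[k:k + 5] for k in range(11, 31, 5))
        partners = [int(f) for f in fields if f.strip()]
        # Within one record, the number of repeats of a partner is the bond order.
        for partner, count in Counter(partners).items():
            key = (min(serial, partner), max(serial, partner))
            if key in orders and orders[key] != count:
                raise ValueError(f"inconsistent order for bond {key}: {orders[key]} vs {count}")
            orders[key] = count
    return orders
```

Applied to the two records above, it returns order 2 for the bond between atoms 25 and 26 and order 1 for the other four bonds.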
This mechanism for specifying bond orders is forced on us by the lack of a bond-order field in the standard PDB format, and it makes errors easy to commit. For that reason, we recommend that the PDB format not be used for ligands; we may disable it altogether if it causes too many problems.
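If a ligand must nevertheless be supplied in PDB format, generating the CONECT records from a bond list is less error-prone than editing them by hand. The sketch below (function names hypothetical) writes each bonded partner once per unit of bond order, in the standard fixed-width CONECT layout. Because the convention requires a bond's repeats to sit in a single record, and a standard record has room for only four partners, it starts a continuation record for the same atom when the next bond would not fit; whether GOLD accepts such continuation records is an assumption, not something stated here.

```python
from collections import defaultdict

MAX_PARTNERS = 4  # bonded-atom fields in a standard CONECT record


def _conect_line(serial, partners):
    # "CONECT" followed by 5-character right-justified serial numbers.
    return "CONECT" + "".join(f"{n:5d}" for n in (serial, *partners))


def conect_records(bonds):
    """bonds: iterable of (i, j, order) with order 1-3. Returns CONECT lines sorted by serial."""
    groups = defaultdict(list)  # serial -> partner groups, e.g. [26, 26] for a double bond
    for i, j, order in bonds:
        if order not in (1, 2, 3):
            raise ValueError(f"unsupported bond order {order} for bond {i}-{j}")
        groups[i].append([j] * order)
        groups[j].append([i] * order)
    lines = []
    for serial in sorted(groups):
        record = []
        for group in groups[serial]:
            # Keep all repeats of one bond in the same record; start a new record if needed.
            if len(record) + len(group) > MAX_PARTNERS:
                lines.append(_conect_line(serial, record))
                record = []
            record.extend(group)
        lines.append(_conect_line(serial, record))
    return lines
```

For the double bond in the example above, conect_records([(25, 20, 1), (25, 26, 2), (25, 30, 1), (26, 27, 1), (26, 52, 1)]) produces, among others, the record "CONECT   25   20   26   26   30"; the partner order differs from the hand-written example, but under the repeat convention only the count within the record matters.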
Bond orders are required by GOLD in order to elucidate atom types and thus identify H-bond donors and acceptors properly. The page on general atom typing explains the bonding patterns that are required to identify particular hydrogen bonding groups and atom types. Once all the bond orders are read in, atom types are assigned using the type checker. The special SYBYL bond types "am" and "ar" are also assigned where appropriate.
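To illustrate why bond orders matter for typing, the sketch below flags amide bonds, in the spirit of the SYBYL "am" bond type, using a deliberately simplified rule: a single C-N bond whose carbon also carries a C=O double bond. The rule, the function name and the input layout (element symbols keyed by serial number, bond orders keyed by sorted serial pairs) are assumptions for illustration; GOLD's type checker may apply additional conditions, and aromatic ("ar") bonds are not handled. Without the double bond to oxygen, the same C-N bond would look like an amine.

```python
def amide_bonds(elements, bonds):
    """elements: {serial: element symbol}; bonds: {(i, j): order} with i < j.
    Returns the set of C-N single bonds whose carbon is double-bonded to an oxygen."""
    # Carbons that are double-bonded to an oxygen (carbonyl carbons).
    carbonyl_c = set()
    for (i, j), order in bonds.items():
        if order == 2 and {elements[i], elements[j]} == {"C", "O"}:
            carbonyl_c.add(i if elements[i] == "C" else j)
    amides = set()
    for (i, j), order in bonds.items():
        if order == 1 and {elements[i], elements[j]} == {"C", "N"}:
            carbon = i if elements[i] == "C" else j
            if carbon in carbonyl_c:
                amides.add((i, j))
    return amides
```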
Atom Numbering
When you have to specify atom numbers in the input files (e.g. to define covalent bonding or constraints), use the PDB atom serial number. Unfortunately, the atom numbering used by GOLD may not correspond to the serial numbers if atoms are missing from the PDB file (i.e. if the serial numbers are not consecutive). For example, if GOLD exits with an error message about atom number 1010, it means the 1010th atom in the PDB input file, which may not have serial number 1010.
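When an error message quotes an atom number, it helps to translate between that position in the file and the serial numbers. This is a sketch only: it assumes GOLD counts every ATOM and HETATM record in file order (the text does not say whether, for example, HETATM records or hydrogens are treated differently), and the function names are hypothetical.

```python
def _is_atom_record(line):
    return line.startswith(("ATOM", "HETATM"))


def nth_atom_record(pdb_lines, n):
    """Return the n-th (1-based) atom record, i.e. what an error about 'atom n' would refer to."""
    count = 0
    for line in pdb_lines:
        if _is_atom_record(line):
            count += 1
            if count == n:
                return line
    raise IndexError(f"file has only {count} atom records, not {n}")


def serial_to_ordinal(pdb_lines):
    """Map each atom serial number (columns 7-11) to its 1-based position in the file."""
    mapping = {}
    position = 0
    for line in pdb_lines:
        if _is_atom_record(line):
            position += 1
            mapping[int(line[6:11])] = position
    return mapping
```

With a file whose atom serials run 1, 2, 5, the third atom record is the one with serial number 5, which is exactly the mismatch described above.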