ANSIG tutorial

This section describes how to get started with ANSIG and contains recipes for setting up assignment projects. Two example data sets are distributed with ANSIG: pgc2, and ras, which contains homonuclear 2D spectra. It is a good idea to go through both in order to understand what ANSIG can do.

It is assumed in what follows that ANSIG has been properly installed at your site, and that the example data sets are accessible.

The ANSIG program requires a number of files to work properly. Some of these files are the same for most assignment projects, and are therefore located in the main ANSIG directory which should be accessible to all users. Among these general files are the HTML documentation files (with the top file being ansig/doc/ansig.html), the dictionary file, and many AL procedure files.

Other files are specific to each assignment project. Among these are the control file, the sequence file, the spectrum description file, the crosspeaks file and the initialization file.

Example setups

In order to run ANSIG sessions using the examples, one must define the correct directories. The top ANSIG directory (usually /usr/local/ansig) is best defined in your .cshrc file in the form of an environment variable:
	% setenv ANSIGDIR /usr/local/ansig
Copy the contents of the pgc2 or ras example directories to your own area. You must have read/write privileges to some of the files in these directories.

The pgc2-spectra and ras-spectra directories need not be copied; the contents of these are read-only.

Setup recipe

Here is a short recipe for how to set up your own project, starting from the example files.

Running ANSIG

Here is a description of how the user interface in ANSIG works.

File formats, formatted files

The various text (ASCII) files used as input for ANSIG all follow the same basic rules as regards file format:

The contents must follow a well-defined syntax, where the basic idea is that there are keywords (reserved words) followed by one or possibly more values.

The particular syntax for each kind of file defines which items must be present and which are optional. The syntax allows any number of blanks, tabs or newlines between the items, as long as the items are separate. The items must appear in the specified order.

Comments are allowed almost anywhere, and are treated as blanks. A comment is started by an exclamation mark '!' and ends at the end of the line. A comment can contain any characters.
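
The rules above can be illustrated with a small Python sketch. It is not ANSIG's own parser, and the keyword and value names in the sample are hypothetical. It assumes that an exclamation mark always starts a comment, so it ignores the few places where comments are not allowed.

```python
def tokenize(text):
    """Split formatted-file text into items, dropping '!' comments."""
    items = []
    for line in text.splitlines():
        # Everything from the first '!' to the end of the line is a comment.
        code = line.split("!", 1)[0]
        # Blanks and tabs separate items; newlines were handled by splitlines().
        items.extend(code.split())
    return items


# Hypothetical file contents; real ANSIG files use their own keywords.
sample = """
! a comment line
keyword_a   value1      ! trailing comment
keyword_b
    value2 value3
"""
print(tokenize(sample))
# ['keyword_a', 'value1', 'keyword_b', 'value2', 'value3']
```

Because comments and line breaks are both treated as blanks, the layout of a file can be chosen freely for readability, as long as the items stay in the order the syntax requires.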

The only text files not following these format rules are command input files, most importantly the initialization file. These are files containing commands that are handled in exactly the same way as if they had been typed in by hand at the ANSIG> prompt. The main difference compared to ordinary text files is that each command must be given in full on a single line; it is not possible to split a command across several lines. Currently, one input line may contain at most 80 characters.
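
Since an overlong line in a command input file cannot be continued on the next line, it can help to check such a file before use. The following Python sketch is not part of ANSIG. It assumes the 80-character limit does not count the end-of-line character.

```python
MAX_LINE_LENGTH = 80  # limit stated in the text; newline assumed not counted


def overlong_lines(text, limit=MAX_LINE_LENGTH):
    """Return (line_number, length) for each line longer than the limit."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


# Two dummy lines: the first is exactly 80 characters, the second is 81.
text = "a" * 80 + "\n" + "b" * 81
print(overlong_lines(text))
# [(2, 81)]
```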

File formats, binary files

ANSIG uses only one kind of binary (unformatted) file produced by another program, namely the spectrum matrix file. The format for this file is the same as that used by the Azara processing package, written by Wayne Boucher. There are some programs and procedures to convert processed matrix files from other formats into ANSIG format.

The spectrum matrix format is not described explicitly here. If necessary, it is fairly straightforward to sort out from the source code for the conversion programs in the distribution.

The other binary files used in ANSIG, notably the crosspeak file and the contour files, are not documented, and may in fact change in new releases of ANSIG. These files are written and read by ANSIG, and are not meant to act as interfaces to other programs. There is a formatted export file for the crosspeaks file, which is the official interface to other programs.

The contour file is an exception: such files can be created by the Azara package, and are directly readable by ANSIG.

File naming conventions

Although ANSIG does not require specific names for the various files, in practice it is a good idea to use a well-defined convention. The naming convention used in the example files is that the file name extension (the characters after the last period '.') gives a hint of what kind of file it is.

Control file

The control file contains the names of files that ANSIG accesses during a session. This is a file specific for each project, and possibly also for different setups within the same project.

Directory symbols

The control file usually contains entries which define abbreviations for certain directories, so-called directory symbols. These are denoted by curly braces {} and can be used anywhere in ANSIG where a file name is expected. For instance, if a directory symbol entry in the control file is:
	{spectra}  /usr/people/krpx/prot/spectra/
then it is possible to use {spectra}noesy.spc as a shorter file name for /usr/people/krpx/prot/spectra/noesy.spc. Note that ANSIG does not add the slash character ('/') used to separate directory names; this character must be explicit.

The AL procedures in the distribution assume that the directory which contains the AL procedures is determined by the {lib} directory symbol. This is not really hard-wired into ANSIG, but is a convention adopted when the library AL procedures were written.

The name between the curly braces in a directory symbol can also be an environment variable (in the UNIX sense) defined in the .cshrc file using the setenv command.
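
The expansion of directory symbols can be sketched in Python as follows. This is an illustration, not ANSIG's implementation. Two points are assumptions: that a control file entry takes precedence over an environment variable of the same name, and that an unknown symbol is left unchanged.

```python
import os
import re

SYMBOL = re.compile(r"\{([^{}]+)\}")


def expand(name, symbols, environ=None):
    """Replace each {symbol} in a file name by its directory."""
    environ = os.environ if environ is None else environ

    def replace(match):
        key = match.group(1)
        if key in symbols:      # control file entry (assumed to win)
            return symbols[key]
        if key in environ:      # environment variable, e.g. set with setenv
            return environ[key]
        return match.group(0)   # unknown symbol: left as-is (assumption)

    return SYMBOL.sub(replace, name)


symbols = {"spectra": "/usr/people/krpx/prot/spectra/"}
print(expand("{spectra}noesy.spc", symbols))
# /usr/people/krpx/prot/spectra/noesy.spc
print(expand("{ANSIGDIR}/doc/ansig.html", {}, {"ANSIGDIR": "/usr/local/ansig"}))
# /usr/local/ansig/doc/ansig.html
```

Note that the replacement is purely textual. The first symbol ends with a slash, so none is written after it, while ANSIGDIR has no trailing slash and the file name must supply it.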

Sequence file

The sequence file specifies the sequence of the protein. The residue names (which serve as the residue numbers) do not have to be integers; they can be almost any sensible combination of at most four characters. In ANSIG, a "sequence residue" or a "sequence" most often refers to a residue in the protein sequence. The term "residue" is usually restricted to the residue types (Ala, Trp, Ser,...) given in the dictionary file. A sequence residue must be of a residue type present in the dictionary if its nuclei are to be assignable.

The sequence file can be used as a note-pad for comments on the state of the assignment. One enters one sequence residue per line, and uses the rest of the line as space for comments after a comment character (exclamation mark, '!'). The ANSIG program ignores such comments when the file is read on program startup.

It is usually not a good idea to change the residue entries in the sequence file (except for the comments, of course) during a project, unless one knows exactly what one is doing. One may introduce inconsistencies or outright errors into the assignment list if such changes are made.

Spectrum description file

The spectrum description file (or spd file, for short) lists the spectra used in a project. For each spectrum it gives the name, the dimensionality and the expnuclei in each dimension, a description of the experiment performed, and some additional information. The spectrum matrix (if present) for each spectrum is also defined in the spd file.

The name given for a spectrum in the spd file cannot be changed once crosspeaks in that spectrum have been created. The crosspeak file contains information on which spectrum each crosspeak belongs to, and so ANSIG will not be able to use a crosspeak file which refers to spectra not present in the spd file. There are ways around this, but these are tricky to use. Therefore, give the spectra good names from the start.

Spectrum matrix files

The spectrum matrix files are the primary data for ANSIG. The format used for the matrix file in ANSIG is that of the Azara processing package, written by Wayne Boucher. The matrix file really consists of two files: the matrix itself in a file containing only data, and a parameter file describing the layout of the matrix and containing referencing data. The parameter file is a formatted, readable file.

There is another kind of spectrum matrix file, the deflated matrix file (usually called *.def), which is a matrix file with the data points below the noise removed. Depending on the sparseness of the spectrum matrix, and the noise level defined by the user, deflating a matrix file may save large amounts of disk space. The Azara package contains a program to do this. The data points below the noise level cutoff are irretrievably lost. ANSIG can read deflated matrix files.
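
The idea behind deflating can be shown with a toy Python sketch. It is neither the Azara program nor the deflated file format. It only assumes that "below the noise" means an absolute value smaller than the user's cutoff, and that the surviving points are stored together with their positions.

```python
def deflate(points, cutoff):
    """Keep only (index, value) pairs whose magnitude reaches the cutoff."""
    return [(i, v) for i, v in enumerate(points) if abs(v) >= cutoff]


# A made-up row of eight data points with two peaks above the noise.
row = [0.2, -0.1, 7.5, 0.3, -6.0, 0.0, 0.1, 0.4]
kept = deflate(row, cutoff=1.0)
print(kept)
# [(2, 7.5), (4, -6.0)]
print(len(kept), "of", len(row), "points kept")
# 2 of 8 points kept
```

The sparser the spectrum and the higher the cutoff, the fewer points remain, which is where the saving in disk space comes from. The discarded values cannot be reconstructed from what is kept.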

There are some conversion programs included in the release, which can convert certain other matrix formats to ANSIG/Azara format.

A programmer who needs to write another conversion program may find it useful to look at, or reuse, the source code for the procedures that all these conversion programs use to access ANSIG files. These procedures are found in the matrix.f file.

Crosspeaks file

The crosspeaks file (usually called *.crp or *.cp) is created, written, read and updated by the ANSIG program. It is in binary format, and cannot be inspected directly by the user. The user should not try to edit the crosspeaks file in any way outside of ANSIG; all changes to the crosspeaks must be done through the ANSIG program.

The crosspeaks file is updated after every change made by the user to the assignments while running ANSIG. Therefore, it should in principle always contain a proper image of the state of the assignment. If the computer goes down, or some other problem causes ANSIG to terminate abnormally, the crosspeaks file should in principle be intact and contain all changes that have been made. In practice, the computer operating system usually buffers the input/output operations, so that the crosspeaks file may not be completely up-to-date after a crash. Such a corrupted crosspeaks file may not be readable by ANSIG. In order to avoid massive loss of assignment data, there is a facility in ANSIG which periodically writes out a complete backup copy of the crosspeaks file.

Initialization file

In principle, it should be possible to run ANSIG without any special setup beyond giving the necessary data files. However, it is almost always useful to configure various aspects of ANSIG to fit the project at hand.

For instance, a specified number of graphics windows may be needed in specified areas of the screen. Some useful macros may need to be defined. Some AL procedures may need to be compiled for dealing with certain spectra in an efficient manner. All these setup operations, and others, can be done in the initialization file. Also, if one wishes to have several different setups for a project, for instance one setup for assignment of J-correlated spectra, and another for assignment of NOESY spectra, then this is done by having different initialization files.


Per Kraulis 10 Apr 1996.