# 3 Program usage

SADIC is implemented as a command line tool. The basic usage is:

RunSadic [options] entity

The full list of options is described in section 4. If the last option could be confused with the entity, separate the two with the -- characters.

entity is the input data. Further details are provided in section 3.1.

## 3.1 Program input

The program input is always a PDB entity file. This file can be entered:

• by file name, for a file in the local file system; e.g.
RunSadic c:\Dati\Pdb\1AON.pdb


• by URL, if the file is in a network location; the protocol can be http, ftp or file; e.g.
RunSadic ftp://someserver/somepath/1AON.pdb


• by pdbId code: the file is looked for in the list of databases given in the pdblist.conf file (see section 3.1.1); e.g.
RunSadic 1AON


• through the standard input; e.g.
gzip -dc 1AON.pdb.gz | RunSadic


The type of entity is automatically detected from the command line argument.

A PDB entity file can contain a single model or many models in separate MODEL...ENDMDL blocks. SADIC treats each model as a separate problem and emits a separate output for each of them; a set of output files containing statistical data is also emitted. The models to perform calculations on are selected through the --models option; by default calculations are performed on all the models.

PDB entities can be read from files compressed in the compress format: in this case they must have a .z or .Z extension. Because of patent issues, the program does not ship with a compress tool: it relies on an external program such as gzip to perform decompression. gzip is standard on Linux installations; for the Windows platform it can be downloaded from http://www.gzip.org. The tool must reside in a directory listed in PATH.

### 3.1.1 Databases list

The pdblist.conf file tells the program where to look for PDB entities if the user searches for them by pdbId code. pdblist.conf is a text file containing a sequence of URLs with a variable part in the form %(string)s. When a search is to be performed, the variable part is replaced with a real value. Currently string can be:

• pdbid: replaced by the lowercase pdbId
• PDBID: replaced by the uppercase pdbId

The file can also contain any whitespace and comments (marked with a # character).

The resource is looked for in the provided URLs in the order they appear in the pdblist.conf file: the first URL found is used for calculation. If you want to look for PDB files in a local directory before trying some public database over the Internet, you can specify a file URL in the first position of the pdblist.conf file.
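The substitution described above can be sketched in a few lines of Python. This is only an illustration, not SADIC's own code: the function name candidate_urls and the two sample locations are made up, and the sketch assumes that a # starts a comment running to the end of the line. The %(string)s form happens to match Python's dictionary-based string formatting, which is what the sketch relies on.

```python
def candidate_urls(conf_text, pdb_id):
    """Expand each URL template of a pdblist.conf-style text for one pdbId.

    The URLs are returned in file order, i.e. the order in which they
    would be tried.
    """
    values = {"pdbid": pdb_id.lower(), "PDBID": pdb_id.upper()}
    urls = []
    for line in conf_text.splitlines():
        # Assumption: '#' starts a comment that runs to the end of the line.
        template = line.split("#", 1)[0].strip()
        if template:
            urls.append(template % values)
    return urls


# Both locations below are hypothetical.
sample = """
# local mirror first, then a public server
file:///data/pdb/pdb%(pdbid)s.ent.Z
http://example.org/pdb/%(PDBID)s.pdb
"""
for url in candidate_urls(sample, "1Aon"):
    print(url)
```

With this approach a template containing a literal % (for example an escaped character such as %20) would need the % doubled; whether SADIC handles that case is not stated here.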

Note: On Windows platforms, the file URL syntax is slightly different from the standard way to express a directory path. For example, you can look for files named pdbabcd.ent.Z in the directory C:\path\to\files (where abcd is the pdbId) using the syntax
file:///C|path/to/files/pdb%(pdbid)s.ent.Z


The pdblist.conf file can reside in any of the following positions:

• the user's home directory (on Windows, the location given by the HOMEPATH environment variable, usually C:\Documents and Settings\username);

• the /etc/ directory (for POSIX systems);

• the program package directory. The package is installed in the standard site-packages directory of your Python installation. To find out exactly where it is, you can use the command:
python -c "import os, sadic; print(os.path.dirname(sadic.__file__))"


The program looks for the file in the given order and uses the first one found. A sample (and already useful) pdblist.conf file is installed in the package directory: you can copy it to your home directory and edit the copy to override the defaults without losing the original values.
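The lookup order can be pictured with the following sketch. It is an assumption-laden illustration rather than the program's actual code: find_pdblist is a hypothetical helper, the home directory is taken from os.path.expanduser, and the package directory must be passed in by the caller (for example the value printed by the command above).

```python
import os


def find_pdblist(package_dir, home=None, etc_dir="/etc"):
    """Return the first pdblist.conf found in home, /etc, package dir order.

    Returns None when no candidate exists.
    """
    if home is None:
        home = os.path.expanduser("~")
    for directory in (home, etc_dir, package_dir):
        path = os.path.join(directory, "pdblist.conf")
        if os.path.isfile(path):
            return path
    return None
```

Because the home directory is checked first, a copy placed there shadows both the system-wide file and the packaged sample.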

## 3.2 Sampling points

By default the calculation is performed on every α-carbon atom in the molecule. The option --atom-name tells the program to sample all the atoms of the molecule with a given name. If you want to limit the selection to a list of residues and/or chains, you can use the --residues and --chains options. A range of residues can be expressed in N1-N2 form (both ends included).

Alternatively the atoms to be sampled can be selected by serial through the --serials option (which can also be in N1-N2 form). Options --residues and --chains are not used in this case.
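As an illustration of the N1-N2 notation, the sketch below expands a list of tokens into the individual numbers they denote, with both ends included. The helper expand_ranges is hypothetical and says nothing about how SADIC actually parses its command line; in particular it assumes non-negative numbers and does not handle negative residue numbers.

```python
def expand_ranges(tokens):
    """Expand tokens such as "7" or "10-13" into a flat list of integers."""
    numbers = []
    for token in tokens:
        first, sep, last = token.partition("-")
        if sep:
            # N1-N2 form: both ends are included.
            numbers.extend(range(int(first), int(last) + 1))
        else:
            numbers.append(int(first))
    return numbers


print(expand_ranges(["3", "10-13"]))  # [3, 10, 11, 12, 13]
```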

Note: Each serial in an entity file is unique across models: through the --serials option, atoms from different models can be selected. If the queried models have a different number of selected atoms, totals cannot be computed.

Finally, sampling can be performed around an arbitrary point in space. The point is given through the --point option, using coordinates in the entity reference frame. The option can be used more than once to specify several points.

## 3.3 Sampling pattern

Calculation of the depth index is performed through sampling. The molecule is modelled as an assembly of spheres whose centers are the atom centers and whose radii are the atoms' Van der Waals (VdW) radii. Samples are gathered within a spherical volume around the sampling point. They are placed on concentric spherical surfaces with regularly growing radii and are about equally spaced on each sphere. The parameters controlling the sampling pattern are:

Radius:
radius in Å of the sampling area. It is the fundamental parameter r in the depth index definition. It can be changed with the --radius option. If not specified, sampling is performed along growing radii until no sampled atom is completely inside the molecule. When an entity file contains many models, the radius is selected on the first model and applied unchanged to the following ones.

Step:
length in Å of the sampling steps in the radial direction. This parameter influences the calculation precision. In the case of tabular output, a value for each sampling radius is returned. It can be changed through the --step option.

Cross step:
maximum distance in Å between adjacent samples on each concentric sphere. Their number is kept as low as possible without exceeding this value. By default this value is equal to the step in the radial direction, but it can be varied independently through the --cstep option.

Note: If the --cstep parameter is not used, halving the radial step through the --step parameter results in about eight times as many samples, and consequently about eight times the calculation time. If such a dramatic increase is not needed, the parameters --step and --cstep can be varied independently:

• halving the --step value while keeping --cstep fixed results in about twice as many samples. Furthermore, with a tabular output (--format table), the table will have twice as many columns: this may be useful if you are interested, for example, in plotting the depth index per radius;

• halving the --cstep value while keeping --step fixed results in about four times as many samples. Precision increases, but the number of columns in a --format table output doesn't change.
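These factors follow from simple geometry and can be checked with a rough counting model. The sketch below is not SADIC's sampling routine: estimate_samples is a hypothetical function that assumes one shell every step Å up to the sampling radius and about one sample per cstep × cstep patch of each shell surface.

```python
import math


def estimate_samples(radius, step, cstep=None):
    """Rough estimate of the number of samples around one sampling point."""
    if cstep is None:
        cstep = step  # default: cross step equal to the radial step
    shells = int(radius / step + 1e-9)
    total = 0
    for k in range(1, shells + 1):
        r = k * step
        # Shell area divided by the area assumed to belong to one sample.
        total += math.ceil(4.0 * math.pi * r * r / (cstep * cstep))
    return total


base = estimate_samples(10.0, 0.5)
print(estimate_samples(10.0, 0.25, cstep=0.5) / base)  # about 2
print(estimate_samples(10.0, 0.5, cstep=0.25) / base)  # about 4
print(estimate_samples(10.0, 0.25) / base)             # about 8
```

The number of shells grows linearly as the radial step shrinks, while the number of samples on each shell grows with the inverse square of the cross step, hence the factors of two, four and eight.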

## 3.4 Model control

The model representation is built with a conventional VdW radius for each element. Conventional radii are obtained from [3] and can be found in section 4.5. Only radii for the most fundamental elements (C, H, N, O, S, P) are set; if you need to change any of the default values or add a missing element to the list, use the --atom-radii option. If the radius for any atom name in the entity file is missing, the computation can't be performed.

By default, sampling is performed only on residue atoms (i.e. the ATOM records in the PDB file). If you want to include HETATM records too, you need to use the --hetatm option and set the atom radii for all their elements through the --hetatm-radii option. A HETATM radius which is not found in the --hetatm-radii list is searched in the --hetatm list (including the default radii) before giving up.

The radius selection, both for ATOMs and HETATMs, is performed on the basis of the name field. The name doesn't have to match exactly what was set through --atom-radii and --hetatm-radii: if a name is not found, its last character is stripped away and what remains is tested again, until a match is found (or nothing remains). Furthermore, a leading digit is stripped from the name. For example, atoms named CA match the radius set for the element C and, for an atom whose name appears as 1HH1 in the PDB file, the radius set for H is used.
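The matching rule can be sketched as follows. This is only an illustration under stated assumptions: lookup_radius is a hypothetical helper, the leading digit is assumed to be removed before any lookup is attempted, and the radii in the example table are placeholder values, not SADIC's defaults (those are listed in section 4.5).

```python
def lookup_radius(atom_name, radii):
    """Return the radius matching a PDB atom name, or None if there is none."""
    name = atom_name.strip()
    # Assumption: a leading digit (as in 1HH1) is dropped first.
    if name and name[0].isdigit():
        name = name[1:]
    # Try the full name, then strip one trailing character at a time.
    while name:
        if name in radii:
            return radii[name]
        name = name[:-1]
    return None


# Placeholder values, for illustration only.
radii = {"C": 1.7, "H": 1.2, "Q": 0.0}
print(lookup_radius("CA", radii))    # radius set for C
print(lookup_radius("1HH1", radii))  # radius set for H
print(lookup_radius("QB", radii))    # dummy atom, radius 0
```

Since the full name is tested before any character is stripped, a longer key in the table takes precedence over a shorter one that is a prefix of it.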

Solvent molecules are excluded from the computation. The default list of resName values identifying the HETATM records to be ignored can be altered through the --solvent option.

Note: Many NMR-generated models include dummy atoms whose names begin with Q. If you want to exclude them from the calculation, you can set their radius to zero through the --atom-radii option, e.g.
RunSadic --atom-radii Q 0 -- 1pit


## 3.5 Program output

SADIC can emit several kinds of data for each model sampled. Each data stream is stored in a different file and, if the entity contains many models, each model generates a different set of data streams.

The data streams to save are chosen through the --data option with the sequence of codes the user is interested in. The meaning of each output symbol is:

di:
the depth index;
hv:
the hidden volume;
ev:
the exposed volume;
hs:
the hidden surface;
es:
the exposed surface.

By default only the depth index is calculated.

The output file names are obtained by mangling the input entity name (if not applicable, out is used) with the output symbol and the model serial. If the input file name was 8cho.pdb, the output file for the depth index will be 8cho_di.ext. If the input was the entity 1PIT, containing 20 models, the exposed volume files will be named from 1PIT_m01_ev.ext to 1PIT_m20_ev.ext. The value of ext depends on the output format.

If the entity contains more than one model, two cumulative files are also generated for each output stream. In the latter example, their names are 1PIT_avg_ev.ext and 1PIT_std_ev.ext; they contain, respectively, the average and the standard deviation over the models of the data stored in the corresponding files.
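The naming scheme can be summarised with a small sketch that reproduces the examples above. It is not the program's code: output_names is a hypothetical helper, and it assumes that models are numbered from 1 and that the model serial is always written with two digits.

```python
import os


def output_names(entity, symbol, n_models, ext):
    """List the output file names expected for one data stream."""
    # Drop any directory and extension; fall back to "out" if nothing is left.
    base = os.path.splitext(os.path.basename(entity))[0] or "out"
    if n_models <= 1:
        return ["%s_%s.%s" % (base, symbol, ext)]
    names = ["%s_m%02d_%s.%s" % (base, model, symbol, ext)
             for model in range(1, n_models + 1)]
    # Cumulative files: average and standard deviation over the models.
    names += ["%s_%s_%s.%s" % (base, kind, symbol, ext)
              for kind in ("avg", "std")]
    return names


print(output_names("8cho.pdb", "di", 1, "txt"))
print(output_names("1PIT", "ev", 20, "pdb")[-3:])
```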

A different base name, as well as a different path for storing output files, can be chosen through the --output option.

### 3.5.1 Tabular output

By default, the data streams are stored in a tabular output file. Each sampled atom is stored in a different row; the table has a column for each sampling radius. The file also has a header row with the sampling radii in Ångströms and a header column with the atom serial (or the point coordinates, for --point sampling points). Many details of the output format can be chosen: refer to section 4.6. The file extension for tabular output is .txt.

### 3.5.2 PDB output

With the option --format pdb, the output files are stored as PDB entity files, useful for visualization in a molecular display program such as MolMol (http://hugin.ethz.ch/wuthrich/software/molmol/). In this case only the data relative to the biggest radius are stored. The file extension for PDB output is .pdb.

Data are stored in the tempFactor field of the ATOM records. Because the field can only range from 0 to 99.99, volume and surface data are normalized to the 0-1 range. Depth index values range in the 0-2 interval, so no further rescaling is needed. Atoms not sampled are reported with a value of 99.99.
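To use these values outside a molecular viewer, they can be read back from the output file. The sketch below is a minimal example, not part of SADIC: read_temp_factors is a hypothetical helper that relies on the fixed-column layout of PDB ATOM records (serial in columns 7-11, tempFactor in columns 61-66) and maps the 99.99 marker to None.

```python
def read_temp_factors(lines):
    """Map each atom serial to its tempFactor value, or None if not sampled."""
    values = {}
    for line in lines:
        if line.startswith("ATOM"):
            serial = int(line[6:11])    # columns 7-11
            value = float(line[60:66])  # columns 61-66
            # 99.99 marks atoms that were not sampled.
            values[serial] = None if abs(value - 99.99) < 0.005 else value
    return values


# Usage, with a hypothetical output file name:
# with open("8cho_di.pdb") as handle:
#     depth = read_temp_factors(handle)
```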

SADIC provides a MolMol macro displaying atoms with tempFactor values in the 0-2 range in a colour blend and out-of-range atoms in gray. The macro is called di.mac and is located in the package directory.