SADIC is implemented as a command line tool. The basic usage is:
RunSadic [options [...]] [--] [entity]
The full list of options is described in section 4. Use the -- separator if the entity argument would otherwise be mistaken for the value of the last option.
entity is the input data; further details are provided in section 3.1.
The program input is always a PDB entity file. This file can be given as:
- a local file name; e.g.
  RunSadic c:\Dati\Pdb\1AON.pdb
- an http, ftp or file URL; e.g.
  RunSadic ftp://someserver/somepath/1AON.pdb
- a pdbId code: the file will be looked for in a list of databases given in the pdblist.conf file (see further); e.g.
  RunSadic 1AON
- the standard input, when no entity is given; e.g.
  gzip -dc 1AON.pdb.gz | RunSadic
The type of entity is automatically detected from the command line argument.
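The manual does not say how this detection works. Purely as an illustration, the Python sketch below classifies an argument with a simple heuristic. The function classify_entity and the four-character pdbId pattern are assumptions, not SADIC's code.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed heuristic: a pdbId is a digit 1-9 followed by three
# alphanumeric characters (e.g. "1AON").
PDB_ID = re.compile(r"^[1-9][0-9A-Za-z]{3}$")
URL_SCHEMES = {"http", "ftp", "file"}


def classify_entity(arg: Optional[str]) -> str:
    """Guess which kind of entity a command line argument refers to."""
    if arg is None:
        return "stdin"  # no argument: read the PDB data from standard input
    # Only the schemes listed in the manual count as URLs; this also keeps a
    # Windows path such as c:\Dati\... (parsed scheme "c") from being a URL.
    if urlparse(arg).scheme.lower() in URL_SCHEMES:
        return "url"
    if PDB_ID.match(arg):
        return "pdbid"
    return "path"
```

Checking the URL scheme against an explicit list, rather than testing whether any scheme is present, is what keeps drive-letter paths on the "path" branch.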
A PDB entity file can contain a single model or many models in different MODEL...ENDMDL blocks. SADIC treats each model as a separate problem and emits a separate output for each of them; a set of output files containing statistical data is also emitted. The models to perform calculations on are selected through the --models option; by default, calculations are performed on all the models.
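To make the per-model handling concrete, here is a minimal Python sketch that groups PDB records by MODEL serial and optionally keeps only a chosen set of serials. The function split_models is hypothetical. Treating a file without MODEL records as model 1 is an assumption of the sketch.

```python
from typing import Dict, Iterable, List, Optional, Set


def split_models(lines: Iterable[str],
                 selected: Optional[Set[int]] = None) -> Dict[int, List[str]]:
    """Group PDB records by MODEL serial, keeping only the selected models.

    Assumption: a file without MODEL records is a single model, serial 1.
    """
    models: Dict[int, List[str]] = {}
    current: Optional[int] = None
    saw_model = False
    loose: List[str] = []  # records outside any MODEL...ENDMDL block
    for line in lines:
        record = line[:6].strip()  # record name occupies columns 1-6
        if record == "MODEL":
            saw_model = True
            current = int(line.split()[1])  # model serial follows the name
            models[current] = []
        elif record == "ENDMDL":
            current = None
        elif current is not None:
            models[current].append(line)
        else:
            loose.append(line)
    if not saw_model:
        models = {1: loose}
    if selected is not None:  # keep only the requested model serials
        models = {k: v for k, v in models.items() if k in selected}
    return models
```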
PDB entities can be read from files compressed in the compress format: in this case they must have a .z or .Z extension. Because of patent issues, the program does not include a decompressor for this format: it relies on an external program such as gzip to perform decompression. gzip is standard in Linux installations; for the Windows platform it can be downloaded from http://www.gzip.org. The tool must reside in a directory listed in the PATH.
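A minimal sketch of delegating decompression to an external gzip found on the PATH. read_pdb_text is a hypothetical name. The sketch relies only on gzip's standard -d (decompress) and -c (write to standard output) flags; gzip can also decompress files produced by compress.

```python
import shutil
import subprocess
from pathlib import Path


def read_pdb_text(path: str) -> str:
    """Return the text of a PDB file, decompressing .z/.Z files with gzip."""
    if Path(path).suffix in (".z", ".Z"):
        gzip_exe = shutil.which("gzip")  # the tool must be on the PATH
        if gzip_exe is None:
            raise RuntimeError("gzip not found on PATH: cannot read " + path)
        # gzip -dc writes the decompressed data to stdout, leaving the file intact
        result = subprocess.run([gzip_exe, "-dc", path],
                                capture_output=True, check=True)
        return result.stdout.decode("ascii", errors="replace")
    return Path(path).read_text(encoding="ascii", errors="replace")
```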
The pdblist.conf file tells the program where to look for PDB entities if the user searches for them by pdbId code. pdblist.conf is a text file containing a sequence of URLs with a variable part in the form %(string)s. When a search is performed, the variable part is replaced with an actual value. Currently string can be:
- pdbid: replaced by the lowercase pdbId
- PDBID: replaced by the uppercase pdbId
The file can also contain whitespace and comments (marked with a `#' character).
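The %(string)s notation is the same as Python's %-style formatting with a mapping, so the substitution can be sketched as below. The helpers read_pdblist and expand are illustrations under assumptions:
- URLs are whitespace-separated.
- Everything after # on a line is a comment.
- As in Python, a literal % in a template would have to be written %%.

The example.org URL is a placeholder.

```python
from typing import List


def read_pdblist(text: str) -> List[str]:
    """Extract URL templates from pdblist.conf text (assumed syntax)."""
    templates: List[str] = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # assumption: '#' starts a comment
        templates.extend(line.split())  # whitespace separates URLs
    return templates


def expand(template: str, pdb_id: str) -> str:
    """Fill the %(pdbid)s / %(PDBID)s placeholders of one URL template."""
    return template % {"pdbid": pdb_id.lower(), "PDBID": pdb_id.upper()}
```

For instance, expand("file:///data/pdb%(pdbid)s.ent.Z", "1AON") gives file:///data/pdb1aon.ent.Z.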
The resource is looked for at the provided URLs in the order they appear in the pdblist.conf file: the first URL at which the file is found is used for the calculation. If you want to look for PDB files in a local directory before trying a public database over the Internet, put a file URL in the first position of the pdblist.conf file.
The file URL syntax is slightly different from the standard way of expressing a directory path. For example, you can look for files named pdbabcd.ent.Z in the directory C:\path\to\files (where abcd is the pdbId) using the syntax
file:///C|path/to/files/pdb%(pdbid)s.ent.Z
The pdblist.conf file can reside in any of the following positions:
python -c "import os, sadic; print(os.path.dirname(sadic.__file__))"
The program looks for the file in the given order and uses the first one found. A sample (and already useful) pdblist.conf file is installed in the package directory: you can copy it to your home directory and edit the copy to override the defaults without losing the original values.
By default, the calculation is performed on every α-carbon atom in the molecule. The --atom-name option tells the program to sample all the atoms of the molecule with a given name. If you want to limit the selection to a list of residues and/or chains, you can use the --residues and --chains options. A range of residues can be expressed in the N1-N2 form (both ends included).
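A minimal sketch of this selection over PDB ATOM records, using the fixed PDB columns (serial 7-11, name 13-16, chainID 22, resSeq 23-26).

The sketch makes several assumptions:
- The helpers parse_ranges and select_serials are hypothetical.
- Each residue value arrives as a separate token.
- Negative residue numbers are not handled.

```python
from typing import Iterable, List, Optional, Set


def parse_ranges(tokens: Iterable[str]) -> Set[int]:
    """Expand tokens such as "12" or "10-20" (both ends included)."""
    numbers: Set[int] = set()
    for token in tokens:
        first, sep, last = token.partition("-")
        if sep:
            numbers.update(range(int(first), int(last) + 1))
        else:
            numbers.add(int(first))
    return numbers


def select_serials(lines: Iterable[str], atom_name: str = "CA",
                   residues: Optional[Set[int]] = None,
                   chains: Optional[Set[str]] = None) -> List[int]:
    """Serials of ATOM records with the given name, optionally filtered."""
    serials = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != atom_name:  # atom name, columns 13-16
            continue
        if residues is not None and int(line[22:26]) not in residues:
            continue  # resSeq, columns 23-26
        if chains is not None and line[21] not in chains:
            continue  # chainID, column 22
        serials.append(int(line[6:11]))  # serial, columns 7-11
    return serials
```

For example, select_serials(lines, residues=parse_ranges(["10-20"])) keeps the α-carbons of residues 10 to 20.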
Alternatively, the atoms to be sampled can be selected by serial through the --serials option (which also accepts the N1-N2 form). The --residues and --chains options are ignored in this case.
The serial of an atom in an entity file is unique across models, so the --serials option can select atoms from different models. If the queried models have a different number of selected atoms, totals cannot be computed.
You can also run SADIC on all the ATOM records in the PDB file using the --all-atoms option.
Finally, sampling can be performed around an arbitrary point in space. The point is given through the --point option, using coordinates in the entity reference frame. The option can be repeated to specify several points.
Calculation of the depth index is performed through sampling. The molecule is modelled as an assembly of spheres centred on the atoms, whose radii are the atoms' van der Waals (VdW) radii. Samples are gathered in a spherical volume around the sampling point: they are placed on concentric spherical surfaces with regularly increasing radii and are approximately equally spaced on each sphere. The parameters controlling the sampling pattern are:
table
, the table will have twice as many columns:
this may be useful if you are interested, for example, in plotting the
depth index per radius;
table
output doesn't change.
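The manual does not specify how samples are placed on each sphere. The sketch below uses a Fibonacci (golden-angle) lattice as a stand-in, which is a common way to obtain approximately equally spaced points. It counts the samples that fall outside every VdW sphere. The number of points per shell grows with the shell area so that, with equally spaced radii, samples are spread roughly uniformly through the volume.

The names sphere_points and exposed_fraction, and the parameters max_radius, shells and density, are hypothetical; they are not SADIC options. How SADIC turns such counts into the depth index is not described here, so the sketch stops at the exposed fraction.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
GOLDEN_ANGLE = math.pi * (3.0 - math.sqrt(5.0))


def sphere_points(n: int, radius: float, center: Vec) -> List[Vec]:
    """About equally spaced points on one sphere (Fibonacci lattice)."""
    cx, cy, cz = center
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n  # evenly spaced heights in (-1, 1)
        r = math.sqrt(1.0 - z * z)  # radius of the horizontal circle at z
        phi = i * GOLDEN_ANGLE
        points.append((cx + radius * r * math.cos(phi),
                       cy + radius * r * math.sin(phi),
                       cz + radius * z))
    return points


def exposed_fraction(center: Vec, atoms: Sequence[Tuple[Vec, float]],
                     max_radius: float = 10.0, shells: int = 10,
                     density: float = 2.0) -> float:
    """Fraction of samples around `center` lying outside every atom sphere.

    Shell k has radius k * max_radius / shells; its point count is
    proportional to the shell area (an assumption of this sketch).
    """
    outside = total = 0
    for k in range(1, shells + 1):
        radius = k * max_radius / shells
        n = max(1, round(density * 4.0 * math.pi * radius ** 2))
        for p in sphere_points(n, radius, center):
            total += 1
            if all(math.dist(p, c) > vdw for c, vdw in atoms):
                outside += 1
    return outside / total
```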
The molecular model is built with a conventional VdW radius for each element. Conventional radii are taken from [4] and are listed in section 4.5. Radii are predefined only for the most common elements (C, H, N, O, S, P); if you need to change any of the default values or add a missing element, use the --atom-radii option. If the radius for any atom name in the entity file is missing, the computation cannot be performed.
By default, sampling is performed only on residue atoms (i.e. the ATOM records in the PDB file). If you want to include HETATM records too, you need to use the --hetatm option and set the atom radii for all the elements involved through the --hetatm-radii option. A HETATM radius that is not found in the --hetatm-radii list is searched for in the --atom-radii list (including the default radii) before giving up.
The radius selection, both for ATOMs and HETATMs, is performed on the basis of the name field. The name doesn't have to be exactly equal to what is set through --atom-radii and --hetatm-radii: if a name is not found, the last character is stripped away and what remains is tested again, until a match is found (or nothing remains). Furthermore, a leading digit is stripped from the name. For example, CA atoms match the radius set for the element C and, for an atom whose name appears as 1HH1 in the PDB file, the radius set for H is used.
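The matching rule can be sketched as below. The optional fallback table models the HETATM case, where the --hetatm-radii values are tried before the --atom-radii ones. find_radius is a hypothetical helper. It is an assumption that the whole shortening sequence is tried in the first table before moving to the second. The radii in the usage note are illustrative placeholders, not necessarily the defaults of section 4.5.

```python
from typing import Mapping, Optional


def find_radius(name: str, radii: Mapping[str, float],
                fallback: Optional[Mapping[str, float]] = None) -> Optional[float]:
    """Look up a VdW radius by progressively shortening an atom name."""
    key = name.strip()
    if key[:1].isdigit():
        key = key[1:]  # drop one leading digit, e.g. "1HH1" -> "HH1"
    # Assumption: exhaust the first table before trying the fallback one.
    for table in (radii, fallback or {}):
        candidate = key
        while candidate:
            if candidate in table:  # membership test: a radius of 0 is valid
                return table[candidate]
            candidate = candidate[:-1]  # strip the last character and retry
    return None  # no radius found: the computation cannot proceed
```

With illustrative atom radii such as {"C": 1.7, "H": 1.2, "O": 1.52}, find_radius("CA", ...) returns the C value and find_radius("1HH1", ...) the H value. For a HETATM, the call would be find_radius(name, hetatm_radii, atom_radii).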
Solvent molecules are excluded from the computation. The default list of resName values identifying the HETATM records to be ignored can be altered through the --solvent option.
Q
.
If you want to exclude them from the calculation, you can set their radius to zero through the --atom-radii option, e.g.:
RunSadic --atom-radii Q 0 -- 1pit
SADIC can emit several kinds of data for each sampled model. Each data stream is stored in a separate file, and if the entity contains several models, each model generates its own set of data streams.
The data streams to save are chosen by passing the --data option the sequence of codes the user is interested in. The meaning of each output symbol is:
- hv
- ev: exposed volume
- hs
- es
- di: depth index
- adi
- core
By default, only the depth index is calculated.
Output file names are built by combining the input entity name (or out, if no name is available) with the output symbol and, for multi-model entities, the model serial. If the input file name was 8cho.pdb, the output file for the depth index will be 8cho_di.ext. If the input was the entity 1PIT, which contains 20 models, the exposed volume files will be named 1PIT_m01_ev.ext to 1PIT_m20_ev.ext. The value of ext depends on the output format.
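The naming scheme in the two examples can be sketched as follows. output_names is a hypothetical helper, and two further points are assumptions:
- Models are assumed to be numbered 1..n.
- Serials are padded to at least two digits, matching the m01 to m20 example.

The manual does not state the rule for other model counts.

```python
from typing import List, Optional


def output_names(base: Optional[str], symbol: str, n_models: int,
                 ext: str) -> List[str]:
    """Per-model output file names for one data stream (sketch)."""
    base = base or "out"  # used when no entity name is available
    if n_models == 1:
        return [f"{base}_{symbol}.{ext}"]  # single model: no serial suffix
    width = max(2, len(str(n_models)))  # assumption: at least two digits
    return [f"{base}_m{i:0{width}d}_{symbol}.{ext}"
            for i in range(1, n_models + 1)]
```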
If the entity contains more than one model, two cumulative files are also generated for each output stream. In the previous example, their names are 1PIT_avg_ev.ext and 1PIT_std_ev.ext, containing respectively the average and the standard deviation over the models of the data stored in the corresponding per-model files.
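A sketch of the element-wise average and standard deviation behind the _avg and _std files. model_statistics is a hypothetical helper. Using the population standard deviation (statistics.pstdev) is an assumption, since the manual does not say which estimator SADIC uses. A mismatch in the number of sampled atoms between models is reported as an error: this is the case in which, as noted for --serials, totals cannot be computed.

```python
import statistics
from typing import List, Sequence, Tuple


def model_statistics(
        per_model: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Element-wise mean and standard deviation across models.

    Each inner sequence holds one model's values in the same atom order.
    """
    if not per_model:
        raise ValueError("no models given")
    if len({len(values) for values in per_model}) != 1:
        raise ValueError("models have different numbers of sampled atoms")
    columns = list(zip(*per_model))  # one tuple per atom, one entry per model
    avg = [statistics.fmean(col) for col in columns]
    std = [statistics.pstdev(col) for col in columns]  # assumed population std
    return avg, std
```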
A different base name, as well as a different path for storing output files, can be chosen through the --output option.
By default, the data streams are stored in a tabular output file. Each sampled atom is stored in its own row, and the table has a column for each sampling radius. The file also has a header row with the sampling radii in Ångströms and a header column with the atom serial (or the point coordinates, for sampling points given with --point). Many details of the output format can be chosen: refer to section 4.6. The file extension for tabular output is .txt.
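The table layout described above can be sketched as follows. format_table is hypothetical, and several formatting choices are assumptions:
- a tab separator,
- an empty top-left cell,
- radii printed with %g and values with three decimals.

The actual formatting options are those documented in section 4.6.

```python
from typing import Sequence, Tuple


def format_table(radii: Sequence[float],
                 rows: Sequence[Tuple[str, Sequence[float]]]) -> str:
    """Lay out one data stream: radii across the top, one row per sample."""
    header = "\t".join([""] + [f"{r:g}" for r in radii])  # empty corner cell
    lines = [header]
    for label, values in rows:  # label: atom serial or point coordinates
        if len(values) != len(radii):
            raise ValueError(f"row {label!r} does not have one value per radius")
        lines.append("\t".join([label] + [f"{v:.3f}" for v in values]))
    return "\n".join(lines)
```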
Using the --format pdb option, the output files are stored as PDB entity files, useful for visualization in a molecular display program such as MolMol (http://hugin.ethz.ch/wuthrich/software/molmol/). In this case only the data relative to the largest radius are stored. The file extension for PDB output is .pdb.
Data are stored in the tempFactor field of the ATOM records. Because the field can only range from 0 to 99.99, volume and surface data are normalized to the 0-1 range. Depth index values lie in the 0-2 interval, so no further rescaling is needed. Atoms that were not sampled are reported with a value of 99.99.
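A minimal sketch of writing a value into the tempFactor field, which occupies columns 61-66 of an ATOM record in the PDB format. Unsampled atoms get 99.99, as described above. set_temp_factor is a hypothetical helper. The sketch does not reproduce SADIC's normalization of volume and surface data, so callers are expected to pass values that are already in range.

```python
from typing import Optional

NOT_SAMPLED = 99.99  # value used for atoms that were not sampled


def set_temp_factor(line: str, value: Optional[float]) -> str:
    """Write a value into the tempFactor field (columns 61-66) of a PDB record."""
    if value is None:
        value = NOT_SAMPLED
    if not 0.0 <= value <= 99.99:
        raise ValueError("tempFactor must lie in 0-99.99; rescale first")
    padded = line.rstrip("\n").ljust(66)  # make sure columns 61-66 exist
    return padded[:60] + f"{value:6.2f}" + padded[66:]
```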
SADIC provides a MolMol macro that displays atoms with tempFactor values in the 0-2 range in a colour blend and out-of-range atoms in grey. The macro is called di.mac and is located in the package directory.