These are intended to be brief notes supplementing and outlining the course material presented in the course Introduction to Daylight. The Daylight manuals should be considered the text for the course and the authoritative documentation, and should be used in conjunction with these notes for best results!
The Daylight Installation and Administration Guide is the relevant manual for this unit.
Once installed, the Daylight system consists of a set of
directories and files with topmost directory
v461 and a database directory called
thordb. The files and directories
should all be owned by the Daylight administrator,
usually "thor". There also should be a
directory for keeping configuration files through
version upgrades. Possibilities are
The following should be defined for all Daylight users:
$DY_ROOT - software distribution directory
$DY_LICENSEDATA - license file
And for the administrator:
$DY_THORDB - database directory
Other environment variables may be defined as a means to setting Daylight options. The following, in particular, may be useful for administrators:
$DY_DATABASE_PASSWORDS_FILE - server security file
$DY_THOR_LOG_FILE - Thorserver log
$DY_MERLIN_LOG_FILE - Merlinserver log
$DY_MERLIN_SERVER_LIST - Hosts with Merlinservers
$DY_MERLIN_MEMORY_LIMIT - Max process size
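As a quick sanity check, the required variables listed above can be tested from a script. The following is a minimal sketch, not a Daylight utility: the function name `missing_daylight_vars` is hypothetical, and it only checks that each variable is set to a non-empty value, not that the path it names exists.

```python
import os

# Variables these notes list for all Daylight users.
USER_VARS = ("DY_ROOT", "DY_LICENSEDATA")
# Additional variable these notes list for the administrator.
ADMIN_VARS = ("DY_THORDB",)


def missing_daylight_vars(env, admin=False):
    """Return the required variable names that are unset or empty in env."""
    required = USER_VARS + (ADMIN_VARS if admin else ())
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_daylight_vars(os.environ, admin=True)
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All required Daylight variables are set.")
```

Passing the environment in as an argument keeps the check easy to try against a plain dictionary before running it on a real account.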
This file must be edited to list the Daylight TCP/IP services. Add the following lines:
Daylight web tools require a webserver to be running on the machine equipped with Daylight software, and configured with the following aliases:
/dayhtml/ -> $DY_ROOT/dayhtml/
/dayicon/ -> $DY_ROOT/dayhtml/icons/
/daycgi/ -> $DY_ROOT/daycgi/ (script alias)
A license supplied by Daylight must be installed at
$DY_LICENSEDATA. The CPU must be identified by one of:
(1) output of "testlicense -i" (a Daylight command)
or (2) output of "uname -a" and "hostid" (Sun)
or (3) output of "uname -a" AND "lmhostid" OR "printf "%x\n" `sysinfo -s`" (SGI)
Security is control over access to data and other resources. The Daylight system is concerned with:
Server access is controlled by a system of users and passwords. This user list is completely separate from the list of unix users. However, some Daylight client programs (e.g., xvmerlin) may use the unix username as a "guess" to attempt server access. The list of Daylight users and passwords is contained in the file dy_passwords.dat, located in $DY_ROOT/etc by default or specified by environment variable DY_DATABASE_PASSWORDS_FILE. Although this file may be easily edited by hand, it is designed to be modified by inputs to the sthorman program.
This file specifies allowed hosts and users, and passwords. It is normally not edited but accessed only via the Thorserver. However, it is a simple text file.
Example dy_passwords.dat file:
host:*only*
host:gator
host:corona
user:norah:1aA3h3azZaqw23DS
user:jj:
user:june:GjkO96REnK2G1Waw
user:mug:AsDF12REIO9PlLYb
user:thor:
user:thorinfo:
Server security may be in one of these two modes. In equivalent hosts mode, if a host is listed, any user from that host may connect with no server password. Allowed users may connect from any host. In restricted hosts mode, only allowed users, from allowed hosts, may connect.
In addition, a third security level is "no security", resulting from specifying no passwords file:
thorserver -DATABASE_PASSWORDS_FILE ""
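The host and user rules for the two modes can be sketched in a few lines. This is an illustration of the rules as stated in these notes, not the Thorserver's actual logic: the function names are hypothetical, the mode is passed in explicitly because these notes do not say how the file selects it, every host entry (including `*only*`) is treated as an ordinary name, and password checking is left out entirely.

```python
def parse_passwords(text):
    """Collect host and user entries from dy_passwords.dat-style text."""
    hosts, users = set(), {}
    for entry in text.split():
        kind, _, rest = entry.partition(":")
        if kind == "host":
            hosts.add(rest)
        elif kind == "user":
            name, _, password = rest.partition(":")
            users[name] = password  # stored form; may be empty
    return hosts, users


def may_connect(hosts, users, host, user, restricted):
    """Apply the host/user rules for the two security modes.

    Only the host and user checks are modelled here; password
    verification is not.
    """
    host_ok = host in hosts
    user_ok = user in users
    if restricted:
        # Restricted hosts mode: allowed user AND allowed host.
        return host_ok and user_ok
    # Equivalent hosts mode: a listed host OR an allowed user is enough.
    return host_ok or user_ok
```

For example, with hosts `gator` and `corona` listed, an unlisted user connecting from `gator` passes in equivalent hosts mode but fails in restricted hosts mode.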
The software is pre-configured with these allowed users. In
thor is hard-coded to have special
administrative privileges, and
hard-coded to have read-only privileges.
The Thorserver, Merlinserver and DayToolserver each are licensed to allow a specific number of simultaneous client connections.
The Thorserver and Merlinserver are two separate executable programs
which work in tandem to provide access to databases. A client may
be a Thor-client
xvthor), or a Merlin-client
mcl), or both (
They share the same passwords file
$DY_DATABASE_PASSWORDS_FILE), and database path
The Remote Toolkit is comprised of the
and one or more Mac or Windows client programs.
daytoolserver may use the same passwords
file as the database servers or a different one
program-objects are needed, an allowed directory for program-object
executables must be specified (option
The Daylight administrator must be able to specify allowed users and their passwords, allowed hosts, and other configuration parameters.
Daylight applications have a unified options manager whereby options and allowed values are defined, defaults specified, and non-default values can be specified in several ways according to defined precedence.
$HOME/dy_profile.opt (but this can be modified by environment variable
$DY_ROOT/etc/unix/dy_sysprofile.opt (but this can be modified by environment variable
DY_SYSPROFILE). The user profile supersedes the system profile.
$DY_ROOT/etc/directory. These files are not to be modified by the user. However, they may be useful as option definition references.
Daylight options are defined in directories
$DY_ROOT/etc/common. The system as shipped has
options specified by .dat files in these directories, all of which are
Applications look for
$DY_ROOT/etc/unix/dy_options.dat, unless variable
DY_OPTIONS is set otherwise.
dy_options.dat specifies, among others,
dy_basic_opts.dat, which specifies options
SYSPROFILE is set to
$DY_ROOT/etc/unix/dy_sysprofile.opt, which exists,
PROFILE is set to
$HOME/dy_profile.opt, which may not exist in
$HOME, though a sample is provided in
Daylight administrators should modify
dy_sysprofile.opt to make changes applicable to
everyone at a site, and users should create and modify their own
$HOME/dy_profile.opt to customize options for
themselves alone. The environment variable
DY_PROFILE may be redefined to, say,
"$HOME/.dy_profile.opt" if desired.
It should be noted that the environment variables
DY_THORDB are not
DY_ROOT must be set for all
DY_THORDB is normally set for the
DY_LICENSEDATA should also
normally be set for all users.
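The relationship between the two profile files can be sketched as follows. This is a simplified model under stated assumptions: only the system and user profiles are considered (the options manager has other sources of values that are not modelled), options are represented as plain dictionaries because the .opt file syntax is not covered here, and the function names and option names are hypothetical.

```python
import posixpath


def profile_paths(env):
    """Return (system_profile, user_profile) paths, honouring overrides."""
    root = env.get("DY_ROOT", "")
    home = env.get("HOME", "")
    # Default locations, unless DY_SYSPROFILE / DY_PROFILE say otherwise.
    system = env.get("DY_SYSPROFILE") or posixpath.join(
        root, "etc", "unix", "dy_sysprofile.opt")
    user = env.get("DY_PROFILE") or posixpath.join(home, "dy_profile.opt")
    return system, user


def effective_options(system_opts, user_opts):
    """Merge option dicts so that user values supersede system values."""
    merged = dict(system_opts)
    merged.update(user_opts)
    return merged
```

A site-wide value set in the system profile therefore applies to everyone until an individual user overrides it in a personal profile.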
Database installation means copying database files from CDROM or via FTP to a local disk for access by the database servers. Sufficient disk space must be available, and for Merlin access, sufficient RAM.
Daylight databases are configurable in several ways. Configuration
options are stored as fields in the header file for the database
(the .THOR file). The header file can be edited directly, as it
is a simple text file. However, the approved and more reliable
method is to use a Thor-manager client (sthorman, thorchange, etc.).
Auxiliary Databases - All databases have at least one
auxiliary database, a datatypes database. Other possible
auxiliary databases are indirect and monomer. The auxiliary
databases can be set at database creation, and can be modified
Database Passwords - Databases each have three passwords:
Read-only - Databases can be set to read-only.
Caching - Databases can be configured for caching in several ways to improve their performance (speed). Caching forces the Thorserver to hold some or all of a database in memory for fast access, and supplements the operating system's normal file caching. Caching is using RAM instead of disk to improve speed and efficiency. Daylight caching may be specified by the configuration of a database, or initiated by client request if allowed by configuration.
Caching configuration specifications are normally made by thormake, thorchange, or sthorman.
The option CACHE_LEVEL is ignored unless CACHE_WHEN is ALWAYS.
-CACHE_WHEN NEVER
    Disable caching; all data remain in the disk files. Ignore caching requests from Thor clients.
-CACHE_WHEN OK
    Cache if, when, and as specified by a Thor client. (Synonymous with "ON_REQUEST".)
-CACHE_WHEN ALWAYS
    -CACHE_LEVEL WRITETHRU
        Read hashtable from RAM, write to disk (and RAM).
    -CACHE_LEVEL READWRITE
        Read and write hashtable in RAM. Disk synced when necessary.
    -CACHE_LEVEL WRITETHRU_ALL
        Read entire database from RAM, write to disk (and RAM).
    -CACHE_LEVEL READWRITE_ALL
        Read and write entire database in RAM. Disk synced when necessary.
By default, both primary data and cross-references are cached. However, we can select one or the other only. Note that both primary and xref data have separate hash tables and datafiles.
-CACHE_WHAT XREFS
-CACHE_WHAT DATA
Possible values in the .THOR header file:
cache level:
cache level: NEVER
cache level: ON_REQUEST
cache level: ALWAYS WRITETHRU
cache level: ALWAYS WRITETHRU XREFS_ONLY
cache level: ALWAYS WRITETHRU DATA_ONLY
cache level: ALWAYS READWRITE
cache level: ALWAYS READWRITE XREFS_ONLY
cache level: ALWAYS READWRITE DATA_ONLY
cache level: ALWAYS WRITETHRU_ALL
cache level: ALWAYS WRITETHRU_ALL XREFS_ONLY
cache level: ALWAYS WRITETHRU_ALL DATA_ONLY
cache level: ALWAYS READWRITE_ALL
cache level: ALWAYS READWRITE_ALL XREFS_ONLY
cache level: ALWAYS READWRITE_ALL DATA_ONLY
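The correspondence between the three caching options and the header values listed above can be sketched as a small function. This is an inferred mapping, not documented behaviour: it assumes that -CACHE_WHAT XREFS and -CACHE_WHAT DATA correspond to the XREFS_ONLY and DATA_ONLY suffixes, and that OK is written to the header as ON_REQUEST. The function name `cache_header_value` is hypothetical.

```python
LEVELS = ("WRITETHRU", "READWRITE", "WRITETHRU_ALL", "READWRITE_ALL")
# Assumed pairing of -CACHE_WHAT values with header suffixes.
WHAT_SUFFIX = {None: "", "XREFS": " XREFS_ONLY", "DATA": " DATA_ONLY"}


def cache_header_value(when, level=None, what=None):
    """Build the text that follows 'cache level:' for the given options."""
    if when == "NEVER":
        return "NEVER"
    if when in ("OK", "ON_REQUEST"):
        return "ON_REQUEST"
    if when != "ALWAYS":
        raise ValueError("unknown CACHE_WHEN value: %r" % (when,))
    # CACHE_LEVEL only matters from here on, i.e. when CACHE_WHEN is ALWAYS.
    if level not in LEVELS:
        raise ValueError("CACHE_WHEN ALWAYS needs a CACHE_LEVEL from %r" % (LEVELS,))
    if what not in WHAT_SUFFIX:
        raise ValueError("unknown CACHE_WHAT value: %r" % (what,))
    return "ALWAYS " + level + WHAT_SUFFIX[what]
```

Note that a CACHE_LEVEL supplied together with NEVER or OK is simply ignored, matching the rule that CACHE_LEVEL has no effect unless CACHE_WHEN is ALWAYS.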
Another choice to be made is whether to cache the "regular" database, or perhaps the indirect database. For databases where there are a large number of indirect references per TDT, caching the indirect database can provide the greatest performance gains.
Database holding -
"Holding" a database keeps it open when no clients are using it. This improves performance of frequently-opened and cached databases. Holding is independent from caching level.
Record locking -
With record-locking enabled, clients can lock individual records for exclusive access, "commit" and "rollback" changes, and unlock records.
Header (.THOR) file entry:
tdt locking: TRUE (Either present or not.)
Read-only databases -
Databases can be configured as writable (the default) or read-only. Read-only databases do not create lockfiles, so they are usable from CDROM. It is also possible to open one such database with two separate Thorservers. Header file entry:
read only: TRUE (Either present or not.)
Sufficient physical memory (RAM) is essential to the operation of Merlin. Unlike some applications which can perform acceptably with virtual memory or swapping, the Merlinserver is optimized to achieve high speed searching in RAM, and will slow prohibitively when swapping occurs. Therefore, the administrator should configure the Daylight system to avoid overuse of memory, taking into account:
It is possible to estimate memory usage a priori. However, a simpler way is to obtain pool-size information from the Merlinserver log file. Pool size data for commercial databases is available from Daylight.
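Once pool sizes are known, the arithmetic is a simple budget. The sketch below assumes the pool sizes have already been read off by hand (the log format is not described here), that all figures are in megabytes, and that the administrator chooses a reserve for the operating system and other processes. The function name and database names are hypothetical.

```python
def memory_budget(pool_sizes_mb, physical_ram_mb, reserve_mb):
    """Return (total_pool_mb, headroom_mb) for the given pools.

    A negative headroom means the pools would not fit in RAM
    once the reserve is set aside.
    """
    total = sum(pool_sizes_mb.values())
    headroom = physical_ram_mb - reserve_mb - total
    return total, headroom


if __name__ == "__main__":
    pools = {"db_a": 300, "db_b": 150}  # hypothetical databases, MB
    total, headroom = memory_budget(pools, physical_ram_mb=1024, reserve_mb=256)
    print("pools:", total, "MB; headroom:", headroom, "MB")
```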
Given a set of data, there may be several ways to design the datatypes and database for convenience, compactness, and searchability.
Indirect databases - frequently occurring data may best be represented indirectly. By using an indirect auxiliary database, a common datum may be stored only once, and compact indirect references stored in its place.
Monomer databases - auxiliary databases which define a table of monomers, or molecular building blocks, which may be referenced by monomer symbols in a Chortles. Combinatorial mixtures can thus be specified by a single Chortles which denotes a mixture of thousands of individual compounds. Mixture databases with auxiliary monomer databases normally are still rooted by SMILES, where the SMILES represents the mixture by replacing variable positions with the wildcard ("*").
Reaction databases - reactions are stored as SMILES as are single molecules. Reaction databases are not auxiliary databases. It is also possible to store reactions and molecules together in the same database.
Datatypes - Given the database type, database design
consists mostly of datatype design. Use the examples in
$DY_ROOT/data/datatypes/ for common datatypes
and as templates for new datatypes.
For each new datatype invented, decide whether it will be an identifier. Identifiers require additional disk space, but offer Thor-lookup capability, and logically identify a distinct chemical entity (isomer, sample, vial, registration number) to which data can be associated. Each identifier tag begins with "$".
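Returning to indirect databases, the space saving comes from storing each distinct datum once and keeping only a short reference wherever it recurs. The toy model below illustrates that idea only; it says nothing about how Thor actually stores or encodes indirect references, and the class name `IndirectStore` is hypothetical.

```python
class IndirectStore:
    """Toy model: each distinct datum is stored once, referenced by a key."""

    def __init__(self):
        self._key_by_datum = {}
        self._datum_by_key = []

    def reference(self, datum):
        """Return the compact reference for datum, storing it on first use."""
        key = self._key_by_datum.get(datum)
        if key is None:
            key = len(self._datum_by_key)
            self._key_by_datum[datum] = key
            self._datum_by_key.append(datum)
        return key

    def resolve(self, key):
        """Return the datum that a reference stands for."""
        return self._datum_by_key[key]

    def stored_count(self):
        """Number of distinct data actually stored."""
        return len(self._datum_by_key)
```

If the same long supplier name appears in many records, each record holds only the small integer key, and the name itself is stored a single time.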
Databases are created using
Creating a database means creating the empty files which are in the
format required by the Daylight database servers. These files must
then be loaded with data to be useful. When creating a database,
the following must be specified: (1) a datatypes database, (2)
primary hash table size, (3) cross-reference hash table size. Also,
the path of the database directory must be known. This is normally
Several conversion programs are provided with the SMILES Toolkit
for converting structure files in other formats to Daylight
SMILES-rooted TDTs. These programs are provided as contributed
source code, and found in
Other DB Admin Tasks - Many other database administration tasks are well suited to the use of scripts. Building databases, saving them to files, extracting subsets, performing periodic searches, obtaining database statistics, server access statistics -- all these and more may be automated using scripts and the Daylight suite of Thorfilters programs.
Many features are available only to SMILES-rooted databases, making it highly desirable to use SMILES as the root of all TDTs. However, this may be impossible for substances of unknown or non-determinate structure (e.g., beeswax, Fresca, eye of newt). Thor permits TDTs rooted in non-SMILES identifiers, but does not allow sub-trees with additional identifiers in these TDTs. Only data belonging to the ID is allowed.
A new category of datatype is introduced in 4.61, the non-identifier cross-reference. Denoted by a preceding slash "/", these datatypes may occur anywhere and are regarded syntactically as data. However, Thor will automatically create a cross-reference corresponding to this datum, providing access to the root TDT, whether SMILES-rooted or non-SMILES-rooted.
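The tag conventions described in these notes ("$" for identifiers, "/" for non-identifier cross-references, anything else treated as ordinary data) can be summarised in a short classifier. This is a sketch of the naming convention only; the function name and the example tags used to exercise it are illustrative and not taken from any datatypes database.

```python
def classify_tag(tag):
    """Classify a datatype tag by its leading character."""
    if tag.startswith("$"):
        return "identifier"
    if tag.startswith("/"):
        return "non-identifier cross-reference"
    return "data"


if __name__ == "__main__":
    # Illustrative tags only.
    for tag in ("$SMI", "/ALT", "MW"):
        print(tag, "->", classify_tag(tag))
```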