15. THOR and Merlin Toolkits: Databases

15.1 Introduction

Although THOR and Merlin present very different views of the data in a database, both systems present the very same data. Because of this, most operations on databases are identical in the THOR and Merlin Toolkits. This includes opening and closing databases, setting the server's "search path", security operations, and datatype-object operations. This chapter covers all of these common operations.

An old adage might be paraphrased here: "An example is worth a thousand words." Many tutorial examples of THOR and Merlin Toolkit usage can be found in the "contrib" directory:

     $DY_ROOT/contrib/src/thor/
     $DY_ROOT/contrib/src/merlin/

We strongly encourage you to study these examples before attempting to write THOR and/or Merlin Toolkit programs.

15.2 Search Path

THOR and Merlin servers maintain a "search path" -- a list of directories which are to be searched for databases (see the Daylight System Administration Manual for more details). (Note that the search path is a property of a server, not a database. We put it in the databases chapter rather than the server chapter because it fits with other database operations.)

Note that the directories in the search path are interpreted by the server's operating system, hence are in a format appropriate to that operating system. For example, a Macintosh client connected to a UNIX server would use UNIX syntax to specify a database path (e.g. "/thordb/mydb"). Similarly, environment variables are interpreted on the server's operating system, not the client's.

dt_getsearchpath(Handle server) ==> Handle sos
Returns a sequence of strings (SOS); each string object in it contains one directory in the server's search path. The order of directories in the SOS is the order in which the directories will be searched to find a database.

dt_setsearchpath(Handle server, string password, string path, integer replace);
Sets the server's search path. Depending on replace, the given path either replaces the current path or is added to it.

dt_getdatabases(Handle server) ==> Handle sos
Returns a sequence of string objects (SOS) containing all databases in the server's search path.

15.3 Creating and Configuring Databases

Database creation is only done by the THOR server, so the functions in this section don't apply to the Merlin Toolkit. The following functions are used to create and configure a THOR database.

15.3.1 Database Creation

dt_thor_createdb(Handle server, int dlen, string path, int sizepri, int sizexref) => Handle database

Creates a new empty THOR database and opens it with "executive" permission. The parameter path must be a complete path, not a relative path or just a filename.

The parameters sizepri and sizexref are the requested sizes of the primary and cross-reference hash tables, respectively. For more information about database sizes, see the reference page for this function and the Daylight THOR-Merlin Administration Manual.

15.3.2 Database Configuration

Each database can have one, two or three auxiliary databases associated with it:

  • The datatypes database: Contains special-purpose datatype-definition TDTs (e.g. "$D<$SMI>_V...|"). Each time a new datatype is encountered, its definition is retrieved from this database. For more information about datatypes, see the Daylight Theory Guide.

  • The indirect-data database. Contains the expansions for indirect references. For more information about indirect data, see the Daylight Theory Guide.

Generally speaking:

  • a datatypes database will have no associated databases
  • an indirect-data database will have a datatypes database associated with it that defines the indirect-datatype definitions
  • a regular chemical database will always have a datatypes database and will often have an indirect-data database.

dt_thor_getauxillarydb(Handle database, integer type) => string path
Returns the path (directory, filename, and suffix) of the auxiliary database of type type that is associated with database. The type is either DX_THOR_DATATYPESDB or DX_THOR_INDIRECTDB.

dt_thor_setauxillarydb(Handle database, integer type, string path) => boolean ok
Sets the auxiliary database that is to be associated with database as type type, where type is either DX_THOR_DATATYPESDB or DX_THOR_INDIRECTDB.

15.3.3 Database Crunching

After a series of deletions and/or replacements, a database's data files may have "holes" in them. For example, if a TDT is enlarged (e.g. new dataitems added), it will no longer fit in its original spot; new space is allocated for it and the old space is marked "unused". THOR can sometimes re-use these available spaces (depending on the server's implementation and configuration), but generally the server is unable to make 100% use of the space in a database that has been extensively modified. This can cause a database to grow to be much larger than the amount of actual data it contains.

Crunching is the process of moving all data "forward" in the file to fill in these unused spaces, leaving all unused space at the end of the file; a pass is made through the entire database, reading and rewriting data and rebuilding the hash table. Once this is done, the file is truncated to get rid of the unused space at the end, freeing the file-system space for other uses.
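The pass described above can be pictured with a small in-memory model. This is only a sketch: the list of slots, the None marker for an unused hole, and the function name crunch are illustrative assumptions, not the THOR file format or any Toolkit call.

```python
# Toy model of a crunch: NOT the THOR on-disk format or API.
# A "data file" is a list of slots; None marks an unused hole.

def crunch(slots):
    """Move live records forward and rebuild the key -> position table.

    Returns (compacted_slots, hash_table). The compacted list is shorter
    than the input, which models truncating the file after the move.
    """
    compacted = []
    hash_table = {}
    for slot in slots:
        if slot is None:          # unused space: skip it
            continue
        key, _data = slot
        hash_table[key] = len(compacted)   # new position of this record
        compacted.append(slot)
    return compacted, hash_table


data_file = [("A", "a1"), None, ("B", "b1"), None, None, ("C", "c1")]
compacted, table = crunch(data_file)
```

Note that every live record may change position, which is why the hash table has to be rebuilt as part of the same pass.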

The crunch operation should not be undertaken lightly, because it is indivisible: while a crunch is under way, the server doesn't respond to other clients. Depending on database size, a crunch can take anywhere from several seconds to many minutes.

During a crunch, the database is temporarily in invalid states; for example, the hash table file is invalid until the crunch operation is complete. The database may be corrupted if some error occurs (usually an interruption such as a power failure) midway through a crunch. The actual data records may not be damaged, but hash information is usually destroyed; the data are no longer accessible. In such a case, the thordump(1) utility may be required to recover the data.

dt_thor_crunchdata(Handle database) => boolean
Crunches (recovers unused space from) the primary data file of a database.

dt_thor_crunchxref(Handle database) => boolean
Crunches (recovers unused space from) the cross-reference data file of a database.

dt_thor_autocrunch_limit(Handle database, float limit) => float limit
The database's "autocrunch" parameter is used to trigger an automatic database crunch whenever the fraction of free space exceeds a limit. The fraction is computed as:
     free space = bytes_free / (bytes_free + bytes_used)
This function both sets and returns the "autocrunch" limit -- the fraction of free space which, if exceeded, will trigger an automatic crunch. The limit applies to both the primary and cross-reference data files. If limit is <= 0.0, the database's limit is unaffected; this serves as a way to query the current value without modifying it. Values greater than 1.0 are not permitted. A value of 1.0 will disable autocrunching.
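The rules above can be checked with a few lines of arithmetic. The sketch below is a stand-alone model, not the Toolkit function: the class name, the method names, the default limit of 1.0, and the choice to raise an error for values above 1.0 are all assumptions made for illustration.

```python
# Toy model of the autocrunch limit; names and defaults are hypothetical.

def free_fraction(bytes_free, bytes_used):
    """Fraction of a data file that is free space."""
    total = bytes_free + bytes_used
    return bytes_free / total if total else 0.0


class AutocrunchSetting:
    def __init__(self, limit=1.0):   # assumed default: autocrunch disabled
        self.limit = limit

    def set_or_query(self, limit):
        """Set the limit, or just report it when limit <= 0.0."""
        if limit > 1.0:
            raise ValueError("limit must not exceed 1.0")  # assumed behavior
        if limit > 0.0:
            self.limit = limit
        return self.limit

    def should_crunch(self, bytes_free, bytes_used):
        """True when the free fraction exceeds the limit."""
        return free_fraction(bytes_free, bytes_used) > self.limit


setting = AutocrunchSetting()
setting.set_or_query(0.25)      # set the limit to 25% free space
current = setting.set_or_query(0.0)   # query only; the limit is unchanged
```

Because the free fraction can never exceed 1.0, a limit of 1.0 can never be exceeded, which is consistent with that value disabling autocrunching.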

15.4 Opening and Closing Databases

The Toolkit calls to open and close databases in THOR and Merlin are identical, but the actual operations performed by the two servers are quite different:

  • THOR opens all of the database's data files (the primary and cross-reference data files, and the primary and cross-reference hash tables). These files remain open as long as the database is open. If caching is enabled (see below), data are read from the disk files into the Thor server's memory. If multiple clients open the same database, the server creates a "client context" for each, but shares the database resources (i.e. the files) among the clients.

  • Merlin opens the primary data file, reads its contents into memory, and closes the file. The memory remains in use as long as the database is in use (by any user). Each client that opens the same database has its own "client context" in the server, but all clients share the database's in-memory image.

dt_open(Handle server, string dbname, string permission, string password, RETURN integer isnew) ==> Handle database

Opens a database on a THOR or Merlin server. The string dbname is the path (directories and filename) of the database on the server machine. If it is a simple filename (no directory information), the server will search its search path for the database; the first database found in the path that matches the name is used. If dbname contains any directory information, it must be a complete path; partial and relative paths are not allowed. When a complete path is specified, the server's search path is ignored.
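The name-resolution rule above can be sketched as follows. This is a model of the rule as stated, assuming a UNIX server; the function name resolve_database and the set of existing paths (a stand-in for the server's file system) are hypothetical, and database file suffixes are ignored.

```python
import posixpath

# Toy model of how a database name is resolved; not a Toolkit call.

def resolve_database(name, search_path, existing):
    """Return the full path the server would use, or None if not found.

    search_path: ordered list of directories (the server's search path).
    existing:    set of full paths that exist on the server (stand-in
                 for the real file system).
    """
    if "/" not in name:
        # Simple filename: first match in search-path order wins.
        for directory in search_path:
            candidate = posixpath.join(directory, name)
            if candidate in existing:
                return candidate
        return None
    if not posixpath.isabs(name):
        raise ValueError("partial and relative paths are not allowed")
    # Complete path: the search path is ignored.
    return name if name in existing else None


existing = {"/thordb/mydb", "/backup/mydb", "/backup/other"}
search_path = ["/thordb", "/backup"]
first_match = resolve_database("mydb", search_path, existing)
explicit = resolve_database("/backup/mydb", search_path, existing)
```

Here "mydb" resolves to the copy in the first directory of the search path, while the complete path selects the second copy directly.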

The string permission is one of "r", "w", or "e", representing read, read/write, and executive permission. The password must be the database's password for the requested permission or higher (i.e. the executive password always works, the write password works for reading or writing, and the read password only works for reading).
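The "requested permission or higher" rule amounts to a comparison of ranks. The following sketch models it under the assumption that the three passwords are available in a dictionary; the names LEVELS and password_grants are illustrative and do not describe how a server stores or checks passwords.

```python
# Toy model of the password hierarchy; not how a server stores passwords.

LEVELS = {"r": 0, "w": 1, "e": 2}   # read < write < executive


def password_grants(requested, supplied, passwords):
    """True if 'supplied' is the password for 'requested' or a higher level.

    passwords: dict mapping "r", "w", "e" to that level's password.
    """
    needed = LEVELS[requested]
    return any(
        supplied == passwords[level]
        for level, rank in LEVELS.items()
        if rank >= needed
    )


passwords = {"r": "red", "w": "wine", "e": "exec"}
exec_can_read = password_grants("r", "exec", passwords)
read_can_write = password_grants("w", "red", passwords)
```

With these sample passwords the executive password opens the database for reading, but the read password does not open it for writing.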

dt_exists (Handle server, string dbname) ==> boolean exists
Returns TRUE if the named database exists.

dt_isopen (Handle server, string dbname) ==> boolean isopen
Returns TRUE if the named database is already open (either open by some other client, or marked "hold"; see dt_hold() below).

dt_ispublic(Handle server, string name) ==> boolean ispublic
Returns TRUE if the named database is "public"; that is, if it has an empty read-permission password so that it can be opened without a password.

15.5 Memory Usage: Cache and Hold

15.5.1 Merlin HOLD

It can take a long time for a THOR or Merlin server to open a database: Merlin's in-memory high-speed searching requires that it scan the entire database into memory; THOR provides various levels of "caching" -- loading heavily-used parts of the database (or even all of the database) into memory to improve performance. Because of the potentially high overhead to open a database, both THOR and Merlin provide a "hold" for databases which causes the database to remain open even when no client is using it. For Merlin, "hold" means the database is retained in memory. For THOR, "hold" means the database files remain open, and cached portions of the database remain in memory.

dt_hold(Handle database, string thorpassword) ==> boolean ok
Marks the specified database "held", so that it will be retained in the Merlin server's memory. The password is that of the user "thor", and must be supplied even if you connected to the server as the user "thor". Returns TRUE if the operation succeeded. The operation fails if the server determines that the password is incorrect, or if database is not a Merlin database (pool) object.

dt_isheld(Handle database) ==> boolean isheld
Returns TRUE if database is marked "hold". Returns FALSE if the database is not marked "hold", or if database is not a Merlin database (pool) object.

dt_release(Handle database, string execpassword) ==> boolean ok
Marks the specified database "released" (not held), so that it will be removed from the Merlin server's memory when the last client closes it. The password is that of the user "thor", and must be supplied even if you connected to the server as the user "thor". Returns TRUE if the operation succeeded. The operation fails if the server determines that the password is incorrect, or if database is not a Merlin database (pool) object. Note that the database is not released as long as any client (including the one performing this operation) has the database open. Clients can be "evicted" to force closure; see dt_evict().
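The interaction between open clients and the "hold" mark can be modelled as a client count plus a flag. The class below is a sketch only: PoolModel and its method names are hypothetical, the password check is omitted, and it does not call or describe the server's internals.

```python
# Toy model of hold/release for a Merlin pool; password checks omitted.

class PoolModel:
    def __init__(self):
        self.clients = 0      # number of clients that have the pool open
        self.held = False     # the "hold" mark
        self.loaded = False   # is the in-memory image present?

    def open(self):
        self.loaded = True
        self.clients += 1

    def close(self):
        self.clients -= 1
        self._unload_if_idle()

    def hold(self):
        self.held = True

    def release(self):
        self.held = False
        self._unload_if_idle()

    def _unload_if_idle(self):
        # The image is dropped only when nobody has it open AND it is not held.
        if self.clients == 0 and not self.held:
            self.loaded = False


pool = PoolModel()
pool.open()
pool.hold()
pool.close()
still_loaded_when_held = pool.loaded    # held, so it stays in memory

pool.open()
pool.release()
loaded_while_open = pool.loaded         # released, but one client remains
pool.close()
loaded_after_last_close = pool.loaded   # last client gone: image removed
```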

15.5.2 THOR Caching

A THOR server's performance can be improved by "caching": storing frequently-used sub-parts of the database in the server's memory. This is discussed in more detail in the Daylight Theory Manual and the Daylight System Administration Manual.

Remember that a server is free to silently ignore any and all caching requests, depending on the particular implementation and the server's configuration.

Valid caching levels are symbolic constants in the THOR Toolkit:

Thor Caching Levels

     DX_THOR_OFF      no caching
     DX_THOR_RTABLE   write-through cache of hash table
     DX_THOR_TABLE    complete cache of hash table
     DX_THOR_RALL     write-through cache of everything
     DX_THOR_ALL      complete cache of everything

The following functions control caching:

dt_thor_cache(Handle database, int level) => boolean
Enables caching for the database. The parameter level indicates what type of caching to perform; see the table above.

dt_thor_cachecontrol(Handle database, int when, int level) => boolean
Overrides cache requests from normal users; the cache-control specification becomes a property of the database, and remains in effect when the database is closed and reopened. Requires executive permission. The parameter level indicates how much caching to perform, as described above. The parameter when indicates:

DX_THOR_CACHE_NEVER
Caching is always disabled; caching requests from other clients are prohibited and are silently ignored.

DX_THOR_CACHE_OK
Caching requests from clients are allowed; the parameter level is ignored. This is the default.

DX_THOR_CACHE_ALWAYS
Caching is forced whenever a database is opened, to the level specified by level; caching requests from other clients are prohibited and are silently ignored.
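The three when settings can be summarized as a small decision function. This is a sketch under stated assumptions: the enum members NEVER, OK and ALWAYS stand in for the DX_THOR_CACHE_* constants, the level strings stand in for the caching-level constants, and the model ignores the fact that a server may disregard any caching request.

```python
from enum import Enum

# Toy model of cache control; stand-in names, not the Toolkit constants.

class When(Enum):
    NEVER = "never"     # stands in for DX_THOR_CACHE_NEVER
    OK = "ok"           # stands in for DX_THOR_CACHE_OK
    ALWAYS = "always"   # stands in for DX_THOR_CACHE_ALWAYS


def effective_cache_level(when, control_level, client_request):
    """Return the caching level that results from a client's request.

    control_level:  level stored with the cache-control setting.
    client_request: level a normal client asks for.
    Levels are plain strings such as "OFF", "TABLE" or "ALL".
    """
    if when is When.NEVER:
        return "OFF"              # client requests are silently ignored
    if when is When.ALWAYS:
        return control_level      # forced level; client requests ignored
    return client_request         # OK: control_level is ignored


forced = effective_cache_level(When.ALWAYS, "RALL", "OFF")
allowed = effective_cache_level(When.OK, "ALL", "TABLE")
```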

dt_thor_cachesync(Handle database) => boolean
Forces all cached data to be written to the disk immediately. This should only be done occasionally, as it is an "atomic" operation -- the entire sync is completed before any other client requests are served, which can adversely affect performance.

15.6 Database Security

There is only one function for managing the security of databases. It is polymorphic: it also applies to server objects, and its behavior when applied to them is described in the Server Security Functions chapter of this manual.

dt_setpassword(Handle database, string what, string authorizing_pw, string newpw) => boolean

Changes a password for the database.

Note that when a database's password is changed, any existing users of that database are unaffected; a client program can keep a database open indefinitely even though the password used to open the database is no longer valid. Authorization is only checked when the database is opened.

The string what indicates which of the three passwords is to be changed; it must be one of "r", "w", or "e", for read, write, or executive passwords, respectively.

15.7 Record Locking

Thor provides a mechanism for "locking" a TDT ("record"). When a client program locks a record, the record is said to be "owned" by that client. The owner of a record has exclusive write access to that record; no other client can modify or delete that record (although they can read the record). A record can only be locked by one client at a time.

Record locking is an all-or-nothing affair: Conceptually, if record locking is enforced, then all records must be locked before they can be modified. In practice, if you write an unlocked record, it is automatically locked, written, then unlocked. This means if another client has that record locked, your write will fail due to a lock violation.

Once a record is locked, the client that owns the lock can do the following:

Change the record:
The client with the lock can modify the record; no other client can.

Write the record to the database:
If a modified, locked record is written to the database, the changes are "invisible" to other clients until that record is unlocked ("committed"). Other clients will "see" the original record, even though the client holding the lock sees the changes.

Delete the record:
A deletion is essentially the same as a change: Only the owner of the lock can delete the record, and the record will appear unchanged (undeleted) to other clients until it is unlocked ("committed"). Deleting a record does not unlock it -- the lock remains in effect until it is explicitly removed (which causes the deletion to be "committed").

Rollback modifications:
As long as a record remains locked, it can be "rolled back" to its original state. That is, if it has been modified or deleted, those changes are undone by the "rollback" operation. Rolling a record back does not unlock the record.

Commit modifications:
When the record is unlocked, it is "committed". That is, all modifications are finalized and become visible to other clients using the database. This includes deletion -- deletions take effect when the record is unlocked.

When a record is locked by one client, all other clients that try to use the record are restricted to read-only operations. That is, they can only retrieve and examine the record (see dt_thor_tdtget()), and find out who has it locked (see dt_thor_tdtlockedby()).

It is possible to lock a record that does not exist. This is commonly necessary when writing a new record to the database -- the record is locked, then written and finally unlocked ("committed").

The actual record locks are maintained by the Thor server. If a client disconnects from a Thor server or closes a database while it still has records locked, the locks are automatically discarded and the records are "rolled back". Any changes made but not committed are lost. Locks can only be retained while a client is connected to a Thor server and has a database open.
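The lock, write, rollback and commit behavior described in this section can be captured in a small in-memory model. It is a sketch, not the THOR implementation: the class LockingModel, its method names, and the dictionaries it uses are hypothetical, and it requires an explicit lock before every write (it does not model the automatic lock-write-unlock of unlocked records).

```python
# Toy model of record locking; names and structure are hypothetical.

class LockViolation(Exception):
    pass


_DELETED = object()   # marker for an uncommitted deletion


class LockingModel:
    def __init__(self):
        self.committed = {}   # key -> value visible to every client
        self.locks = {}       # key -> client that owns the lock
        self.pending = {}     # key -> uncommitted value (or _DELETED)

    def lock(self, client, key):
        # A record that does not exist yet can still be locked.
        if self.locks.setdefault(key, client) != client:
            raise LockViolation(key)

    def _require_owner(self, client, key):
        if self.locks.get(key) != client:
            raise LockViolation(key)

    def write(self, client, key, value):
        self._require_owner(client, key)
        self.pending[key] = value

    def delete(self, client, key):
        self._require_owner(client, key)
        self.pending[key] = _DELETED      # the lock stays in effect

    def read(self, client, key):
        # Only the lock owner sees uncommitted changes.
        if self.locks.get(key) == client and key in self.pending:
            value = self.pending[key]
            return None if value is _DELETED else value
        return self.committed.get(key)

    def rollback(self, client, key):
        self._require_owner(client, key)
        self.pending.pop(key, None)       # undo changes, keep the lock

    def unlock(self, client, key):
        self._require_owner(client, key)
        if key in self.pending:           # commit on unlock
            value = self.pending.pop(key)
            if value is _DELETED:
                self.committed.pop(key, None)
            else:
                self.committed[key] = value
        del self.locks[key]

    def disconnect(self, client):
        # Locks are discarded and uncommitted changes are lost.
        for key in [k for k, owner in self.locks.items() if owner == client]:
            self.pending.pop(key, None)
            del self.locks[key]


db = LockingModel()
db.committed["X"] = "v1"
db.lock("alice", "X")
db.write("alice", "X", "v2")
seen_by_owner = db.read("alice", "X")   # owner sees the pending change
seen_by_other = db.read("bob", "X")     # others still see the original
```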

Record locking is not necessary in most situations. Thor's ability to merge records makes it possible for users to simultaneously modify records with little chance of conflicts. On the rare occasion when conflicts arise, Thor's timestamp facility provides adequate warning.

The following functions control locking enforcement:

dt_thor_settdtlocking(Handle database, string password, dt_Integer enforce_locking) ==> boolean OK

Sets or unsets "record locking" enforcement for a THOR database. If enforce_locking is TRUE, locking is enforced; if it is FALSE, locking is disabled.

You can't change record locking enforcement while the database is in use (i.e. open by any other client).

When record locking is enforced, records that are retrieved from a writeable database are automatically locked (see dt_thor_tdtget()). A writeable database is one opened with "w" or "e" permission using dt_open().

Record locking is a permanent property of the database (i.e. it is retained when the database is closed and reopened), and it applies to all client programs using the database.

dt_thor_tdtdttlocking(Handle database)
Returns TRUE or FALSE, indicating respectively that record locking is or is not enforced for the specified database.

Other functions related to or affected by record-locking enforcement are:
     dt_thor_tdtget
     dt_thor_tdtget_raw
     dt_thor_tdtlockedby
     dt_thor_tdtput
     dt_thor_tdtput_raw
     dt_thor_tdtremove
     dt_thor_tdtremove_raw
