Thor/Merlin 4.5: Monomer table caching


Monomer databases and monomer tables

Thor databases containing monomer-level data (e.g., combinatorial data in CHORTLES) have an associated database containing the monomer definitions. Thor clients accessing such databases download the entire contents of the monomer database as a monomertable object the first time it is needed (e.g., for normalization). This step can be time-consuming when there are many thousands of monomer definitions, which makes client program startup slow.

Monomer table caching

The option of caching monomer tables locally is introduced in selected 4.51 Thor client applications (thorlookup and daytoolserver). Monomer table caching is invoked by setting the client program option -THOR_MONOCACHE_DIR to the name of a local directory (e.g., /tmp). This directory will be used to hold one monomertable cache file for each monomer database which is accessed. (Such files have long names which identify the remote Thor service and the remote server's database path.) A monomer table is (re)cached in that directory only if no cache file for it exists yet or the underlying monomer database has changed since the table was last cached.
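To illustrate the idea of one cache file per monomer database, here is a minimal sketch of how such a file name could be derived from the service name and the server's database path. The actual naming scheme is not specified here; the function name monocache_path, the ".mtcache" suffix, the underscore substitution, and the sample names in the usage note are all assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Build a cache file path of the (hypothetical) form
 *   <dir>/<service>_<dbpath>.mtcache
 * Every character of <service>_<dbpath> that is not a letter or digit
 * is replaced by '_' so the result is a single, flat file name.
 * Returns 0 on success, -1 if `out` is too small to hold the path. */
int monocache_path(char *out, size_t outlen, const char *dir,
                   const char *service, const char *dbpath)
{
    assert(out != NULL && dir != NULL && service != NULL && dbpath != NULL);

    int n = snprintf(out, outlen, "%s/%s_%s.mtcache", dir, service, dbpath);
    if (n < 0 || (size_t)n >= outlen)
        return -1;

    /* Sanitize only the file-name part; leave the directory and suffix alone. */
    size_t start = strlen(dir) + 1;
    size_t end = (size_t)n - strlen(".mtcache");
    for (size_t i = start; i < end; i++) {
        if (!isalnum((unsigned char)out[i]))
            out[i] = '_';
    }
    return 0;
}
```

With the made-up inputs "/tmp", "thor@bob" and "/data/mixbase_mono", this sketch produces /tmp/thor_bob__data_mixbase_mono.mtcache. Encoding both the service and the database path in the name keeps tables from different servers from colliding in a shared cache directory.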

This is how the monomer table caching scheme works:

If this process fails at any step (e.g., the cache file is not accessible or the disk is out of space), monomer definitions are downloaded in the normal (slow) manner.
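The freshness test and its fallback behavior can be sketched as follows. This is not the Thor client's implementation: it assumes a hypothetical cache layout in which the first line of the cache file holds the timestamp that was current when the table was cached, and the function name monocache_is_current is invented for the example. Any failure is treated the same as a stale cache, so the caller falls back to the normal download.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if the cache file can be read and its first line equals
 * server_mtime; return 0 in every other case (missing or inaccessible
 * file, empty file, or a different timestamp).
 * A 0 result means: download the monomer definitions in the normal
 * manner and, if possible, rewrite the cache.
 * The "timestamp on the first line" layout is an assumption. */
int monocache_is_current(const char *cache_file, const char *server_mtime)
{
    assert(cache_file != NULL && server_mtime != NULL);

    FILE *fp = fopen(cache_file, "r");
    if (fp == NULL)
        return 0;                 /* no cache yet, or not accessible */

    char stamp[128];
    int ok = (fgets(stamp, sizeof stamp, fp) != NULL);
    fclose(fp);
    if (!ok)
        return 0;                 /* empty or unreadable cache file */

    stamp[strcspn(stamp, "\r\n")] = '\0';   /* drop the line terminator */

    /* Only equality matters: the stamps are never ordered or parsed. */
    return strcmp(stamp, server_mtime) == 0;
}
```

Treating the timestamp as an opaque string keeps the check independent of the date format and of clock differences between client and server.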

For advanced users ...

Users are advised to do some testing before invoking monomer table caching in a production environment. The (otherwise undocumented) -DEBUG option to thorlookup allows you to write scripts which invoke and time monomer table caching under various conditions. The -DEBUG tags "cachedir=" and "timecheck" are used, e.g.,
   $ thorlookup -DEBUG "cachedir=/tmp timecheck" mixbase@bob ...
The new database property "monomtime" is defined in support of this caching scheme, i.e., the C statement:
   str = dt_info(&lens, database, "monomtime");
will return the date and time that the database's monomertable was last updated, as a dt_String. (The caching scheme described above only tests whether this value is the same as, or different from, the one stored with the cache.)

Summary

A mechanism for client-side monomertable caching is introduced which can significantly reduce the time it takes a Thor client to start accessing combinatorial databases. It is most suitable for use with databases which are accessed many times between changes to the monomer-table, e.g., all static databases, when thorlookup is used repeatedly in scripts, and when a remote toolkit server is serving many clients accessing such databases. The benefit of caching is increased when communication bandwidth is limited, e.g., when working over a slow or busy network, or with databases on slow media such as CD-ROMs.

We introduce this capability with some trepidation because this scheme violates one of the principles of the Daylight toolkit, which is, "Never do any visible I/O." (Of course there's another one which is, "Never say never.") Initially, this feature is likely to be used in a few high-volume environments -- we'll report the results as we hear them.


Daylight Chemical Information Systems, Inc.
info@daylight.com