MUG '97: 4.5 Thor and Merlin systems
    
    
    There are quite a few changes to Thor/Merlin systems in the 4.51 release,
    most of which are evolutionary rather than revolutionary.
    The major visible changes provide support for reactions and
    live CD-ROM databases.
    The following list describes most of the changes visible to
    database users and managers; as always, check the readme files
    supplied with the distribution for the authoritative list.
    
       
       - Most datatype definitions now require quotes
       
- All databases must be reloaded (thordbfix451 is supplied).
       
- No changes are visible when working at the object level
          (i.e., with toolkits).
       
       - Thorserver merges datatrees locally for better performance
	   (5x faster reloads).
       
- Thorserver now reuses vacated space and coalesces adjacent
           empty blocks when able.
       
- Extra reserved space is inserted after cross-references
	   in an adaptive manner.
       
       - Thorload will optionally generate indirect data keys.
       
- Precise datatype/datafield control is provided.
       
- A particularly efficient quadruple hash scheme is used.
       
       - USMILES --
	  produce unique SMILES for "generic" reaction
       
- AUTOGEN GRAPH --
	  record role-free, oxidation-state-suppressed reaction
       
- MAKERXNMOL --
	  crossreference reaction by component and role
       
- ATOM_NTUPLE, BOND_NTUPLE, PART_NTUPLE --
	  component-specific data
       
       - Datafields may have the PART_NTUPLE normalization tag.
       
- Data in such fields are vectors of data corresponding to
	   disconnected components.
       
- Order correspondence is maintained on canonicalization.
       
- Useful for mole-fractions of mixtures,
	   fractional stoichiometry, etc.
       
- Component fingerprint-tuples are used by Merlin to good effect.
       
        - Fingerprint-tuples (FPP) are component-tuples of
	    fingerprints.
        
- FPP data is produced with the new fingerprint -m program option.
        
- A database may contain both
	    FP and FPP data.
        
- Merlin will use FP and FPP
	    data as available and needed.
        
- FPP availability dramatically improves performance of
	    screening large libraries.
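
    The order correspondence for PART_NTUPLE data can be illustrated with a
    minimal Python sketch. It is not Thor's implementation: the function name
    is hypothetical, a plain string sort stands in for real canonicalization,
    and splitting on '.' assumes simple SMILES with no ring closures spanning
    components.

```python
def reorder_with_components(smiles, values):
    """Reorder dot-separated components and keep per-component data aligned.

    Assumption: sorted() is only a stand-in for canonical ordering.
    """
    parts = smiles.split(".")
    if len(parts) != len(values):
        raise ValueError("need exactly one value per component")
    # Compute the permutation once, then apply it to both vectors.
    order = sorted(range(len(parts)), key=lambda i: parts[i])
    new_smiles = ".".join(parts[i] for i in order)
    new_values = [values[i] for i in order]
    return new_smiles, new_values


# Mole fractions of a water/ethanol mixture follow their components.
mixture, fractions = reorder_with_components("O.CCO", [0.6, 0.4])
print(mixture, fractions)
```

    Whatever ordering is used, the data vector is permuted with the same
    indices as the components, so each value stays attached to its part.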
       
       - Reaction fingerprints distinguish reactant and product features.
       
- Product bits are offset in reaction fingerprints.
       
- No special action is needed to obtain such fingerprints --
	   these are the "normal" reaction fingerprints.
       
- Suitable for structure screening and similarity comparison
       
        - Difference fingerprints represent the difference between
	    reactants and products of a reaction.
        
- Such fingerprints characterize the transformation
	    (sans atom mapping).
        
- Can be used for clustering and as "alternative fingerprints".
        
- Can't be used at the same time as normal fingerprints (yet).
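
    A rough Python sketch of the two ideas above, using integers as bit
    vectors. Everything here is an assumption made for illustration: the
    function names, the 16-bit width, the choice of shifting product bits by
    the full width, and the use of XOR as a stand-in for the "difference".
    The actual Daylight algorithms are not described on this page.

```python
NBITS = 16  # hypothetical width of each per-side fingerprint


def reaction_fingerprint(reactant_fp, product_fp, nbits=NBITS):
    """Combine both sides, placing product bits at an offset (assumed: nbits)."""
    mask = (1 << nbits) - 1
    return (reactant_fp & mask) | ((product_fp & mask) << nbits)


def difference_fingerprint(reactant_fp, product_fp, nbits=NBITS):
    """Bits set on exactly one side of the reaction (XOR stand-in)."""
    mask = (1 << nbits) - 1
    return (reactant_fp ^ product_fp) & mask


reactants = 0b0011
products = 0b0110
print(bin(reaction_fingerprint(reactants, products)))
print(bin(difference_fingerprint(reactants, products)))
```

    Because of the offset, swapping reactants and products yields a different
    reaction fingerprint, while the XOR-based difference is symmetric.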
       
Reaction searching
       
       - Widgets now handle reactions (e.g., grins and depict in xvmerlin).
       
- Character screening is changed to accommodate new SMILES syntax.
       
- Reaction fingerprinting distinguishes reactant and product features.
       
- All previously-available structure-based searches now operate
	   on reactions.
       
- Similarity and cluster analysis methods work with reactions just
	   as with molecules.
       
- Searching reaction databases (with reaction queries) is generally
	   faster than molecule searching.
       
Tversky similarity search
       
       - Merlin supports Tversky measures between binary fingerprints
	   for searching and sorting.
       
- A method to calculate these measures is supplied in the
	   fingerprint toolkit as dt_fp_tversky().
       
- Provides a continuous range of super- to sub-structure similarity
       
- Extremes provide similarity as superstructure and as substructure
           (missing since the old VAX v3.6)
       
- Very powerful tool for reaction searching.
       
- Might provide a "diversity metric" which measures distinctive
	   features.
       
- First described by John Bradshaw at EuroMUG-93
       
- Will be the subject of a presentation by John Bradshaw at this meeting.
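
    For readers unfamiliar with the measure, the general Tversky index on two
    bit sets A and B is |A&B| / (|A&B| + alpha*|A-B| + beta*|B-A|). The Python
    sketch below computes it on integers used as bit vectors. It is an
    illustration only and does not reproduce the signature or behaviour of
    dt_fp_tversky(); the function name and the zero-denominator convention
    are assumptions.

```python
def tversky(a, b, alpha, beta):
    """Tversky index between two fingerprints held as non-negative ints."""
    common = bin(a & b).count("1")   # bits set in both
    only_a = bin(a & ~b).count("1")  # bits set only in a
    only_b = bin(b & ~a).count("1")  # bits set only in b
    denom = common + alpha * only_a + beta * only_b
    # Convention chosen here: nothing to compare counts as 0.0.
    return common / denom if denom else 0.0


query = 0b0110
target = 0b1110
print(tversky(query, target, 1, 1))  # alpha = beta = 1 is the Tanimoto coefficient
print(tversky(query, target, 1, 0))  # 1.0 when every query bit is also in the target
print(tversky(query, target, 0, 1))  # 1.0 only when every target bit is in the query
```

    Sliding alpha and beta between these settings gives the continuous range
    between the two asymmetric extremes.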
       
MCL (Merlin Control Language) interface updated
       
       - MCL now allows searching and sorting based on
	   Tversky similarity measures.
       
- The MCL processor writes "embedded HTML" output via
	   the mcl -h option.
       
- MCL documentation has been overhauled.
       
Alternative Thor database file suffixes
       
       - Previously, the primary Thor database file had to end in ".THOR".
       
- Thorserver -DATABASE_SUFFIX_LIST allows
           specification of alternative suffixes.
       
- The default value of this option is
           ".THOR|.TDB|thor|.tdb"
       
- This is required to operate with some filesystems (e.g., ISO-9660).
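
    The suffix list reads as a '|'-separated set of alternatives. A minimal
    Python sketch of matching a file name against such a list follows; how
    thorserver itself performs the match is not stated here, so the function
    name and the plain "ends with" test are assumptions. The default string
    is copied verbatim from the item above.

```python
def has_database_suffix(filename, suffix_list=".THOR|.TDB|thor|.tdb"):
    """Return True if filename ends with any suffix in the '|'-separated list."""
    suffixes = tuple(s for s in suffix_list.split("|") if s)
    return filename.endswith(suffixes)


# Hypothetical file names, for illustration only.
print(has_database_suffix("DEMO.TDB"))
print(has_database_suffix("demo.dat"))
```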
       
Relative database pathnames in .THOR file
       
       - File names in a .THOR file are now interpreted relative to
	      the .THOR file's directory.
       
- Simplifies moving databases from one directory to another
       
- Allows databases to reside on removable media such as a CD-ROM
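
    The effect of relative interpretation can be sketched in Python as below.
    The paths and the function name are hypothetical, and leaving absolute
    names unchanged is an assumption; the page only says that names are
    interpreted relative to the .THOR file's directory.

```python
from pathlib import PurePosixPath


def resolve_db_file(thor_file, name):
    """Resolve a file name found inside a .THOR file against that file's directory."""
    entry = PurePosixPath(name)
    if entry.is_absolute():
        return str(entry)  # assumption: absolute names are used as given
    return str(PurePosixPath(thor_file).parent / entry)


# The same relative entry follows the database wherever it is mounted.
print(resolve_db_file("/cdrom/db/demo.THOR", "demo.dat"))
print(resolve_db_file("/home/user/db/demo.THOR", "demo.dat"))
```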
       
"Readonly" Thor databases are supported
       
       - Thor database "readonly" property is saved in the .THOR file.
       
- Readonly property is set via sthorman or thorchange -setaccess.
       
- Databases which are on "readonly" media (e.g., CD-ROMs)
	   are automatically readonly.
       
- Absolutely no writing takes place for a readonly database
           (e.g., no .LCK file is created).
       
- More than one thorserver can access a readonly database.
       
"Live" Thor CD-ROM databases are supported
       
       - Live databases can be run directly from the CD-ROM.
       
- Thor performance suffers but remains useful for interactive work.
       
- Merlin performance is not degraded at all.
       
- Live databases can be "burned" onto ISO-9660 formatted CDs.
       
Non-identifiers can be cross-referenced
       
       - Identifiers in a Thor datatree are cross-referenced to
	   the tree root (as always).
       
- Tree roots must be identifiers,
	   and may or may not be a SMILES (as always).
       
- Non-identifiers preceded by slash ('/') are also
	   crossreferenced (new).
       
- Allows you to build many-to-many relationships --
	   use with caution!
       
       - Provides database status including header contents
       
- "thorfilters" now provides all thor management functions
       
- sh and perl scripters are now first class citizens.
       
Lone hydrogens removed from GRAPHs
      
      - GRAPH data represent oxidation-state suppressed molecules
      
- Lone hydrogens are now removed during Thor's GRAPH normalization
      
- This was a bug-fix.
      
       - Thor clients using a database with monomer-level structures
           require the monomer definitions.
       
- When first needed, the database's whole monomer table is downloaded
	   to the client.
       
- This approach is fine for hundreds to thousands of monomers, but too
	   slow for thousands to tens of thousands.
       
- Added the thor-client option of caching monomer tables in a local
	   directory (e.g., /tmp).
       
- The monomer-table cache is updated only when the monomer-database
	   changes.
       
- Best for databases which are accessed many times between
	   changes to the monomer-table
       
- Ideal for remote toolkit applications using combinatorial databases
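
    The caching idea can be sketched in Python as follows. This is not the
    thor-client implementation: the function name, the JSON file layout, and
    the use of a version stamp to detect a changed monomer database are all
    assumptions made for illustration.

```python
import json
import os


def load_monomer_table(db_name, version, download, cache_dir):
    """Return the monomer table, downloading only when the cached copy is stale.

    `download` is a caller-supplied function that fetches the full table;
    `version` is any value that changes when the monomer database changes.
    """
    path = os.path.join(cache_dir, db_name + ".monomers.json")
    if os.path.exists(path):
        with open(path) as fh:
            cached = json.load(fh)
        if cached["version"] == version:
            return cached["table"]  # cache hit: no download needed
    table = download()  # cache miss or stale: fetch the whole table once
    with open(path, "w") as fh:
        json.dump({"version": version, "table": table}, fh)
    return table
```

    Repeated calls with an unchanged version read the local file and skip the
    download, which is why the approach pays off for databases accessed many
    times between monomer-table changes.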
       
    
     Daylight Chemical Information Systems, Inc.
    
    info@daylight.com