Reaction Capabilities


General: Reactions in Databases

There are no significant differences between handling of reactions and handling of molecules with respect to the Thor and Merlin database systems. Both Thor and Merlin operate on SMILES and SMARTS strings, datatrees, databases, and servers, and none of these have any specific characteristics which limit their use to either molecules or reactions exclusively.

There are cases where reactions have additional behaviors; however, these are minor and fit well within the overall Thor and Merlin architecture.

Thor

Datatrees may be rooted in either molecule or reaction SMILES. All of the Daylight datatree manipulation routines and applications work equally well on reaction and molecule datatrees. (See a hyperthor example from CCR97demo).

There is one new database normalization which is specific to reactions: the "MAKERXNMOL" normalization. It is a specialized AUTOGEN normalization which generates new subtrees based on the reactant, agent, and product components in the dataitem. The typical datatype definition for $SMI<> is now:

$D<"$SMI">
_N<USMILES MAKERXNMOL $RMOL,$AMOL,$PMOL>
...
|

The MAKERXNMOL keyword must be followed by a comma-separated list of three datatypes. The three datatypes are the reactant, agent, and product datatypes for the generation. The MAKERXNMOL processing adds dataitems during the normalization step:

Before:

$SMI<"CCO.CC(=O)O>>CCOC(=O)C">
|
After:
$SMI<"CCO.CC(=O)O>>CCOC(=O)C">
$RMOL<CCO>
$RMOL<CC(=O)O>
$PMOL<CCOC(=O)C>
|
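The Before/After example above can be mimicked with a few lines of Python. This is only an illustrative sketch, not the Thor implementation: the function name make_rxn_mol is hypothetical, the components are split naively on "." and ">", and no canonicalization (the USMILES step) is attempted.

```python
def make_rxn_mol(rxn_smiles, datatypes=("$RMOL", "$AMOL", "$PMOL")):
    """Return the dataitems a MAKERXNMOL-style step might add (hypothetical helper).

    Assumes a reaction SMILES of the form reactants>agents>products and
    splits each part into components on '.'; real normalization would also
    canonicalize each component.
    """
    parts = rxn_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected reaction SMILES with exactly two '>' separators")
    items = []
    # Pair each role's datatype with the matching part of the reaction.
    for datatype, part in zip(datatypes, parts):
        for component in part.split("."):
            if component:  # an empty agent section yields no dataitems
                items.append(f"{datatype}<{component}>")
    return items


for item in make_rxn_mol("CCO.CC(=O)O>>CCOC(=O)C"):
    print(item)
```

Because the agent section of this reaction is empty, no $AMOL dataitems are produced, which matches the After listing.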

The types of Thor lookups which are supported include:

Merlin

Within Merlin, searches include full super- and sub-structure search over any or all components of the reaction using both reaction and molecule SMARTS, similarity searching using structural fingerprints, and similarity operations using difference fingerprints.

Merlin supports full searching using reaction SMARTS, including atom maps. One point to remember is that atom-mapped queries must be searched in Merlin using the ISM<> column, since only absolute SMILES include the atom maps.

Reaction Fingerprints

There are two different types of fingerprints used for reactions. The "normal" structural fingerprints are generated as follows:

  1. All component molecules of a reaction are fingerprinted and combined using a logical OR operation into a single fingerprint,
  2. Product component molecules are refingerprinted, using a slightly different hashing algorithm, and these fingerprints are added to the fingerprint from step #1.

The behavior of the structural fingerprint for reactions is correct for substructure screening. However, this fingerprint is not particularly useful for reaction similarity measures, since it encompasses gross structural information about the reactants and products rather than any specific information about the reaction mechanism.
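The two-step construction of the structural fingerprint can be sketched as follows. This is a toy model under stated assumptions, not the Daylight algorithm: paths are passed in as plain strings (path enumeration is assumed to happen elsewhere), the hash is CRC32, the "slightly different hashing algorithm" for products is imitated with a salt prefix, and the names path_bit and structural_fingerprint are hypothetical.

```python
import zlib

NBITS = 64  # toy fingerprint width (assumption)


def path_bit(path, salt=""):
    """Map a path string to a bit position; the salt stands in for a second hash."""
    return zlib.crc32((salt + path).encode("utf-8")) % NBITS


def structural_fingerprint(reactant_paths, agent_paths, product_paths):
    """OR together the bits of every component, then add re-hashed product bits."""
    fp = 0
    # Step 1: every component contributes bits with the ordinary hash.
    for path in set(reactant_paths) | set(agent_paths) | set(product_paths):
        fp |= 1 << path_bit(path)
    # Step 2: product paths are hashed again with a different (salted) hash.
    for path in set(product_paths):
        fp |= 1 << path_bit(path, salt="product:")
    return fp


# Hypothetical path strings for an esterification-like example.
fp = structural_fingerprint(["C-C", "C-O"], [], ["C-C", "C-O", "C=O"])
query = structural_fingerprint(["C-C"], [], [])
print((query & fp) == query)  # screening test: query bits are a subset of fp
```

The subset test at the end is the usual screening condition: a query can match only if every bit it sets is also set in the target fingerprint.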

The difference fingerprint is much more useful for similarity measures for reactions. It is generated as follows:

  1. All reactant components are fingerprinted, and the counts of all the enumerated paths are stored.
  2. All product components are fingerprinted, and the counts of all the enumerated paths are stored.
  3. The counts of occurrences of every path are compared, and any path whose count changes from reactant to product results in bits being set.

The difference fingerprint only sets bits for paths which pass through the reacting center (bonds which change during the reaction). This is a much more useful characterization of the reaction mechanism than the normal structural fingerprint.
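The count comparison described above can be sketched in a few lines. As before, this is a simplified illustration rather than the Daylight implementation: paths arrive as pre-enumerated strings, CRC32 is an assumed hash, exactly one bit is set per changed path, and the name difference_fingerprint is hypothetical.

```python
import zlib
from collections import Counter


def difference_fingerprint(reactant_paths, product_paths, nbits=64):
    """Set a bit for every path whose count differs between reactants and products."""
    reactant_counts = Counter(reactant_paths)
    product_counts = Counter(product_paths)
    fp = 0
    for path in reactant_counts.keys() | product_counts.keys():
        # Counter returns 0 for paths that are absent on one side.
        if reactant_counts[path] != product_counts[path]:
            fp |= 1 << (zlib.crc32(path.encode("utf-8")) % nbits)
    return fp


# Hypothetical paths: "C-C" is unchanged, "O-H" is lost, "C-O" is gained once.
fp = difference_fingerprint(["C-C", "C-O", "O-H"], ["C-C", "C-O", "C-O"])
print(bin(fp).count("1"))  # number of bits set by the changed paths
```

Paths that occur equally often on both sides cancel out, which is why only paths through the reacting center end up setting bits.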

Transform Processing

Transforms are a special class of entity. A transform represents a generic reaction. A transform can be applied to a molecule or set of molecules. If the molecules are capable of undergoing the generic reaction, then one can perform the transformation. The result of the transformation is one or more specific reaction examples from the generic reaction. Try react.cgi.

Transforms can be stored in Thor databases, provided that any SMARTS atomic expressions are replaced with valid SMILES. Typically, the SMARTS are replaced by the "*" atom. An atom n-tuple can be used to store the atom expressions. For example, the transform "[*+1;n,N:1][H:2]>>[*+0;n,N:1].[H+1:2]" can be stored in a datatree as:

$SMI<"[*H+]>>*.[H+]">
ISM<"[*:1][H:2]>>[*:1].[H+:2]">
ALAB<"*+1;n,N",,"*+0;n,N",>
|
This allows the SMIRKS to be stored, fingerprinted, and manipulated as SMILES and also to be reassembled from the ISM<> and ALAB<> dataitems.
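The reassembly step might look like the following sketch. It rests on assumptions that go beyond the text: that the ALAB<> n-tuple carries one field per atom in the order the atoms appear in the ISM<> string, that an empty field means "keep the SMILES atom as written", and that every atom in the ISM<> is a bracket atom. The function name reassemble_smirks is hypothetical.

```python
import csv
import re

# A bracket atom: '[' expression, optional ':map', ']'.
BRACKET_ATOM = re.compile(r"\[([^\]:]*)((?::\d+)?)\]")


def reassemble_smirks(ism, alab):
    """Substitute stored atom expressions back into the mapped SMILES."""
    # The csv module copes with quoted fields that themselves contain commas.
    labels = next(csv.reader([alab]))
    if len(BRACKET_ATOM.findall(ism)) != len(labels):
        raise ValueError("atom count and label count differ")
    remaining = iter(labels)

    def swap(match):
        label = next(remaining)
        expression = label if label else match.group(1)  # empty label: keep atom
        return "[" + expression + match.group(2) + "]"

    return BRACKET_ATOM.sub(swap, ism)


print(reassemble_smirks("[*:1][H:2]>>[*:1].[H+:2]", '"*+1;n,N",,"*+0;n,N",'))
```

For the datatree above this yields "[*+1;n,N:1][H:2]>>[*+0;n,N:1].[H+:2]", which differs from the original transform only in writing the proton charge as "+" rather than "+1".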

Reactions can be classified and transforms can be abstracted from specific reaction examples (B. Rohde has reported a method).


Daylight Chemical Information Systems, Inc.
jjdelany@daylight.com