12. Reaction Toolkit

12.1 Introduction:

The reaction toolkit provides a set of tools which support both specific and generic single-step reactions. These tools add the capability to address numerous reaction-oriented chemical information problems. These tools are integrated into the Daylight system and are used extensively within Thor and Merlin to add support for reactions to these systems.

The reaction toolkit adds support for two additional object types:

Reaction Toolkit Object Classes
  Reaction    a single-step reaction
  Transform   a generic reaction

The reaction object is actually implemented within the Smiles toolkit library. The transform object is implemented within the Smarts toolkit library. Note that the reaction toolkit is licensed separately, even though its objects are implemented within the Smiles and Smarts libraries.

12.2 Polymorphism and the Reaction Toolkit:

The extensive use of polymorphism for both reaction and transform objects is one of the key principles which makes the reaction toolkit convenient to use. A design criterion for a reaction object is that it behave as much like a molecule object as possible. Similarly, a design criterion for the transform object is that it behave like a pattern object.

In effect, a reaction object is a "superset" of a molecule object. A reaction can do everything a molecule can, and then some (which we'll cover in detail).

For example, a reaction contains one or more molecule objects. These are the components of the reaction (reactant, agent and product molecule). Each of these molecule objects in turn contains atoms, bonds, and cycles. Now one can certainly take a stream of molecules over a reaction. This works as one would expect, returning a stream which contains every component molecule in the reaction.

dt_stream(reaction, TYP_MOLECULE) => all molecules in the reaction

One can also take streams of atoms, bonds, or cycles over a reaction, effectively ignoring the molecule layer of the reaction. In this case, the streams work exactly the same for molecules and reactions.

dt_stream(reaction, TYP_ATOM) => all atoms in the reaction
dt_stream(reaction, TYP_BOND) => all bonds in the reaction
dt_stream(reaction, TYP_CYCLE) => all cycles in the reaction

Note that a stream of atoms, bonds, or cycles over a reaction contains ALL of the atoms, bonds, or cycles in every molecule in the reaction.

Generally, the strategy for reaction toolkit programming is to ignore the "molecule layer" of a reaction whenever possible. This results in toolkit code which is most flexible in that the code will correctly process both molecules and reactions.

As an example, consider the following code:

#include <stdio.h>
#include <string.h>
#include "dt_smiles.h"
#include "dt_depict.h"

int main(void) {
  dt_Handle ob, d, atoms, atom;
  char line[400];
  int count;

  /*** Get SMILES from user.  fgets() bounds the read; strip the newline. ***/

  if (!fgets(line, sizeof(line), stdin)) return (1);
  line[strcspn(line, "\n")] = '\0';

  /*** Create object. dt_smilin returns a molecule or reaction, but we
       don't care which.  The rest of the toolkit calls operate equally
       well on either. ***/

  ob = dt_smilin(strlen(line), line);
  if (ob == NULL_OB) return (1);    /*** Invalid SMILES or missing license ***/

  /*** We could check the type of object returned if we wanted, but it isn't
       necessary  (dt_type(ob) would return TYP_MOLECULE or TYP_REACTION) ***/

  count = 0;
  atoms = dt_stream(ob, TYP_ATOM);
  while (NULL_OB != (atom = dt_next(atoms)))
    if (dt_number(atom) == 6) count++;        /*** Count carbons ***/
  dt_dealloc(atoms);

  printf("The object contains %d carbon atoms.\n", count);

  /*** Note that dt_alloc_depiction(3) can take a reaction or molecule object
       in version 4.5 ***/

  d = dt_alloc_depiction(ob);
  dt_calcxy(d);

  /*** Call drawing library to show depiction ***/

  dl_beginscreen();
  dt_depict(d);
  dl_endscreen(d);

  /*** Destroy objects. ***/

  dt_dealloc(d);
  dt_dealloc(ob);
  return (0);
}
Whether the user enters a reaction or molecule SMILES is completely irrelevant to the program, the way it is coded, or its execution. This example program and many others like it (cansmi, showparts, protons, hbonds, smarts_filter, addfp, etc.) only need be recompiled under version 4.51 or later to be fully reaction-capable.

The other important factor which makes the reaction toolkit convenient is the treatment of derivative objects (paths, substructs, pathsets, depictions, conformations, fingerprints). Each of the derivative object types has been extended to handle Reaction objects directly. There is no need to learn a separate set of derivative objects specifically for reactions.

In the case of derivative objects, the molecule layer of a reaction is ignored; the derivative objects just work at the atom and bond layer. For example, the depiction object used in the example code above handles reactions just as well as molecules. One can create a depiction for either a molecule or a reaction object. The returned depiction objects behave exactly as in version 4.42 with one exception: the base object (dt_base(3)) of a depiction may now be either a reaction or molecule; in version 4.4 the base of a depiction was always a molecule. See section 12.8 for further discussion of derivative objects and reactions.

12.3 Processing reactions:

A reaction consists of a set of molecule objects, each of which has a specific role in the reaction: reactant, product, or agent. Agents are molecules which neither contribute atoms to the products nor accept atoms from the reactants. Note that this definition is not enforced by the toolkit. It is manifested in the definition of atom maps for reactions.

This section focuses on toolkit functions which are specific to reaction objects or functions which have new, unique behaviors for reaction objects. These functions are generally useful for building reactions from scratch and for manipulating reaction objects.

dt_alloc_reaction(void) => Handle reaction
Allocates a new, empty reaction object. This reaction will have no child molecule components.

dt_addcomponent(Handle reaction, Handle mol, integer role) => Handle mol
Adds a molecule object to a reaction. The role (DX_ROLE_REACTANT, DX_ROLE_AGENT, DX_ROLE_PRODUCT) indicates the role which the molecule will take in the reaction. A copy of the molecule is added to the reaction. The original molecule is unchanged. The reaction must be in modify-on state. Returns the molecule object within the reaction to which the given molecule was added.

Practically speaking, a reaction object will have at most one each of reactant, agent, and product molecules and these are generally processed (e.g. streams of molecules over a reaction) in reactant-agent-product order. If one adds multiple molecule objects to a reaction with the same role, these are combined within the reaction object. The way to think about this is that molecules are used as the internal representation of structural data in a reaction, yet the reaction object reserves the right to change its internal representation as necessary. Since the original molecules are unaffected, this works out well.
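The combining behavior described above can be pictured with a small stand-alone model. This is only a sketch, not toolkit code: toy_reaction, toy_init, toy_addcomponent, and toy_write are hypothetical names, and the model works on SMILES text rather than on molecule objects. It assumes that same-role components are joined with '.' and that the parts are written in reactant>agent>product order; it performs no canonicalization.

```c
#include <stdio.h>
#include <string.h>

/* Toy model only: these names are hypothetical and are not Daylight calls. */
enum toy_role { TOY_REACTANT = 0, TOY_AGENT = 1, TOY_PRODUCT = 2 };

#define TOY_PART_MAX 256

struct toy_reaction {
  char part[3][TOY_PART_MAX];   /* one combined SMILES per role */
};

/* Start with three empty parts. */
void toy_init(struct toy_reaction *r)
{
  int i;
  for (i = 0; i < 3; i++) r->part[i][0] = '\0';
}

/* Append a copy of 'smiles' to the part for 'role'.  Components with the
   same role are combined into one dot-disconnected part.  Returns 1 on
   success, 0 if the combined part would not fit. */
int toy_addcomponent(struct toy_reaction *r, const char *smiles,
                     enum toy_role role)
{
  char *p = r->part[role];
  size_t need = strlen(p) + strlen(smiles) + 2;   /* '.' and terminator */

  if (need > TOY_PART_MAX) return 0;
  if (p[0] != '\0') strcat(p, ".");
  strcat(p, smiles);
  return 1;
}

/* Write "reactant>agent>product" into 'out'.  Returns 1 on success,
   0 if 'out' is too small. */
int toy_write(const struct toy_reaction *r, char *out, size_t outlen)
{
  int n = snprintf(out, outlen, "%s>%s>%s", r->part[TOY_REACTANT],
                   r->part[TOY_AGENT], r->part[TOY_PRODUCT]);
  return (n >= 0 && (size_t)n < outlen);
}
```

In this model the order of the toy_addcomponent calls across roles does not matter; only the order of components within one role is preserved.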

dt_getrole(Handle ob, Handle reaction) => integer role
Returns the role which the object plays within a reaction. 'ob' can be an atom, bond, cycle, or molecule. Returns (-1) if 'ob' is not part of the given reaction. The role returned will be one of the constants: DX_ROLE_REACTANT, DX_ROLE_AGENT, or DX_ROLE_PRODUCT. It is not possible to change the role of an object within a reaction. The role is set during creation of the reaction (via dt_smilin(3) or dt_addcomponent(3)) and is immutable.

There are quite a few functions which take on new capabilities when processing reactions:

dt_smilin(string smiles) => Handle object
When given a reaction SMILES string, interprets the SMILES and returns a newly-allocated reaction object. Note that dt_smilin(3) returns the appropriate object (either molecule or reaction) for the given SMILES string. This behavior also depends on the licenses available:

Input SMILES      Toolkit licenses available   dt_smilin(3) behavior
Any SMILES        none                         Program exits
Molecule SMILES   smiles                       returns Molecule object
Molecule SMILES   smiles, reaction             returns Molecule object
Reaction SMILES   smiles                       returns NULL_OB, warning in error queue
Reaction SMILES   smiles, reaction             returns Reaction object
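The table can be restated as a small decision function. The sketch below is a stand-alone model with hypothetical names (toy_result, toy_is_reaction_smiles, toy_smilin_outcome); it does not call the toolkit. It assumes that an input counts as a reaction SMILES when it contains the '>' separator, and that a missing smiles license always leads to the "Program exits" row, a case the table only states for the situation where no licenses are available.

```c
#include <string.h>

/* Hypothetical names for illustration; not part of the Daylight API. */
enum toy_result {
  TOY_EXIT,                /* program exits                       */
  TOY_MOLECULE,            /* molecule object returned            */
  TOY_REACTION,            /* reaction object returned            */
  TOY_NULL_WITH_WARNING    /* NULL_OB, warning in the error queue */
};

/* A reaction SMILES separates its parts with '>' characters. */
int toy_is_reaction_smiles(const char *smiles)
{
  return strchr(smiles, '>') != NULL;
}

/* Restates the license table.  Assumption: without the smiles license
   the outcome is always TOY_EXIT, whatever else is licensed. */
enum toy_result toy_smilin_outcome(const char *smiles,
                                   int have_smiles_license,
                                   int have_reaction_license)
{
  if (!have_smiles_license) return TOY_EXIT;
  if (!toy_is_reaction_smiles(smiles)) return TOY_MOLECULE;
  return have_reaction_license ? TOY_REACTION : TOY_NULL_WITH_WARNING;
}
```

A caller that may run without the reaction license should therefore be prepared for NULL_OB when the input contains '>'.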

dt_cansmiles(Handle reaction, integer iso) => string smiles
Returns the canonical SMILES for a reaction. When 'iso' is FALSE, returns the unique SMILES. The unique SMILES is the canonical SMILES where all agents, isomeric and isotopic information, and atom maps are ignored for generation of the SMILES.

When 'iso' is TRUE, returns the absolute SMILES for the reaction. This includes all agents, isotopic and isomeric information, and atom maps.

dt_xsmiles(Handle reaction, integer iso, integer explicit) => string smiles
Returns an exchange SMILES for a reaction. When 'iso' is FALSE, returns an exchange SMILES without map, stereo or isotopic information.

When 'iso' is TRUE, returns an absolute exchange SMILES for the reaction. This includes all agents, isotopic and isomeric information, and atom maps.

The 'explicit' parameter, when TRUE, returns an exchange SMILES with all atomic properties explicit in the string.

dt_type(Handle reaction) => integer TYP_REACTION
For a reaction object, returns the constant TYP_REACTION.

dt_typename(Handle reaction) => string "reaction"
For a reaction object, returns the string constant "reaction".

dt_info(Handle reaction, string "smiles") => string input SMILES
Returns the input SMILES string used to create the reaction object.

dt_mod_is_on(Handle reaction) => boolean state
Returns the modify-state for the given reaction.

dt_mod_on(Handle reaction) => boolean ok
Puts a reaction object and all of its component molecules in modify-on state. A reaction must be in modify-on state to add components, or modify any of the component molecules. Note that one can indirectly put a reaction in modify-on state by calling dt_mod_on(3) for one of its component molecules.

dt_mod_off(Handle reaction) => boolean ok
Puts a reaction object and all of its component molecules in modify-off state. Causes every molecule to be checked for structural validity. This function fails if any of the component molecules is invalid. If the function fails, the entire reaction is deallocated.

dt_dealloc(Handle reaction) => boolean ok
Deallocates a reaction and all of its component molecules, atoms, bonds and cycles.

The following code gives a simple example of creation and manipulation of a reaction object. In this example, a reaction is built two different ways: first, a reaction is created from scratch, and molecule objects are added to build up the reaction. Second, a reaction is built from a single reaction-SMILES. The resulting reactions have the same unique SMILES.

void build_reaction(void)
{
  dt_Handle reaction1, reaction2;
  dt_Handle mol1, mol2, mol3;
  dt_String smi1 = "CCO";
  dt_String smi2 = "CC(=O)O";
  dt_String smi3 = "CCOC(CC)=O";
  dt_String smi4 = "CCO.CC(=O)O>OCC>CCOC(=O)CC";
  dt_String cansmi1, cansmi2;
  dt_Integer slen1, slen2;

  /*** Make molecule objects.  We'll build the reaction from its pieces ***/

  mol1 = dt_smilin(strlen(smi1), smi1);
  mol2 = dt_smilin(strlen(smi2), smi2);
  mol3 = dt_smilin(strlen(smi3), smi3);

  /*** Make an empty reaction.  Set it to mod on.  Add the pieces. ***/

  reaction1 = dt_alloc_reaction();
  dt_mod_on(reaction1);

  /*** Note: ethanol added twice, as reactant and agent.  This is legal. ***/

  dt_addcomponent(reaction1, mol1, DX_ROLE_REACTANT);
  dt_addcomponent(reaction1, mol1, DX_ROLE_AGENT);

  dt_addcomponent(reaction1, mol2, DX_ROLE_REACTANT);
  dt_addcomponent(reaction1, mol3, DX_ROLE_PRODUCT);
  dt_mod_off(reaction1);

  /*** The molecules are no longer needed (copies are kept by the reaction).
       We can deallocate them. ***/

  dt_dealloc(mol1);
  dt_dealloc(mol2);
  dt_dealloc(mol3);

  /*** Get the unique SMILES for the reaction.  Release the reaction
       before returning early so it is not leaked. ***/

  cansmi1 = dt_cansmiles(&slen1, reaction1, FALSE);
  if (cansmi1 == NULL) {
    dt_dealloc(reaction1);
    return;
  }

  /***  Make a second reaction from a SMILES.  ***/

  reaction2 = dt_smilin(strlen(smi4), smi4);
  cansmi2 = dt_cansmiles(&slen2, reaction2, FALSE);
  if (cansmi2 == NULL) {
    dt_dealloc(reaction1);
    dt_dealloc(reaction2);
    return;
  }

  /*** The two unique SMILES should be the same.  ***/

  if ((slen1 == slen2) && (0 == strncmp(cansmi1, cansmi2, slen1)))
    fprintf(stderr, "The two SMILES are the same.  Life is good.\n");
  else
    fprintf(stderr, "The two SMILES are different.  Life is bad.\n");

  dt_dealloc(reaction1);
  dt_dealloc(reaction2);
  return;
}

12.4 Reaction Molecules:

Reactions are made up of molecule objects. These are normal molecules, with a new property, role, which is used to distinguish the reactant, product and agent in a reaction. Molecules within reactions have the reaction as a parent, and have a value defined for their role property, but are otherwise indistinguishable from any other molecules in the toolkit.

dt_parent(Handle molecule) => Handle parent
Prior to version 4.5, a molecule never had a parent object. In version 4.5 and later, if a molecule is part of a reaction object, its parent will be that reaction; otherwise this function will return (NULL_OB).

dt_dealloc(Handle molecule) => boolean ok
Removes the molecule from its parent reaction, and deallocates it.

dt_mod_on(Handle molecule) => boolean ok
For a molecule which is part of a reaction, puts both the molecule itself and its parent reaction in modify-on state.

dt_mod_off(Handle molecule) => boolean ok
For a molecule which is part of a reaction, puts both the molecule itself and its parent reaction in modify-off state.

This is identical to calling dt_mod_off(3) for the parent reaction. In effect, the toolkit treats a reaction and its component molecules as a single unit for structural modification; setting the state for either the reaction or one of its child molecules sets the state for all of them.

In general, if one is modifying molecules which are part of a reaction, it is best to perform dt_mod_on() and dt_mod_off() on the reaction object itself, rather than the component molecule(s). One can easily get confused if one attempts to set mod-on and mod-off for the component molecules in a reaction.

12.5 Atom Maps:

Within the SMILES language for reactions, atom maps are numeric atom labels. All atoms within a SMILES string with the same atom map label are associated in an atom map set.

Within the toolkit, atom maps are manipulable only as atom map sets. The toolkit takes care of interpreting the labels on input SMILES and labeling the output SMILES in a systematic way.

Agent atoms and atoms which are not part of a reaction may never be put in an atom map class. Only reactant and product atoms from the same reaction may appear in a given atom map class.

There are no requirements for completeness or uniqueness of the atom mappings over a reaction. Atom mappings are independent of the connectivity and properties of the underlying molecules. The rules for atom maps are as follows:

  • Only reactant and product atoms may belong to atom map classes. Atoms which are not part of a reaction cannot belong in atom map classes.
  • An atom may be unmapped or may only belong to one atom map class at a time.
  • Atom map classes must contain at least one reactant and one product atom from the reaction.
  • If either the last reactant or last product atom is removed from an atom map class, the atom map class is removed.

dt_setmap(Handle atom1, Handle atom2) => boolean ok
Sets the two atoms to be in the same atom map class. 'atom1' and 'atom2' must be atoms from the reactant and product of the same reaction, in either order.

If either 'atom1' or 'atom2' already belongs to a map class, the result of this operation is to merge the sets of atoms into a single map class which contains 'atom1', 'atom2', and any atoms which were previously mapped to 'atom1' or 'atom2'. For example, the following four functions, applied in any order, result in a single map class which contains atoms: r1, r2, r3, p1, p2.

      dt_setmap(r1, p1);
      dt_setmap(r2, p1);
      dt_setmap(r3, p1);
      dt_setmap(r1, p2);
      

If 'atom2' is NULL_OB, 'atom1' is unmapped from its current map set. That is, 'atom1' will no longer be mapped to any other atoms in the reaction. The atom map set from which 'atom1' is removed remains intact unless the atom map set becomes invalid. A map class becomes invalid if it no longer contains at least one reactant and one product atom. If the atom map set becomes invalid, all of the remaining atoms are unmapped from one another.
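The merge and unmap rules can be checked against a small stand-alone model. The sketch below is not toolkit code and makes no claim about how the toolkit stores map classes: atoms are plain array indices, and toy_maps, toy_setmap, toy_unmap, and toy_mapped are hypothetical names that mirror the described semantics of dt_setmap(3) and dt_mapped(3).

```c
#include <assert.h>

/* Toy model only: hypothetical names, not Daylight calls. */
enum toy_role { TOY_REACTANT, TOY_AGENT, TOY_PRODUCT };

#define TOY_NATOMS 8

struct toy_maps {
  enum toy_role role[TOY_NATOMS];
  int cls[TOY_NATOMS];      /* 0 = unmapped, otherwise a class label */
  int next_label;
};

void toy_maps_init(struct toy_maps *m, const enum toy_role *roles)
{
  int i;
  for (i = 0; i < TOY_NATOMS; i++) { m->role[i] = roles[i]; m->cls[i] = 0; }
  m->next_label = 1;
}

/* Count the members of class 'label' that have the given role. */
static int toy_count(const struct toy_maps *m, int label, enum toy_role role)
{
  int i, n = 0;
  for (i = 0; i < TOY_NATOMS; i++)
    if (m->cls[i] == label && m->role[i] == role) n++;
  return n;
}

/* Map a to b.  One must be a reactant atom and the other a product atom,
   in either order.  Existing classes of a and b are merged into one.
   Returns 1 on success, 0 if the roles do not allow the mapping. */
int toy_setmap(struct toy_maps *m, int a, int b)
{
  int i, label, old;
  int ok;

  assert(a >= 0 && a < TOY_NATOMS && b >= 0 && b < TOY_NATOMS);
  ok = (m->role[a] == TOY_REACTANT && m->role[b] == TOY_PRODUCT) ||
       (m->role[a] == TOY_PRODUCT && m->role[b] == TOY_REACTANT);
  if (!ok) return 0;

  label = m->cls[a] ? m->cls[a] : (m->cls[b] ? m->cls[b] : m->next_label++);
  old = m->cls[b];
  m->cls[a] = label;
  m->cls[b] = label;
  if (old != 0 && old != label)
    for (i = 0; i < TOY_NATOMS; i++)
      if (m->cls[i] == old) m->cls[i] = label;
  return 1;
}

/* Remove a from its class.  If the class no longer holds at least one
   reactant atom and one product atom, unmap everything left in it. */
void toy_unmap(struct toy_maps *m, int a)
{
  int i, label;

  assert(a >= 0 && a < TOY_NATOMS);
  label = m->cls[a];
  if (label == 0) return;
  m->cls[a] = 0;
  if (toy_count(m, label, TOY_REACTANT) == 0 ||
      toy_count(m, label, TOY_PRODUCT) == 0)
    for (i = 0; i < TOY_NATOMS; i++)
      if (m->cls[i] == label) m->cls[i] = 0;
}

/* 1 if a and b are in the same map class, else 0. */
int toy_mapped(const struct toy_maps *m, int a, int b)
{
  return m->cls[a] != 0 && m->cls[a] == m->cls[b];
}
```

With atoms r1, r2, r3 as reactants and p1, p2 as products, the four mappings from the example above produce one class in this model regardless of call order, and removing the last product atom dissolves that class.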

dt_getmap(Handle atom) => Handle substruct
Returns a substruct based on the reaction containing all of the atoms in the atom map set to which 'atom' belongs or NULL_OB if atom is unmapped or is not an atom in a reaction.

dt_mapped(Handle atom1, Handle atom2) => boolean mapped
Tests the two atoms. If the two atoms are in the same map class, returns TRUE. Otherwise, returns FALSE. This is a convenience function. It is somewhat more efficient than performing the same operation by getting the substruct for one atom and testing the other against the substruct.

12.6 Hydrogens in Reactions:

Hydrogens in reactions are handled as with molecules (suppressed unless the hydrogen is special). With reactions, there is an additional case which will make a hydrogen special. It is often desirable (e.g. 1,5-hydride shift) to store information about the location of hydrogens as part of the atom map of a reaction. Hydrogens with a supplied atom map are considered "special" and these hydrogens are not suppressed in the toolkit. These mapped hydrogens appear explicitly in the isomeric SMILES for a reaction. They do not appear in SMILES generated without isomeric information, since atom maps are omitted there.

Note that the special hydrogen dt_isohydro(3) cannot be part of any atom map class. Hence, this special hydrogen can never be used in place of an atom-mapped hydrogen in a reaction. Any atom-mapped hydrogens must be stored as explicit hydrogens.

12.7 Reaction Queries:

A reaction query is expressed with the SMARTS language. SMARTS has been extended with reaction and atom map query syntax. There is no separate pattern object for a reaction query. When a SMARTS is interpreted, a pattern object is returned. In effect, the pattern object takes on the additional expressive capabilities for reactions.

dt_smartin(string SMARTS) => Handle pattern
Evaluates the given SMARTS string and creates a pattern object from it. The SMARTS may be any valid molecule- or reaction-SMARTS.

dt_smarts_opt(string SMARTS, integer vmatch) => string SMARTS
Returns an optimized SMARTS string. Works correctly for both molecule- and reaction-SMARTS. If "vmatch" is TRUE and the given SMARTS string is for a reaction query, dt_smarts_opt fails. Vector matching on reaction queries is not allowed.

12.8 Reactions and other objects:

The flexibility and utility of the Daylight toolkit arises partly because of the ability to create derivative objects based on Molecules. These objects include paths, substructs, pathsets, depictions, conformations and fingerprints. Each of these objects has a specific, unique purpose within the toolkit; however, they all share some common features which are important for reaction processing:

  • They all have a molecule as their base object,
  • they all store data about the atoms and bonds in a molecule,
  • and they all ignore other attributes of the molecule not directly related to the atoms and bonds in the molecule.

These features allowed us to directly extend these objects to handle reactions. As discussed in Section 12.2, the "molecule layer" of a reaction is ignored; only the atoms and bonds of a reaction are considered.

Hence, each of these objects is now defined as having either a molecule or a reaction as its "base" object. Otherwise, their behaviors are essentially unchanged. They still store data about the atoms and bonds in their base object, and they still ignore other non-relevant attributes of their base object (like the molecules).

Briefly, we address each of the main derivative types in the next sections and highlight their behaviors with regard to reactions.

12.8.1 Paths and Substructs:

Paths and substructs are collections of atoms and bonds, which all come from the same base object. With reactions, this behavior remains unchanged. The atoms and bonds within a path or substructure must come from the same reaction but they may be from different molecules within a reaction. For example, the following code creates a path from a reaction object, adds all of the double-bonds from the reaction to the path, and returns the path.

dt_Handle get_db(dt_Handle ob)
{
  dt_Handle bonds, bond, path;

  /*** Inappropriate type ***/

  if ((dt_type(ob) != TYP_MOLECULE) &&
      (dt_type(ob) != TYP_REACTION)) return (NULL_OB);

  /*** Make a path.  The base of the path will be "ob" ***/

  path = dt_alloc_path(ob);

  /*** If a reaction, ignore the molecule layer.  Only deal with the
       bonds.  If a molecule, this happens by default. ***/

  bonds = dt_stream(ob, TYP_BOND);
  while (NULL_OB != (bond = dt_next(bonds)))
    if (dt_bondorder(bond) == DX_BTY_DOUBLE)
      dt_add(path, bond);

  /*** Clean up and return ***/

  dt_dealloc(bonds);
  return (path);
}

Note that absolutely no consideration is given to the fact that the bonds may be in different molecules within the reaction. As long as the atoms and bonds added to a path or substruct are all part of the correct base object (the object given in dt_alloc_path(3)) this succeeds.

12.8.2 Pathsets:

A pathset is a collection of paths over the same base object. The base object may be a reaction. A pathset is returned from the SMARTS matching functions.

In this case, the pathset returned depends on the type of target used for the match function:

dt_match(Handle pattern, Handle target, integer limit) => Handle pathset
dt_umatch(Handle pattern, Handle target, integer limit) => Handle pathset
This returns a pathset with "target" as its base object. "Target" may be either a reaction or molecule. The pathset will contain one or more paths. The base object (dt_base(3)) of the pathset and all paths within the pathset will be the target object. Note that this behavior holds regardless of the type of pattern used in the query (reaction or molecule query).

The semantics for pattern matching are as follows:

Pattern          Target            Result
Molecule query   Molecule object   Molecule substructure matches
Molecule query   Reaction object   All substructure matches over entire reaction
Reaction query   Molecule object   No hits
Reaction query   Reaction object   Reaction substructure matches

dt_vmatch(Handle pattern, Handle target, integer limit) => Handle pathset
This returns a pathset with "target" as its base object. "Target" may be either a reaction or molecule. The pathset will contain one or more paths whose base object will be the same target object.

There is one important exception for vector-matching: It is only legal to use a molecule pattern for dt_vmatch(3). One may match the molecule pattern against either a reaction or molecule target, but it is not possible to use a reaction pattern for vector matching on any target (reaction or molecule).

12.8.3 Depictions:

The main distinction between a reaction depiction and a molecule depiction is the presence of a reaction arrow, and the potential desire to lay out the various reaction parts (reactant, agent, product) in different regions. These two functions are handled with dt_depict(3), and dt_calcxy(3); all other depiction-related functions remain unchanged.

dt_calcxy(Handle depiction) => boolean ok
Sets the coordinates for the atoms of the given depiction. In the case of a reaction depiction, it lays out the reactants, agents and products in a left-to-right orientation, with the reactants and products centered vertically and the agents shifted above the center.

If atom map classes are available for the atoms in the depiction, the toolkit will attempt to orient the reactant and product sides of the depictions the same way. The toolkit attempts to minimize the RMS distance between mapped atom pairs by reorienting the product part of the reaction depiction before laying out the parts of the reaction. This orientation first applies to ring atoms within the depiction. If no mapped ring atoms are found, non-ring atoms are used.

dt_depict(Handle depiction) => boolean ok
Generates the depiction, using the Daylight drawing library. For a reaction object, automatically includes a scaled arrow in the drawing. The toolkit provides no access to the arrow itself, it is drawn by the toolkit using the framega set for the depiction object.

The arrow is positioned as follows: a horizontal vector is laid out between the midpoints of the reactant and product parts of the depiction. The vector is clipped so that it doesn't overlap any parts of the reaction. Finally, the clipped vector with an arrowhead is drawn. If it is not possible to clip the vector so it doesn't overlay any part of the reaction, the toolkit will then draw a short arrow between the midpoints of the reactants and products, ignoring any overlap.

12.8.4 Conformations:

The conformation object allows the storage of (x, y, z) coordinate data for the atoms in a molecule or reaction. A conformation object makes no distinction between the roles of atoms in the reaction object. With the exception of allowing a conformation to be created from a reaction, all conformation-oriented functions remain unchanged.

12.8.5 Fingerprints:

The fingerprint object does behave differently for a reaction object versus a molecule object. The differences are seen when creating a fingerprint object; all other fingerprint toolkit functions remain unchanged. In addition, there is a new fingerprint-creation function, dt_fp_differencefp(3), which is designed primarily for reaction processing.

dt_fp_generatefp(Handle object, integer minstep, integer maxstep, integer size) => Handle fingerprint
Generates a fingerprint object from the given molecule, reaction, substruct, or path. For reaction objects or reaction-derived paths and substructs, the resulting fingerprint object is equivalent to the bitwise-OR of the following fingerprints:
  • the fingerprint of the reactant part,
  • the fingerprint of the product part,
  • a bit-shifted fingerprint of the product part.

This behavior allows the fingerprint to serve as a structural screen for all superstructure-matching and allows the fingerprint to provide some discrimination power between reactant and product parts.
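The bitwise-OR construction can be illustrated with a toy fingerprint held in a single 64-bit word. This is a sketch under stated assumptions, not the toolkit's implementation: the names are hypothetical, the shift amount TOY_SHIFT is arbitrary, and treating the shift as a circular rotation is also an assumption, since the text does not say how the shift is performed.

```c
#include <stdint.h>

/* Toy model: one 64-bit word stands in for a fingerprint.  TOY_SHIFT is
   an arbitrary assumption; the toolkit's actual shift is not stated. */
#define TOY_SHIFT 7

/* Circular left shift (assumed form of the "bit-shifted" fingerprint). */
static uint64_t toy_rotl(uint64_t x, unsigned k)
{
  k %= 64;
  return k ? (x << k) | (x >> (64 - k)) : x;
}

/* OR together the reactant bits, the product bits, and the shifted
   product bits, as listed above. */
uint64_t toy_reaction_fp(uint64_t reactant_fp, uint64_t product_fp)
{
  return reactant_fp | product_fp | toy_rotl(product_fp, TOY_SHIFT);
}

/* Screening test: every bit set in the query must be set in the target. */
int toy_screen_pass(uint64_t query_fp, uint64_t target_fp)
{
  return (query_fp & ~target_fp) == 0;
}
```

Because the unshifted reactant and product bits are both present in the result, a molecule query fingerprint that passes the screen against either part alone also passes against the combined reaction fingerprint. The extra shifted bits are what make the fingerprint denser.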

For reactions, the fingerprints tend to be quite dense, and are somewhat less efficient as structural screens than for molecules. The main advantage of this scheme is the full compatibility of these reaction fingerprints with molecule fingerprints in the Daylight system. Note also that this fingerprint scheme doesn't provide the most appropriate measure of similarity for reactions.

dt_fp_differencefp(Handle object, integer minstep, integer maxstep, integer size) => Handle fingerprint
Generates a difference fingerprint object from the given molecule, reaction, path, or substruct object. This function is oriented towards reaction processing, so isn't very useful for molecules and molecule-derived paths or substructs.

For a molecule or molecule-derived object, returns the normal fingerprint, (identical to dt_fp_generatefp(3)).

For a reaction or reaction-derived object, returns the difference in fingerprint between the reactant and product parts of the object as follows:

  1. Generates the count of each path in the reactant part.
  2. Generates the count of each path in the product part.
  3. For any paths whose count changes from reactant to product part, sets a bit in the final fingerprint.

The net result of these operations is a fingerprint of the connectivity change for a reaction. This is an extremely useful way to analyze and cluster reactions.
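The three steps can be sketched on precomputed path counts. The model below is illustrative only and uses hypothetical names: it assumes each distinct path has already been assigned an index, takes the per-path counts for each side as plain arrays, and maps path i to bit (i mod TOY_FP_BITS). How the toolkit enumerates paths and assigns them to bits is not described in this section.

```c
#include <stddef.h>
#include <string.h>

#define TOY_FP_BITS 64    /* fingerprint size in bits (assumed) */

/* Set the bit for every path slot whose count differs between the
   reactant part and the product part.  Path i maps to bit
   (i mod TOY_FP_BITS); this direct mapping is an assumption. */
void toy_differencefp(const int *reactant_counts, const int *product_counts,
                      size_t npaths, unsigned char fp[TOY_FP_BITS / 8])
{
  size_t i;

  memset(fp, 0, TOY_FP_BITS / 8);
  for (i = 0; i < npaths; i++) {
    if (reactant_counts[i] != product_counts[i]) {
      size_t bit = i % TOY_FP_BITS;
      fp[bit / 8] |= (unsigned char)(1u << (bit % 8));
    }
  }
}

/* Return 1 if 'bit' is set in the fingerprint, else 0. */
int toy_bit_is_set(const unsigned char *fp, size_t bit)
{
  return (fp[bit / 8] >> (bit % 8)) & 1;
}
```

Paths whose counts are equal on both sides contribute nothing, so an unchanged part of the structure leaves no trace in the result.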

There is one important caveat for difference fingerprints: to work optimally, the reaction must have unit stoichiometry. If not, missing atoms on either side of the reaction will result in extraneous bits being set in the difference fingerprint.

12.9 Transforms

Transforms are very similar in behavior to patterns. Essentially, the transform language (SMIRKS) is a subset of SMARTS with some additional specific requirements, which are validated on input of the transform. It follows that any valid SMIRKS is also a valid SMARTS, and that a SMIRKS can be optimized by dt_smarts_opt(3). A more extensive discussion of the relationship of SMILES, SMARTS, and SMIRKS can be found in the Daylight Theory Manual.

dt_smirkin(string SMIRKS) => Handle transform
Interprets the given input string as a SMIRKS and creates a transform object from the SMIRKS.

dt_smarts_opt(string SMIRKS, integer vmatch) => string SMIRKS
Returns an optimized SMIRKS string. Remember, SMIRKS are a subset of SMARTS. "vmatch" must be FALSE for transform SMIRKS. Optimizing a SMIRKS is useful because the first step in application of a transform object is a SMARTS-match on either the reactant or product side of the transform. Hence, the optimizations performed by dt_smarts_opt(3) are also relevant to transforms.

dt_type(Handle transform) => integer TYP_TRANSFORM
For a transform object, returns the constant TYP_TRANSFORM.

dt_typename(Handle transform) => string "transform"
For a transform object, returns the string constant "transform".

dt_info(Handle transform, string "smirks") => string input SMIRKS
Returns the input SMIRKS string used to create the transform object.

dt_match(Handle transform, Handle target, integer limit) => Handle pathset
Performs a SMARTS match, using the transform object as a pattern, and returning the pathset over the target reaction. Note that any valid SMIRKS is also a valid SMARTS.

dt_pattern(Handle transform, integer role) => Handle pattern
Returns a molecule pattern object from the "role" part of the transform.

Transforms can be applied to molecule objects. The result of these operations is the creation of new reaction objects which contain both the starting molecules and a set of newly-created molecules. Transforms are bidirectional: they can be applied in either the forward or reverse direction. In effect, transforms represent generic reactions. Specific instances of these generic reactions can be created from the combination of a transform and a set of molecules, which act as reactants or products in the specific reaction.

dt_transform(Handle transform, Handle som, integer direction, integer limit) => Handle sequence of reactions
dt_utransform(Handle transform, Handle som, integer direction, integer limit) => Handle sequence of reactions
dt_xtransform(Handle transform, Handle som, integer direction, integer limit) => Handle sequence of reactions
Applies the given transform to the molecule or sequence of molecules "som". Note that the molecule or sequence are not altered by the function. The result is a sequence of newly-allocated reaction objects, which represent specific instances of the reaction. The parameter "limit" controls whether only the first reaction found is returned or all of the possible answers are returned. The "limit" parameter has the same semantics as in dt_match(3).

The "direction" may be one of DX_FORWARD or DX_REVERSE. When direction is DX_FORWARD, the given molecules are treated as reactants and the transform is applied in the forward direction to the molecules. When "direction" is DX_REVERSE, the given molecules are treated as products and the transform is applied in the reverse direction.

The application of a transform logically occurs in two steps. In the forward direction, the reactant side of the transform is matched, as SMARTS, against the set of molecules given. Each place where the SMARTS matches is marked. In the second step, the atom and bond changes in the transform are applied to the matched molecules.

The only difference between dt_transform(3) and dt_utransform(3) is the function which is used to match the SMARTS expression (dt_match(3) and dt_umatch(3) respectively). The net result is that with dt_utransform(3), the resulting answers are generated from the unique set of matches, while with dt_transform(3), the complete set of answers results.

Similarly, dt_xtransform(3) uses dt_xmatch(3) for the initial SMARTS match. The net result is that dt_xtransform(3) always returns exactly one new reaction. This new reaction may have more than one application of the transform within it.

A transform (at least in one direction) can be thought of as a SMARTS expression plus a set of atom and bond changes.

The resulting sequence of reaction objects is owned by the user. Both the sequence and the reactions must be deallocated by the calling program when done with them. The given molecules or sequence of molecules are not modified by the function.

The transform processing functions set atomic properties for the newly-created reaction atoms. These properties are set in order to allow the user to correlate the SMIRKS with the resulting reaction. For example, given the amide formation SMIRKS:

[C:1](=[O:2])Cl.[H][N:4][C:5]>>[C:1](=[O:2])[N:4][C:5]

and the reacting molecules:

CC(=O)Cl.NCCC

The result of this transformation will be a reaction, with the following atomic properties set:

USMILES:  CCCN.CC(=O)Cl>>CCCNC(=O)C
tmap:       54  1  2       541  2
torder:     65  1  2 3     *97  8   (* has a value of 10)

The "tmap" property is the map class for the transform atom which matched this node in the reaction. For example, the amine nitrogens are map class "4" in the transform, hence the tmap property for the nitrogens in the resulting reaction is set to "4".

The "torder" property is the cardinal ordering of the reaction atoms, based on the match order of the transform. Were one to reorder the reaction atoms based on this numbering, the order would correspond to the ordering of the expressions in the SMIRKS. In the example, the original SMIRKS has 10 atomic expressions total, and the "torder" properties go from 1 to 10. The value of 4 is missing because the hydrogen is suppressed in the unique SMILES.
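The reordering described above can be tried on the values from the example table. The sketch below is a stand-alone model, not toolkit code: toy_atom, toy_sort_by_torder, and toy_example are hypothetical names, the records hold only the nine matched atoms, and a tmap of 0 for the chlorine is an assumption standing in for "no map class".

```c
#include <stdlib.h>

/* Toy record for one matched reaction atom.  Values are copied from the
   example table; the struct and function names are hypothetical. */
struct toy_atom {
  const char *symbol;
  int tmap;      /* 0 is assumed here to mean "no map class" */
  int torder;
};

/* qsort comparator: ascending torder. */
static int toy_by_torder(const void *a, const void *b)
{
  const struct toy_atom *x = a, *y = b;
  return (x->torder > y->torder) - (x->torder < y->torder);
}

/* Reorder the atoms so they follow the SMIRKS expression order. */
void toy_sort_by_torder(struct toy_atom *atoms, size_t n)
{
  qsort(atoms, n, sizeof atoms[0], toy_by_torder);
}

/* Matched atoms, listed in the order they appear in the unique SMILES. */
struct toy_atom toy_example[9] = {
  {"C", 5, 6}, {"N", 4, 5}, {"C", 1, 1}, {"O", 2, 2}, {"Cl", 0, 3},
  {"C", 5, 10}, {"N", 4, 9}, {"C", 1, 7}, {"O", 2, 8}
};
```

After toy_sort_by_torder(toy_example, 9) the symbols read C, O, Cl, N, C, C, O, N, C, which is the order of the atomic expressions in the SMIRKS with the suppressed hydrogen (torder 4) left out.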

These properties can be accessed with the following code:

atoms = dt_stream(result_rxn, TYP_ATOM);
while (NULL_OB != (atom = dt_next(atoms)))
  {
    tmap = dt_integer(atom, 4, "tmap");
    torder = dt_integer(atom, 6, "torder");
  }
dt_dealloc(atoms);
