The foundation of a chemical information system is its ability to represent molecules in a computer and to communicate a molecule's structure from one place to another. This can seem like a simple problem at first glance, and easy solutions are often proposed and implemented. A close examination, however, reveals several subtle traps for the unwary, and methods of avoiding them must be considered before an effective computer representation of a molecule can be designed.
To represent a molecule in a computer, we must first choose a particular physical model. Many models have served chemists, ranging from the Bohr model through the most modern quantum theory; all have had adherents, detractors, uses, and flaws. When using such models, we must always avoid the trap of arguing that a particular model is right rather than arguing that it is useful. Models are just that - models.
Daylight's system represents molecules using a fairly standard valence model. For example, the Daylight system understands the normal valences of organic compounds, and by counting the bonding electrons in a molecule, can fill in unspecified hydrogens, detect aromatic and anti-aromatic ring systems, and issue warnings when unlikely or impossible molecules are specified.
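The hydrogen fill-in step described above can be sketched as a lookup against a table of normal valences. This is a minimal illustration, not Daylight's implementation: the table `NORMAL_VALENCES` and the function `implicit_hydrogens` are hypothetical names, only neutral atoms with integer bond orders are handled, and aromatic bonds are ignored.

```python
# Hypothetical table of normal valences for a few common organic elements.
# Elements with several entries take the lowest valence that accommodates
# the bonds already present.
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,),
    "P": (3, 5), "S": (2, 4, 6),
    "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(symbol, bond_orders):
    """Return how many hydrogens fill the valence of a neutral atom.

    bond_orders lists the integer orders of the bonds to non-hydrogen
    neighbours. A ValueError signals an unlikely or impossible atom,
    i.e. one whose bonds exceed every normal valence in the table.
    """
    used = sum(bond_orders)
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= used:
            return valence - used
    raise ValueError(
        f"{symbol} with total bond order {used} exceeds its normal valences"
    )


# Carbonyl carbon of an aldehyde: one double bond and one single bond.
aldehyde_carbon_h = implicit_hydrogens("C", [2, 1])
```

A carbon with bond orders `[2, 2, 1]` would raise `ValueError`, which corresponds to the warning case mentioned above.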
The Daylight system represents a molecule as a graph in which the nodes are atoms and the edges are bonds. Each atom has several properties, including its atomic number, atomic weight, charge, and the number of attached hydrogens. If the atom is a chiral center, it can also have chiral specifications.
Bond properties are simpler: a bond is single, double, triple, or aromatic. The concept of aromaticity in the Daylight system is not a chemical one, but rather is a set of rules designed for a chemical nomenclature system (this is discussed more in the SMILES chapter).
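The atom and bond properties listed above map naturally onto a small graph data structure. The sketch below is one possible layout, assuming hypothetical class names (`Atom`, `Molecule`) and a hydrogen-suppressed representation; it does not reflect the Daylight toolkit's actual objects.

```python
from dataclasses import dataclass, field
from typing import Optional

BOND_TYPES = ("single", "double", "triple", "aromatic")


@dataclass
class Atom:
    atomic_number: int
    charge: int = 0
    hydrogens: int = 0               # attached hydrogens (suppressed form)
    mass: Optional[int] = None       # None means the isotope is unspecified
    chirality: Optional[str] = None  # None means no chiral specification


@dataclass
class Molecule:
    atoms: list = field(default_factory=list)
    bonds: dict = field(default_factory=dict)  # frozenset({i, j}) -> type

    def add_atom(self, atom):
        """Append an atom and return its index (the node id)."""
        self.atoms.append(atom)
        return len(self.atoms) - 1

    def add_bond(self, i, j, bond_type="single"):
        """Add an undirected edge between atoms i and j."""
        if bond_type not in BOND_TYPES:
            raise ValueError(f"unknown bond type: {bond_type}")
        self.bonds[frozenset((i, j))] = bond_type

    def neighbors(self, i):
        """Return the sorted indices of atoms bonded to atom i."""
        return sorted(j for pair in self.bonds if i in pair
                      for j in pair if j != i)


# Acetic acid as a hydrogen-suppressed graph: CH3-C(=O)-OH
acetic_acid = Molecule()
methyl = acetic_acid.add_atom(Atom(6, hydrogens=3))
carboxyl = acetic_acid.add_atom(Atom(6))
carbonyl_o = acetic_acid.add_atom(Atom(8))
hydroxyl_o = acetic_acid.add_atom(Atom(8, hydrogens=1))
acetic_acid.add_bond(methyl, carboxyl)
acetic_acid.add_bond(carboxyl, carbonyl_o, "double")
acetic_acid.add_bond(carboxyl, hydroxyl_o)
```

Keying bonds by an unordered pair keeps the graph undirected, so a bond can be looked up from either of its atoms.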
There is some flexibility in this valence model. Molecules can be represented as a hydrogen-suppressed graph (hydrogen atoms are represented as a property of "heavy" atoms) or as a hydrogen-complete graph (hydrogens are represented the same way as other atoms). Bonds in cyclic structures can be represented as aromatic or as the alternating single/double bond Kekulé form. Isotope and chirality information (atomic mass and chiral specifications) can be unspecified, partially specified, or completely specified.
There are two parts to ring-detection in a graph:
The Daylight system defines bond type and bond order as follows:
Note that the definition of aromaticity is not intended to imply anything about the reactivity, magnetic resonance spectra, heat of formation, or odor of substances. Rather, the definition is designed to be useful in a chemical nomenclature system (SMILES) that is discussed in detail in the subsequent chapter.
Chemical nomenclature systems such as SMILES require a canonical labeling of the atoms and bonds - a numbering that is independent of the history of the molecule's representation. The Daylight system generates such a labeling whenever it generates a unique SMILES.
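To illustrate what "independent of the history of the molecule's representation" means, the sketch below computes a canonical form by brute force: it tries every renumbering of the atoms and keeps the lexicographically smallest description. This is not the algorithm the Daylight system uses, and it is only practical for very small molecules; the function name `canonical_form` and the input layout are assumptions made for the example.

```python
from itertools import permutations


def canonical_form(symbols, bonds):
    """Return a numbering-independent description of a small molecule.

    symbols: element symbols indexed by atom number.
    bonds:   iterable of (i, j, order) tuples.
    The result is the smallest (symbols, bonds) pair over all possible
    renumberings, so any two numberings of the same graph agree.
    """
    best = None
    for perm in permutations(range(len(symbols))):
        # perm[new] = old, so invert it to translate the bond endpoints.
        new_of_old = {old: new for new, old in enumerate(perm)}
        relabeled_symbols = tuple(symbols[old] for old in perm)
        relabeled_bonds = tuple(sorted(
            (min(new_of_old[i], new_of_old[j]),
             max(new_of_old[i], new_of_old[j]),
             order)
            for i, j, order in bonds
        ))
        candidate = (relabeled_symbols, relabeled_bonds)
        if best is None or candidate < best:
            best = candidate
    return best


# Ethanol entered in two different atom orders, and dimethyl ether.
ethanol_a = canonical_form(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
ethanol_b = canonical_form(["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])
ether = canonical_form(["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])
```

Both ethanol inputs yield the same canonical form, while dimethyl ether, which has the same atoms but different connectivity, yields a different one.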
A reaction consists of a set of molecules, each of which plays a specific role in the reaction: reactant, product, or agent. Since reactions are made up of molecules, reactions naturally use the same valence model, bonding, aromaticity, and symmetry rules as molecules. At minimum, a reaction must contain valid molecules based on these rules.
In an ideal world (at least from an information-processing point of view), all reactions would be represented stoichiometrically (every relevant atom shown), and enough information would be present to tell unambiguously which reactant atom corresponds to which product atom. This information would be provided by a pairwise mapping of the reactant and product atoms. In effect, the only difference between the reactant molecule(s) and product molecule(s) would be the bond changes and atom property changes (chirality, charge, aromaticity) which occur during the reaction. If these criteria are met, one can 'superimpose' the reactants and products on one another and represent the reaction as a reaction graph. This is both a complete and compact description of a reaction.
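The superposition described above can be sketched for the bond changes alone. The example assumes every atom carries a unique atom-map number shared between the reactant and product sides, and that bonds are stored as a mapping from an unordered pair of map numbers to a bond order. The names `bond`, `reaction_graph`, and `bond_changes` are hypothetical, and atom property changes are left out for brevity.

```python
def bond(a, b):
    """Key for the bond between mapped atoms a and b (order-independent)."""
    return frozenset((a, b))


def reaction_graph(reactant_bonds, product_bonds):
    """Superimpose both sides of a fully mapped reaction.

    Each argument maps bond(a, b) -> bond order. The result maps every
    bond seen on either side to (order before, order after), with 0
    meaning that no bond is present on that side.
    """
    graph = {}
    for pair in set(reactant_bonds) | set(product_bonds):
        graph[pair] = (reactant_bonds.get(pair, 0),
                       product_bonds.get(pair, 0))
    return graph


def bond_changes(graph):
    """Keep only the bonds whose order differs between the two sides."""
    return {pair: orders for pair, orders in graph.items()
            if orders[0] != orders[1]}


# Ethyl bromide + hydroxide -> ethanol + bromide (hydrogens suppressed).
# Atom map: 1 = CH3 carbon, 2 = CH2 carbon, 3 = Br, 4 = O.
reactants = {bond(1, 2): 1, bond(2, 3): 1}
products = {bond(1, 2): 1, bond(2, 4): 1}

graph = reaction_graph(reactants, products)
changes = bond_changes(graph)
```

Here `changes` holds only the broken C-Br bond and the newly formed C-O bond; the C-C bond appears in `graph` with the same order on both sides and is therefore filtered out.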
Unfortunately, these stringent requirements can rarely be met for reactions available in electronic form. The Daylight system is designed to be able to represent and store both completely specified (reaction graph-like) reactions and information-deficient reactions in a repeatable and searchable fashion. Although all of the molecules within a reaction must be chemically valid, an overall analysis of the reaction for chemical sensibility is not carried out.
The Daylight system is oriented towards single-step reactions with the following three roles for molecules defined:
Note that the above distinctions between reactant, agent, and product all involve the participation of atoms in the reaction. This participation is recorded via the reaction atom map. The atom map simply maps the correspondence of the reactant and product atoms in the reaction. Agents never have meaningful atom maps, since by definition agent atoms do not participate directly in a reaction.
Clearly, reactions have additional data which one wants to store about them. The Daylight approach is to only encode the pure structural information in the lexical representation of the reaction and handle the additional data outside of the reaction. A standardized THOR database can allow coupling of the following data about the individual components of a reaction to those components:
The Daylight system provides an algorithm for generating these schematic diagrams ab initio - a drawing can be made of any molecule or reaction, whether or not it has ever been seen before. When generating a schematic diagram, two criteria are critical: