1st-class SMARTS patterns

Roger Sayle

Bioinformatics Group, Research I.T.,
Glaxo Wellcome Research & Development,
Gunnels Wood Road, Stevenage, U.K.


Daylight SMARTS strings provide a concise textual representation for specifying a particular pattern or substructure of a molecule. Much like regular expressions denote subsequences or substrings within a sequence, SMARTS patterns define subgraphs of a molecular graph. However, unlike SMILES strings, which denote molecule objects[1], SMARTS patterns are little more than string parameters to the substructure-matching functions of the Daylight toolkit. This talk presents some of the manipulations and computations that may be performed purely on SMARTS patterns, including canonicalisation[2], optimisation and determining whether one SMARTS is a subpattern of another[3]. Such transformations are presented as semantics-preserving rewrite rules in a SMARTS pattern algebra. One application of these transformations is the pre-processing of atom types for molecular mechanics force fields[4,5].
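The abstract does not spell out the rewrite rules themselves, so the following is only a minimal Python sketch of the general idea, not the method presented in the talk. It canonicalises the body of a single bracketed SMARTS atom by applying four standard Boolean identities: double negation (!!x -> x), associativity (flattening nested ANDs or ORs), idempotence (x,x -> x) and commutativity (sorting operands into a fixed order). Assumptions: operators must be written explicitly (no implicit AND by juxtaposition, so [CX4] must be written C&X4), primitives such as C, #6, X4 or H0 are treated as opaque Boolean tests, and recursive SMARTS ($(...)), bond expressions and context-dependent primitives such as a lone [H] are ignored. The names parse, simplify, render and canonical_smarts_atom are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union

OPS = "!&,;"  # SMARTS atom operators: not, high-precedence and, or, low-precedence and

@dataclass(frozen=True)
class Prim:
    text: str  # opaque primitive, e.g. "C", "#6", "X4", "H0"

@dataclass(frozen=True)
class Not:
    arg: "Expr"

@dataclass(frozen=True)
class And:
    args: Tuple["Expr", ...]

@dataclass(frozen=True)
class Or:
    args: Tuple["Expr", ...]

Expr = Union[Prim, Not, And, Or]

def tokenize(body: str) -> list:
    """Split an atom body into operators and maximal runs of primitive text."""
    tokens, buf = [], ""
    for ch in body:
        if ch in OPS:
            if buf:
                tokens.append(buf)
                buf = ""
            tokens.append(ch)
        else:
            buf += ch
    if buf:
        tokens.append(buf)
    return tokens

def parse(body: str) -> Expr:
    """Recursive descent honouring SMARTS precedence: ! > & > , > ;"""
    toks, pos = tokenize(body), 0

    def take(op: str) -> bool:
        nonlocal pos
        if pos < len(toks) and toks[pos] == op:
            pos += 1
            return True
        return False

    def unary() -> Expr:
        nonlocal pos
        if take("!"):
            return Not(unary())
        if pos >= len(toks) or toks[pos] in OPS:
            raise ValueError(f"primitive expected at token {pos}")
        pos += 1
        return Prim(toks[pos - 1])

    def level(sub, op, node):
        args = [sub()]
        while take(op):
            args.append(sub())
        return args[0] if len(args) == 1 else node(tuple(args))

    hi_and = lambda: level(unary, "&", And)
    disj = lambda: level(hi_and, ",", Or)
    expr = level(disj, ";", And)
    if pos != len(toks):
        raise ValueError("unexpected trailing tokens")
    return expr

def render(e: Expr) -> str:
    if isinstance(e, Prim):
        return e.text
    if isinstance(e, Not):
        return "!" + render(e.arg)
    if isinstance(e, Or):
        return ",".join(render(a) for a in e.args)
    # An AND over a disjunction needs the low-precedence ';' operator.
    sep = ";" if any(isinstance(a, Or) for a in e.args) else "&"
    return sep.join(render(a) for a in e.args)

def simplify(e: Expr) -> Expr:
    """Apply semantics-preserving rewrites bottom-up."""
    if isinstance(e, Prim):
        return e
    if isinstance(e, Not):
        inner = simplify(e.arg)
        return inner.arg if isinstance(inner, Not) else Not(inner)  # !!x -> x
    kind, flat = type(e), []
    for a in (simplify(x) for x in e.args):
        flat.extend(a.args if isinstance(a, kind) else [a])  # associativity
    unique = sorted(set(flat), key=render)  # idempotence + commutativity
    return unique[0] if len(unique) == 1 else kind(tuple(unique))

def canonical_smarts_atom(body: str) -> str:
    return "[" + render(simplify(parse(body))) + "]"
```

For example, N,C,C and C,N both canonicalise to [C,N], and X4&C;H0,H1 and C&X4;H1,H0 both become [C;H0,H1;X4], so equivalent atom expressions can be recognised by plain string comparison. Sorting by the rendered string is an arbitrary but deterministic choice of canonical order; a real optimiser might instead order operands by expected matching cost.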



  1. David Weininger, "SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 28, No. 1, pp. 31-36, 1988.
  2. D. Weininger, A. Weininger and J.L. Weininger, "SMILES. 2. Algorithm for Generation of Unique SMILES Notation", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 29, No. 2, pp. 97-101, 1989.
  3. J.R. Ullmann, "An Algorithm for Subgraph Isomorphism", Journal of the Association for Computing Machinery (JACM), Vol. 23, No. 1, pp. 31-42, 1976.
  4. T.J. O'Donnell, Shashidar N. Rao, Konrad Koehler, Yvonne C. Martin and Beverley Eccles, "A General Approach for Atom-Type Assignment and the Inter-conversion of Molecular Structure Files", Journal of Computational Chemistry, Vol. 12, No. 2, pp. 209-214, 1991.
  5. Bruce L. Bush and Robert P. Sheridan, "PATTY: A Programmable Atom Typer and Language for Automatic Classification of Atoms in Molecular Databases", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 33, pp. 756-762, 1993.