Daylight Chemical Information Systems, Inc.
GENSMI: Generation of Genuine SMILES
Michael A. Kappler*
Daylight Chemical Information Systems, Inc., Santa Fe, New Mexico, 87501
Tharun Kumar Allu and Tudor I. Oprea
Department of Computer Science and Office of Biocomputing
University of New Mexico School of Medicine, Albuquerque, New Mexico, 87131
Abstract. Graph enumeration has long been studied and remains an active area of research. This paper describes an algorithm for generating labeled graphs (molecular structures) as SMILES strings and discusses the results. The SMILES are genuine in the sense that they are produced with software built on the language itself (the Daylight SMILES Toolkit). Exploiting symmetry is crucial for avoiding isomorphic graphs and significantly increases the rate at which unique SMILES are produced. Constraints direct the results toward chemically sensible compounds and "drug-like" structures. Structures are evaluated using the "Simple Metric of Molecular Complexity" (Allu and Oprea, CUP IV); this evaluation is a preliminary step toward considering the "ease of synthesis" of the resulting structures. The algorithm is amenable to parallelization and may be useful in the "Screensaver Lifesaver" project (Richards, EuroMUG '03).
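
The idea of generating structures while rejecting isomorphic duplicates can be illustrated with a toy example. The sketch below is not the GENSMI algorithm and does not use the Daylight SMILES Toolkit. It assumes the simplest possible setting (acyclic, carbon-only skeletons with a valence limit of 4), and it removes duplicates by comparing a canonical form of each tree after generation, which stands in for the symmetry-based pruning described in the abstract. All function names (`canonical`, `grow`, `enumerate_skeletons`, `to_smiles`) are hypothetical.

```python
def canonical(adj):
    """Return a canonical string for an unrooted tree.

    adj maps each node to the set of its neighbours. The tree is encoded
    as nested parentheses from every possible root and the smallest
    encoding is kept, so isomorphic trees get identical strings.
    """
    def encode(v, parent):
        parts = sorted(encode(c, v) for c in adj[v] if c != parent)
        return "(" + "".join(parts) + ")"
    return min(encode(root, None) for root in adj)


def grow(adj, max_valence=4):
    """Yield every tree obtained by attaching one new atom (assumed valence limit)."""
    new = len(adj)
    for v in adj:
        if len(adj[v]) < max_valence:
            child = {k: set(ns) for k, ns in adj.items()}
            child[v].add(new)
            child[new] = {v}
            yield child


def enumerate_skeletons(n_atoms, max_valence=4):
    """Enumerate non-isomorphic acyclic carbon skeletons with n_atoms atoms."""
    start = {0: set()}
    level = {canonical(start): start}
    for _ in range(n_atoms - 1):
        nxt = {}
        for adj in level.values():
            for child in grow(adj, max_valence):
                # Keep only the first representative of each isomorphism class.
                nxt.setdefault(canonical(child), child)
        level = nxt
    return list(level.values())


def to_smiles(adj, v=0, parent=None):
    """Write a valid (not canonical) SMILES string for a carbon tree."""
    kids = [c for c in adj[v] if c != parent]
    s = "C"
    for c in kids[:-1]:
        s += "(" + to_smiles(adj, c, v) + ")"  # side branches in parentheses
    if kids:
        s += to_smiles(adj, kids[-1], v)       # last child continues the chain
    return s


if __name__ == "__main__":
    for n in range(1, 8):
        smiles = [to_smiles(t) for t in enumerate_skeletons(n)]
        print(n, len(smiles), smiles)
```

For one to seven carbons this yields 1, 1, 1, 2, 3, 5 and 9 skeletons, matching the known counts of alkane constitutional isomers. Deduplicating after generation is simple but wasteful: every duplicate is built before it is discarded, which is the cost that using symmetry during generation avoids. Each parent in a level is extended independently of the others, which hints at why this style of enumeration is amenable to parallelization.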