MUG '01 -- 15th Daylight User Group Meeting -- 6 - 9 Mar 2001

Writing a 4GL for Combinatorial Chemistry: A Work In Progress

Robin Hewitt
DuPont Pharmaceuticals Research Labs


Can a 4GL compete with stand-alone programs for designing and analyzing combinatorial libraries? Unlike 3GLs such as C++ or Fortran, a 4GL (fourth-generation language) lets users describe at a high level what they want without specifying how to do it. Typically, 4GLs use a directed-graph paradigm in which nodes are operators and edges represent data flow. There is a tradeoff: with a 4GL one gains ease of use but gives up some of the performance and flexibility of a 3GL. We're writing a modular 4GL for combinatorial chemistry computation that uses an XML representation of molecule lists as the data stream. Our operators are simple, stand-alone utilities that read and write this XML stream. Currently, they include Filter (filter lists by structure or property), React (combinatorial application of a transform to all input lists), and Align2D (align the molecules in a list to a substructure). These few operators can be linked easily, allowing users to enumerate and design combinatorial libraries interactively. We have measured the performance cost of this approach and, despite the verbosity of XML, found the overhead to be surprisingly low. The benefits of this 4GL include ease of use, flexibility of use, extensibility, and maintainability.
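To make the operator-and-stream idea concrete, here is a minimal sketch in Python of two operators chained over an XML molecule-list stream. Everything in it is an assumption for illustration: the abstract does not give the XML schema, so the `lists`/`list`/`mol` elements and the `smiles` and `mw` attributes are hypothetical, as are the names `filter_op`, `react_op`, and `pipeline`. The "transform" simply joins reagents into a dot-separated SMILES string and stands in for a real reaction transform; it performs no chemistry.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical XML stream: two named molecule lists (schema is an assumption).
SAMPLE = """<lists>
  <list name="acids">
    <mol smiles="CC(=O)O" mw="60.05"/>
    <mol smiles="OC(=O)c1ccccc1" mw="122.12"/>
  </list>
  <list name="amines">
    <mol smiles="NC" mw="31.06"/>
    <mol smiles="NCC" mw="45.08"/>
  </list>
</lists>"""


def filter_op(xml_text, max_mw):
    """Filter by property: drop molecules whose mw attribute exceeds max_mw."""
    root = ET.fromstring(xml_text)
    for mol_list in root.findall("list"):
        # Copy the child list first so removal does not disturb iteration.
        for mol in list(mol_list.findall("mol")):
            if float(mol.get("mw")) > max_mw:
                mol_list.remove(mol)
    return ET.tostring(root, encoding="unicode")


def react_op(xml_text, transform):
    """Apply transform to every combination of one molecule per input list."""
    root = ET.fromstring(xml_text)
    pools = [
        [mol.get("smiles") for mol in mol_list.findall("mol")]
        for mol_list in root.findall("list")
    ]
    out = ET.Element("lists")
    products = ET.SubElement(out, "list", name="products")
    for combo in itertools.product(*pools):
        ET.SubElement(products, "mol", smiles=transform(combo))
    return ET.tostring(out, encoding="unicode")


def pipeline(xml_text, *operators):
    """Link operators: each one reads the XML stream the previous one wrote."""
    for op in operators:
        xml_text = op(xml_text)
    return xml_text


if __name__ == "__main__":
    result = pipeline(
        SAMPLE,
        lambda x: filter_op(x, 100.0),       # keep reagents with mw <= 100
        lambda x: react_op(x, ".".join),     # placeholder transform, no chemistry
    )
    print(result)
```

Because every operator takes and returns the same XML text, the graph edges reduce to passing strings along, and the same functions could be wrapped as stand-alone programs that read standard input and write standard output. Each stage re-parses and re-serializes the stream, which is the kind of overhead the abstract says was measured.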


Daylight Chemical Information Systems, Inc.