Exploring the Utility of Shape Descriptors for Diversity Analysis of Combinatorial Libraries

Susan M Boyd* and Scott D Kahn#

* Molecular Simulations, 240-250 The Quorum, Barnwell Road, Cambridge, CB5 8RE, England, UK

# Molecular Simulations Inc., 9685 Scranton Road, San Diego, CA 92121-3752, USA

Library design may be considered the initial step in a combinatorial chemistry experiment. Often, the number of compounds that could theoretically comprise the library is too large to synthesise in practice. Library subsetting can reduce the library to a more manageable number of compounds. One approach is to subset the library on the basis of a diversity function, thereby selecting a diverse subset of molecules to be synthesised.

A number of molecular descriptors can be calculated to describe the diversity of a dataset. The use of 1D descriptors and 2D topological descriptors is widespread. Recently, 3D descriptors have been incorporated into some studies; however, the additional benefits of these 3D descriptors have yet to be quantified.

This study uses a dataset of around 70 compounds drawn from 14 different activity classes. An ideal diversity selection method should select one molecule from each activity class. Diversity selection was performed on the dataset using various combinations of descriptors to determine the impact of including 3D descriptors on the selection of diverse molecules. Selections were assessed on their coverage of the compound activity classes within the dataset.
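
The selection and assessment steps described above can be illustrated with a minimal sketch. The abstract does not name the selection algorithm or the distance measure used in the study, so the greedy MaxMin procedure, the Euclidean distance, the function names (maxmin_select, class_coverage) and the toy two-dimensional descriptor values below are all assumptions made purely for illustration. Coverage is taken here as the fraction of activity classes represented by at least one selected compound.

```python
import math


def maxmin_select(descriptors, k, seed_index=0):
    """Greedy MaxMin selection (an assumed method, not necessarily the study's).

    Starting from a seed compound, repeatedly add the compound whose
    distance to its nearest already-selected compound is largest.
    """
    n = len(descriptors)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of compounds")

    selected = [seed_index]
    # min_dist[i]: distance from compound i to its nearest selected compound
    min_dist = [math.dist(d, descriptors[seed_index]) for d in descriptors]

    while len(selected) < k:
        candidates = [i for i in range(n) if i not in selected]
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        # Update nearest-selected distances in light of the new pick
        for i in range(n):
            min_dist[i] = min(min_dist[i], math.dist(descriptors[i], descriptors[nxt]))
    return selected


def class_coverage(selected, activity_classes):
    """Fraction of distinct activity classes hit by the selected compounds."""
    covered = {activity_classes[i] for i in selected}
    return len(covered) / len(set(activity_classes))


# Hypothetical toy data: six compounds, two per activity class,
# each described by two made-up descriptor values.
descriptors = [
    (0.0, 0.0), (0.1, 0.0),   # class A
    (5.0, 5.0), (5.1, 5.0),   # class B
    (0.0, 9.0), (0.1, 9.0),   # class C
]
activity_classes = ["A", "A", "B", "B", "C", "C"]

picked = maxmin_select(descriptors, k=3)
coverage = class_coverage(picked, activity_classes)
print(picked, coverage)
```

In this toy case, selecting three compounds yields one from each class, which is the ideal outcome described above. Repeating the selection with different descriptor combinations and comparing the resulting coverage values would mirror the kind of comparison the study reports, under the stated assumptions.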