Archives routinely used in drug discovery consist of millions or tens of millions of compounds, including many thousands of "singleton" compounds. It can be difficult to retroactively classify these compounds into chemical series with related core structures. Classification methods that rely on similarity calculations are prone to being distracted by the substituents around the core structure. In other words, compounds with different cores but similar substituents may be grouped together. Limiting the search to a specific core, on the other hand, requires knowing the core structure(s) before the search. We would like to be able to classify compounds into series automatically and use series-specific information to build more reliable models. We would also like to use that information to suggest compounds to synthesize in the future.
This talk will discuss Shard, a library for breaking molecules into pieces and reassembling them to form new molecules, and its use in automated chemical series identification and virtual library generation. Other uses of this library to be discussed include 2.5D (more than two-dimensional but less than three-dimensional) modeling, with a detailed analysis of the results, and a variety of tools designed to elucidate compound relationships through pairwise structure-activity comparisons.