127 Science Park, Milton Road, Cambridge CB4 0GD, UK
The Daylight fingerprint toolkit and the Tanimoto similarity metric offer a mechanism for calculating the similarity between a pair of compounds. We use this similarity metric to produce an unbiased spatial similarity plot for a set of N compounds.
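As an illustration of the similarity calculation, the Tanimoto coefficient of two bit-string fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below is a minimal example under stated assumptions: fingerprints are represented as plain Python integers rather than Daylight toolkit objects, the function names `tanimoto` and `similarity_matrix` are hypothetical, and the value returned for two empty fingerprints is a convention chosen here, not something taken from the toolkit.

```python
from typing import List


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bit sets."""
    both = bin(fp_a & fp_b).count("1")    # bits set in both fingerprints
    either = bin(fp_a | fp_b).count("1")  # bits set in at least one fingerprint
    # Assumption: two empty fingerprints are given similarity 0.0.
    return both / either if either else 0.0


def similarity_matrix(fingerprints: List[int]) -> List[List[float]]:
    """N x N matrix of pairwise Tanimoto similarities."""
    return [[tanimoto(a, b) for b in fingerprints] for a in fingerprints]


# Toy 4-bit fingerprints, invented for illustration only.
toy_fingerprints = [0b1011, 0b0011, 0b1100]
matrix = similarity_matrix(toy_fingerprints)
```

The resulting matrix is symmetric with ones on the diagonal, so for large N only the upper triangle would need to be computed.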
Using the similarity metric, we define an N-dimensional space in which each spanning direction corresponds to a single compound from the input set. Principal Component Analysis (PCA) allows us to project this high-dimensional space onto a two- or three-dimensional plot for visualization. Use of the Expectation Maximization PCA algorithm allows us to apply this technique to compound sets containing tens of thousands of compounds.
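The projection step can be sketched as follows. This is a minimal illustration of the general EM approach to PCA, not the implementation used in this work. It assumes NumPy is available, that each compound is a row of the input matrix (one reading of the text is that row i holds the similarities of compound i to all N compounds), and the function name `em_pca` and its parameters are hypothetical. The iteration alternates between estimating low-dimensional coordinates for a fixed basis and re-estimating the basis from those coordinates, so it never forms or decomposes the full N x N covariance matrix.

```python
import numpy as np


def em_pca(data, n_components=2, n_iter=100, seed=0):
    """Project the rows of `data` onto their leading principal subspace via EM."""
    rng = np.random.default_rng(seed)
    # Centre each column; Y holds one compound per column (features x samples).
    Y = (np.asarray(data, dtype=float) - np.mean(data, axis=0)).T
    n_features = Y.shape[0]
    # Random starting basis for the subspace (features x components).
    C = rng.standard_normal((n_features, n_components))
    for _ in range(n_iter):
        # E-step: latent coordinates X = (C^T C)^-1 C^T Y
        X = np.linalg.solve(C.T @ C, C.T @ Y)
        # M-step: new basis C = Y X^T (X X^T)^-1
        C = np.linalg.solve(X @ X.T, X @ Y.T).T
    # Orthonormalise the basis and project the centred data onto it.
    Q, _ = np.linalg.qr(C)
    coords = Y.T @ Q
    # Rotate within the small subspace so axes are ordered by decreasing variance.
    U, s, _ = np.linalg.svd(coords, full_matrices=False)
    return U * s


# Synthetic rank-2 data standing in for a similarity matrix (illustration only).
rng = np.random.default_rng(1)
data = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 6))
coords = em_pca(data, n_components=2)
```

Each iteration costs on the order of N x N x k operations for k components, which is why this style of algorithm remains practical when N reaches tens of thousands. A fixed iteration count is used here for brevity; a convergence test on the change in the basis would be a natural refinement.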
We use these plots to observe the behavior of compound selection algorithms that balance diversity against predicted quality assessed across a broad range of properties. This forms a component of Inpharmatica's drug discovery prioritization platform.