Daylight Clustering Methods
Daylight offers four clustering methods: scaffold-directed clustering and three fingerprint-similarity methods (Jarvis-Patrick, k-modes, and sphere exclusion).
Scaffold-directed clustering is based on internally generated fingerprints that contain topological information. Clustering is guided by how much of each member molecule is covered by the cluster's common substructure, or scaffold, so that every final cluster is guaranteed to maintain a minimum coverage over all of its members. The number of clusters is not fixed in advance, and molecules that cannot meet the minimum coverage requirement become singletons. This guarantee of a common substructure or scaffold comes at a computational cost, so the method is best limited to about 10,000 structures. The scaffold-directed method is especially useful for analyzing lead optimization projects, target-directed vendor libraries, and structures reported in publications or patents.
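The minimum-coverage rule can be illustrated with a toy assignment step. This is not Daylight's algorithm: scaffold discovery is omitted, the scaffolds each molecule contains are supplied as input, and coverage is assumed to be the scaffold's share of the molecule's heavy atoms. The function names, scaffold ids, and atom counts are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def coverage(scaffold_atoms: int, molecule_atoms: int) -> float:
    # Assumed definition: fraction of the molecule's heavy atoms that lie in the scaffold.
    if molecule_atoms <= 0:
        raise ValueError("molecule_atoms must be positive")
    return scaffold_atoms / molecule_atoms


def assign_by_coverage(
    scaffolds: Dict[str, int],                   # hypothetical: scaffold id -> heavy-atom count
    molecules: Dict[str, Tuple[int, Set[str]]],  # name -> (heavy-atom count, scaffold ids it contains)
    min_coverage: float,
) -> Tuple[Dict[str, List[str]], List[str]]:
    clusters: Dict[str, List[str]] = {s: [] for s in scaffolds}
    singletons: List[str] = []
    for name, (n_atoms, contained) in molecules.items():
        # Only scaffolds that are substructures of this molecule are candidates.
        candidates = [(coverage(scaffolds[s], n_atoms), s) for s in contained if s in scaffolds]
        best = max(candidates, default=None)
        if best is not None and best[0] >= min_coverage:
            clusters[best[1]].append(name)  # join the best-covering scaffold's cluster
        else:
            singletons.append(name)         # no scaffold meets the minimum coverage
    # Drop scaffolds that attracted no members.
    return {s: m for s, m in clusters.items() if m}, singletons


if __name__ == "__main__":
    scaffolds = {"S1": 10, "S2": 6}
    molecules = {
        "m1": (12, {"S1", "S2"}),  # S1 covers 10/12, S2 covers 6/12
        "m2": (9, {"S2"}),         # S2 covers 6/9
        "m3": (30, {"S1"}),        # S1 covers only 10/30
        "m4": (8, set()),          # contains no known scaffold
    }
    print(assign_by_coverage(scaffolds, molecules, min_coverage=0.6))
    # ({'S1': ['m1'], 'S2': ['m2']}, ['m3', 'm4'])
```

Because a molecule is only accepted into a cluster when the scaffold meets the threshold for that molecule, every resulting cluster satisfies the minimum coverage over all its members, and the rest fall out as singletons.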
The other three methods are well-validated, non-parametric clustering methods suitable for a wide variety of applications, including database characterization, analog discovery, and directed or diversity-based selection of structures for binding assays, toxicity tests, and other screens. Selecting compounds this way can save a great deal of time and money compared with screening compounds chosen at random or by hand.
These three methods rely on the value of a similarity measure computed between the binary descriptors of each pair of molecules. Considerable flexibility is provided in the choice of similarity coefficient: fifteen commonly used named measures (e.g., Tanimoto) are available, as well as user-defined similarity expressions. The binary descriptors are usually the standard Daylight structural fingerprints used in substructure and superstructure searching. However, any binary descriptor may be used, e.g., activity against a set of biological screens. Tools are provided to convert such descriptors into the appropriate format.
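A minimal sketch of how a pairwise similarity measure on binary descriptors can drive clustering is shown below. It is not Daylight's implementation or API: fingerprints are modeled as Python integers used as bit sets, the bit counts follow the common a/b/c/d notation (bits set only in the first descriptor, only in the second, in both, in neither), the Tversky function stands in for a user-defined expression, and `sphere_exclusion` is a generic greedy variant written for illustration. All function names are hypothetical.

```python
from typing import Callable, List, Tuple

# A similarity measure maps the bit counts (a, b, c, d) to a score.
Measure = Callable[[int, int, int, int], float]


def bit_counts(fp_x: int, fp_y: int, nbits: int) -> Tuple[int, int, int, int]:
    """a: bits only in x, b: bits only in y, c: bits in both, d: bits in neither."""
    a = bin(fp_x & ~fp_y).count("1")
    b = bin(fp_y & ~fp_x).count("1")
    c = bin(fp_x & fp_y).count("1")
    d = nbits - a - b - c
    return a, b, c, d


def tanimoto(a: int, b: int, c: int, d: int) -> float:
    # c / (a + b + c); two all-zero descriptors score 0.0 here (a chosen convention).
    union = a + b + c
    return c / union if union else 0.0


def tversky(alpha: float, beta: float) -> Measure:
    """Build an asymmetric measure, standing in for a user-defined expression."""
    def measure(a: int, b: int, c: int, d: int) -> float:
        denom = alpha * a + beta * b + c
        return c / denom if denom else 0.0
    return measure


def similarity(fp_x: int, fp_y: int, nbits: int, measure: Measure = tanimoto) -> float:
    return measure(*bit_counts(fp_x, fp_y, nbits))


def sphere_exclusion(fps: List[int], nbits: int, threshold: float,
                     measure: Measure = tanimoto) -> List[List[int]]:
    """Greedy sphere exclusion: take the first unassigned molecule as a centre and
    put every unassigned molecule with similarity >= threshold into its cluster."""
    unassigned = list(range(len(fps)))
    clusters: List[List[int]] = []
    while unassigned:
        centre, rest = unassigned[0], unassigned[1:]
        # The centre always joins its own cluster, even if its self-similarity is 0.
        members = [centre] + [i for i in rest
                              if similarity(fps[centre], fps[i], nbits, measure) >= threshold]
        clusters.append(members)
        taken = set(members)
        unassigned = [i for i in unassigned if i not in taken]
    return clusters


if __name__ == "__main__":
    fps = [0b1011, 0b0011, 0b11000000]
    print(round(similarity(fps[0], fps[1], 8), 3))                       # 0.667
    print(similarity(fps[0], fps[1], 8, tversky(0.0, 1.0)))              # 1.0
    print(sphere_exclusion(fps, nbits=8, threshold=0.6))                 # [[0, 1], [2]]
```

Swapping `tanimoto` for another `Measure` changes the clustering without touching the clustering code, which mirrors the separation the text describes between the choice of similarity coefficient and the clustering method. Real sphere-exclusion implementations differ in how they pick centres; taking them in input order is only the simplest choice.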