MUG '03 -- 25 - 28 Feb, 2003

Using Descriptor Counts in Clustering

Robin Hewitt


Standard chemical clustering methods ignore descriptor-count information. When binary descriptors (such as Daylight fingerprints) are used, however, this information can be leveraged to avoid comparing every possible pair of compounds during clustering. This talk presents two new algorithms that use descriptor-count information for clustering. Validation tests on industry datasets demonstrate that combining these algorithms achieves good clustering with far fewer than n^2 compound-to-compound comparisons. The larger the dataset, the greater the reduction.
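
The abstract does not describe the two algorithms, so the following is only an illustrative sketch of how bit counts can rule out comparisons, not the method presented in the talk. It assumes Tanimoto similarity (an assumption; the abstract names no measure) and relies on the fact that the Tanimoto similarity of two binary fingerprints can never exceed the ratio of the smaller bit count to the larger. The function names, the integer fingerprint representation, and the threshold value are all hypothetical.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as Python ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return bin(a & b).count("1") / union


def similar_pairs(fps, threshold):
    """Find all pairs with Tanimoto >= threshold, skipping pairs whose
    bit counts alone prove they cannot reach the threshold.

    Returns (sorted list of index pairs, number of full comparisons made).
    """
    counts = [bin(fp).count("1") for fp in fps]
    # Visit fingerprints in order of increasing bit count.
    order = sorted(range(len(fps)), key=lambda i: counts[i])
    pairs = []
    comparisons = 0
    for pos, i in enumerate(order):
        for j in order[pos + 1:]:
            # Here counts[j] >= counts[i], so the similarity is bounded
            # above by counts[i] / counts[j].
            if counts[j] and counts[i] / counts[j] < threshold:
                break  # every later j has an even larger count
            comparisons += 1
            if tanimoto(fps[i], fps[j]) >= threshold:
                pairs.append((min(i, j), max(i, j)))
    return sorted(pairs), comparisons


# Toy 8-bit "fingerprints"; real fingerprints are much longer.
fps = [0b1111, 0b1110, 0b11111111, 0b11111110, 0b1]
pairs, comparisons = similar_pairs(fps, threshold=0.7)
```

On this toy input only 2 of the 10 possible pairs are actually compared; the rest are excluded by their bit counts alone. How much is saved in practice depends on the threshold and on how widely the bit counts are spread across the dataset.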

Daylight Chemical Information Systems, Inc.