Using Descriptor Counts in Clustering
- Use descriptor-count information to avoid comparing every possible pair of
compounds during clustering.
- Clustering approach is agglomerative.
- First phase: form "small," tight clusters.
  - BOOST algorithm.
  - Completes in O(n) time.
  - Substantially reduces dataset size.
- Second phase: cluster the reduced dataset.
  - DiET algorithm.
  - Complexity depends on dataset.
  - Merge small clusters and single compounds to complete the clustering.
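
One way descriptor counts can rule out pairs before any comparison is sketched below. This is not the BOOST or DiET algorithm, whose details the outline does not give. It assumes each compound is a set of descriptors and that similarity is the Tanimoto coefficient, for which sim(A, B) <= min(|A|, |B|) / max(|A|, |B|) always holds. The names `tanimoto` and `similar_pairs`, the toy compounds, and the 0.7 threshold are hypothetical choices for illustration.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two descriptor sets (assumed measure)."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similar_pairs(compounds, threshold):
    """Find index pairs (i, j), i < j, with Tanimoto >= threshold.

    Compounds are visited in order of descriptor count. Because
    tanimoto(a, b) <= min(|a|, |b|) / max(|a|, |b|), the inner loop can
    stop as soon as the count ratio drops below the threshold.
    Returns the sorted pairs and the number of full comparisons made.
    """
    order = sorted(range(len(compounds)), key=lambda i: len(compounds[i]))
    counts = [len(compounds[i]) for i in order]
    pairs = []
    comparisons = 0
    for pos, i in enumerate(order):
        for other in range(pos + 1, len(order)):
            # Upper bound on similarity from the counts alone.
            bound = 1.0 if counts[other] == 0 else counts[pos] / counts[other]
            if bound < threshold:
                break  # every later compound has an even larger count
            j = order[other]
            comparisons += 1
            if tanimoto(compounds[i], compounds[j]) >= threshold:
                pairs.append((min(i, j), max(i, j)))
    return sorted(pairs), comparisons


# Toy data: four compounds with 4, 5, 10, and 2 descriptors.
compounds = [
    frozenset(range(1, 5)),
    frozenset(range(1, 6)),
    frozenset(range(1, 11)),
    frozenset({1, 2}),
]
pairs, comparisons = similar_pairs(compounds, threshold=0.7)
print(pairs, comparisons)
```

On this toy input only one of the six possible pairs is actually compared; the other five are rejected from the counts alone. How much work is saved in practice depends on the threshold and on how widely descriptor counts vary across the dataset.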