Using Descriptor Counts in Clustering


  • Use descriptor-count information to avoid comparing every possible pair of compounds during clustering.

  • The clustering approach is agglomerative.

  • First phase: form "small," tight clusters.
    • BOOST algorithm.
    • Completes in O(n) time.
    • Substantially reduces dataset size.

  • Second phase: cluster the reduced dataset.
    • DiET algorithm.
    • Complexity depends on the dataset.
    • Merge small clusters and single compounds to complete the clustering.
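
One way descriptor counts can rule out pairs without a full comparison is sketched below. It is an illustration only, not the BOOST or DiET algorithm, neither of which is described on this slide. The sketch assumes that each compound is a set of binary descriptors and that similarity is the Tanimoto coefficient; the slide states neither. Under those assumptions, two compounds with descriptor counts a <= b can never have similarity above a / b, so sorting by count lets the inner loop stop early. The names `tanimoto` and `similar_pairs` and the sample data are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two descriptor sets."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similar_pairs(compounds, threshold):
    """Find pairs with similarity >= threshold, pruning by descriptor count.

    Returns (pairs, compared): pairs is a sorted list of (i, j, similarity)
    with i < j, and compared is the number of full set comparisons made.
    Assumes threshold > 0.
    """
    # Visit compounds in order of increasing descriptor count.
    order = sorted(range(len(compounds)), key=lambda i: len(compounds[i]))
    counts = [len(compounds[i]) for i in order]
    pairs = []
    compared = 0
    for pos, i in enumerate(order):
        n_i = counts[pos]
        for pos_j in range(pos + 1, len(order)):
            n_j = counts[pos_j]  # n_j >= n_i because of the sort
            # Upper bound: similarity <= n_i / n_j. Once it drops below the
            # threshold, every later compound (larger count) fails as well.
            if n_j * threshold > n_i:
                break
            j = order[pos_j]
            compared += 1
            sim = tanimoto(compounds[i], compounds[j])
            if sim >= threshold:
                pairs.append((min(i, j), max(i, j), sim))
    return sorted(pairs), compared


# Hypothetical descriptor sets for five compounds.
compounds = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 4, 5}),
    frozenset({1, 2}),
    frozenset(range(10, 20)),
    frozenset({1, 2, 3}),
]
pairs, compared = similar_pairs(compounds, threshold=0.75)
print(pairs, compared)
```

With five compounds there are ten possible pairs; on this sample data the count bound leaves only two of them for a full comparison. How much is saved in practice depends on the threshold and on how widely descriptor counts vary across the dataset.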

Robin Hewitt, Feb 2003