Using Descriptor Counts in Clustering

DiET (Directed Exploration of Territories)

  • Does a complete clustering.


  • Modified Wards method.


  • Can use BOOST's output as its start point.
Standard Wards Method
  1. Compare each item in a dataset to every other item.


  2. Choose the "best" pairing.*


  3. New pair now becomes an item, replacing the two that went into it.


  4. Keep pairing items until done.
    *"best" = least increase in summed moments of inertia.

Theory: As in BOOST, we bin the (reduced) dataset by holding the most variable bits constant. In DiET, however, we don't have a tree, just one level of bins.

For each item in the reduced dataset, its best partner might be in any bin. But it's more likely to be in a bin that differs by just one or two bits than in a bin that differs by, say, ten.

As we combine pairs to build clusters, we start with restricted search spaces, centered on each bin, then widen them in stages. First, we look just within the same bin. As pairings start to lose compactness, we expand our search territories to include bins that differ by one bit. Then we expand to include two-bit differences, and so on. As the number of bins grows, the number of items in each bin shrinks.


| Prev | Contents | Next | Robin Hewitt (rhewitt@acm.org), Feb 2003