Sphere Exclusion first picks a compound at random, which is usually the first compound in the list. Since the compounds in the list are generally not ordered by any specific rule, the next compound in the list may lie either inside or outside the "sphere" of the first compound.

Also, because of the random order, the second compound added to the subset might lie at the border of the sphere or somewhere else entirely.

Since we want the subset to cover the compound space as evenly as possible, overlapping spheres of the selected compounds are preferred. In addition, the total processing time is reduced if compounds can be thrown out as early as possible.
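The basic procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used here: it assumes that fingerprints are given as sets of on-bit indices, that similarity is measured with the Tanimoto coefficient, and that the sphere radius is expressed as a similarity threshold. The names `tanimoto`, `sphere_exclusion` and `threshold` are hypothetical.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def sphere_exclusion(fps: List[Set[int]], threshold: float) -> List[int]:
    """Return the indices of the picked compounds, scanning the list in order.

    Assumption: a compound lies inside the sphere of a pick when its
    similarity to that pick is at least `threshold`.
    """
    picked: List[int] = []
    for i, fp in enumerate(fps):
        # Throw the compound out if it falls inside the sphere of any pick so far.
        if any(tanimoto(fp, fps[j]) >= threshold for j in picked):
            continue
        # Otherwise it becomes the centre of a new sphere.
        picked.append(i)
    return picked


if __name__ == "__main__":
    toy = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}, {1, 2, 3, 4, 5}, {7, 8, 10}]
    print(sphere_exclusion(toy, threshold=0.5))
```

Because the list is scanned in its given order, the result depends on that order: with a random order, which compound becomes the centre of a sphere is largely a matter of chance.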



The blue dot and the red dots are the reference molecules. Since we have ordered the compounds by their similarity to the reference molecules (first to the blue one, then to each of the red ones), the first compound in the list is the one most similar to the blue reference molecule, and the next few compounds in the list most probably lie within the sphere of the first compound. So we can throw out several compounds right after we have picked one for the subset.


The compounds most similar to the reference molecules are picked for the subset first because they come first in the list.
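One possible way to produce such an ordering is sketched below. It is an assumption about the details, not the procedure actually used: each compound is assigned to the reference molecule it is most similar to, the groups are arranged in the order of the references (the blue one first), and within each group the compounds are sorted by decreasing similarity. Fingerprints are again assumed to be sets of on-bit indices compared with the Tanimoto coefficient, and `order_by_references` is a hypothetical name.

```python
from typing import List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def order_by_references(fps: List[Set[int]], refs: List[Set[int]]) -> List[int]:
    """Return compound indices ordered by reference, then by decreasing similarity.

    Assumption: `refs` is non-empty and lists the references in priority
    order (for example the blue one first, then the red ones).
    """
    def sort_key(i: int) -> Tuple[int, float]:
        sims = [tanimoto(fps[i], ref) for ref in refs]
        # Index of the most similar reference; ties go to the earlier reference.
        best = max(range(len(refs)), key=lambda k: sims[k])
        # Negate the similarity so that the most similar compounds come first.
        return (best, -sims[best])

    return sorted(range(len(fps)), key=sort_key)


if __name__ == "__main__":
    references = [{1, 2, 3, 4}, {7, 8, 9}]
    compounds = [{7, 8, 10}, {1, 2, 3, 5}, {1, 2, 3, 4}, {7, 8, 9}]
    print(order_by_references(compounds, references))
```

Feeding the compounds to Sphere Exclusion in this order means that the neighbours of a freshly picked compound tend to follow it directly in the list, so they are excluded early.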







Ten cycles might already be enough.
Run 0 is a separate run, done for comparison. The compounds are grouped by plate; however, the plates are ordered randomly.

If we are OK with 20 to 30% redundancy, we could select between 100 and 200 plates in this case.

Compared to the selection with randomly ordered plates, we could increase the number of selected compounds per plate.