The choice of the number of nearest neighbours to skip needs more investigation. Here we arbitrarily skipped 100 neighbours. For most of the examples, however, neither the mean nor the standard deviation changes much whether 0 or 900 nearest neighbours are skipped.
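The skipping step can be sketched roughly as follows. This sketch rests on a few assumptions. The similarities of one query to every database molecule are taken to be already computed as a list of Tanimoto scores. The helper name `background_stats` is hypothetical. The population standard deviation is used. None of these details comes from the procedure actually used here. The nearest neighbours are the highest-scoring molecules, so they are dropped from the top of the sorted list before the statistics are taken.

```python
import statistics
from typing import Sequence, Tuple


def background_stats(similarities: Sequence[float], skip: int) -> Tuple[float, float]:
    """Mean and population standard deviation of the similarity scores
    left after discarding the `skip` nearest neighbours.

    Hypothetical helper: assumes `similarities` holds one Tanimoto score
    per database molecule for a single query.
    """
    if skip < 0:
        raise ValueError("skip must be non-negative")
    # Nearest neighbours have the highest scores, so sort in descending order.
    ranked = sorted(similarities, reverse=True)
    remainder = ranked[skip:]
    if not remainder:
        raise ValueError("skip removes every neighbour")
    return statistics.fmean(remainder), statistics.pstdev(remainder)


if __name__ == "__main__":
    # Toy scores for illustration only, not data from this test.
    scores = [0.95, 0.80, 0.40, 0.35, 0.30, 0.30, 0.25, 0.20]
    for k in (0, 2):
        mean, sd = background_stats(scores, k)
        print(f"skip={k}: mean={mean:.3f} sd={sd:.3f}")
```

To repeat this for a range of skip values (0, 100, ..., 900), call the function once per value on the same score list. With the toy scores, skipping just the two highest values already pulls the mean down noticeably. This illustrates how a query with a few very close neighbours can be sensitive to the choice.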

The mean (+) and standard deviation (x) for the ten drugs used in this test, searched against wdi971demo with various numbers of nearest neighbours skipped, are shown below.

In the case of caffeine, however, there is a noticeable change in both the mean and the standard deviation. Caffeine is shown in yellow in the plot above, but its data are also shown below for clarity.

This is presumably related to the fact that caffeine has significant neighbours below 0.4 on the Tanimoto scale.
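For reference, the Tanimoto coefficient between two binary fingerprints is the number of bits set in both, divided by the number of bits set in either. The sketch below shows this calculation and a count of neighbours falling below a cutoff such as 0.4. It assumes that fingerprints are stored as Python sets of on-bit indices. That representation is chosen here for illustration, since the fingerprint type used in this test is not stated. The helper `count_below` is hypothetical.

```python
from typing import AbstractSet, Iterable


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        # Two empty fingerprints: the ratio is undefined; returning 0.0 is an assumed convention.
        return 0.0
    return len(a & b) / union


def count_below(query: AbstractSet[int], database: Iterable[AbstractSet[int]], cutoff: float) -> int:
    """Hypothetical helper: number of database fingerprints whose similarity
    to `query` is strictly below `cutoff`."""
    return sum(1 for fp in database if tanimoto(query, fp) < cutoff)


if __name__ == "__main__":
    query = {1, 2, 3, 4}
    database = [{1, 2, 3, 4}, {3, 4, 5}, {5, 6}, {1, 9}]
    print([round(tanimoto(query, fp), 2) for fp in database])  # [1.0, 0.4, 0.0, 0.2]
    print(count_below(query, database, 0.4))  # 2
```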