Comparison of commercial databases with glax registry file.

Source | # of structures | 2-D parent matches in glax | Near neighbours in glax | Near neighbours within database
Specs 1897
IBS-NP 3000
MS-9402 960
MS-9502 953 (7 compounds with no structures)
MS-9504 960
MS-9505 835
Chembridge 2500
Glax96_2
ComGenex 4969
On_Stock
LaboTest 10608
Orion 5000
Biotechnology Corp Of America 7700 (7690 readable)
glx9604
Beletskaya Moscow 8084

The parent matches pie plot shows the proportion of compounds from each source that are already present in the glax file. For the comparison, the largest piece of each structure is extracted and every charge that can be reduced to zero by changing the hydrogen count is removed. The matches are purely 2D; there is no attempt to match isomers. The figure is therefore an upper bound on the overlap.
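The parent comparison can be sketched roughly as below. This is a toy illustration in Python, not the procedure actually used: it assumes structures arrive as non-isomeric SMILES strings already written in one consistent canonical form, it picks the largest piece with a crude atom count, and it neutralises charges from a small hypothetical lookup table (`_NEUTRALISE`). The function names `largest_piece`, `parent` and `overlap_fraction` are made up for the example. A real run would use a chemistry toolkit for each of these steps.

```python
import re

# Crude heavy-atom tokeniser: bracket atoms first, then two-letter halogens,
# then the single-letter organic subset (aliphatic and aromatic).
_ATOM = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFIbcnops]")

# Hypothetical lookup (an assumption for this sketch): charged atoms that a
# change in hydrogen count can neutralise. Quaternary [N+] is deliberately
# absent, since no change in hydrogen count removes that charge.
_NEUTRALISE = {"[O-]": "O", "[S-]": "S", "[NH3+]": "N", "[NH2+]": "N", "[NH+]": "N"}


def largest_piece(smiles):
    """Return the dot-separated component with the most atoms."""
    return max(smiles.split("."), key=lambda frag: len(_ATOM.findall(frag)))


def parent(smiles):
    """Largest piece, with the neutralisable charges from the lookup removed."""
    piece = largest_piece(smiles)
    for charged, neutral in _NEUTRALISE.items():
        piece = piece.replace(charged, neutral)
    return piece


def overlap_fraction(supplier, registry):
    """Fraction of supplier compounds whose parent also occurs in the registry."""
    if not supplier:
        return 0.0
    registry_parents = {parent(s) for s in registry}
    hits = sum(parent(s) in registry_parents for s in supplier)
    return hits / len(supplier)
```

For instance, `parent("CC(=O)[O-].[Na+]")` drops the sodium counter-ion and protonates the carboxylate, so a supplier's sodium acetate matches a registry entry stored as the free acid.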
The two neighbour pie charts show the proportion of compounds that have a given number of near neighbours, based on the 1024-bit DAYLIGHT fingerprint and the Tanimoto similarity measure. In the first plot, the neighbours in the glax registry file are counted for each compound from the commercial source. This should give a feel for how far the database fills gaps in, and/or extends, the diversity of the registry file. In the second plot, the neighbours within the supplied database itself are counted. This indicates whether we are being offered a lot of closely related compounds.
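The neighbour counting behind both plots can be sketched as follows. This is a minimal Python illustration under stated assumptions: fingerprints are represented as sets of on-bit positions rather than real DAYLIGHT fingerprints, and the similarity cut-off that defines a "near neighbour" is not given here, so the default `threshold=0.85` is only a placeholder. The function names are hypothetical.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    if not a and not b:
        return 0.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def neighbour_counts(queries, references, threshold=0.85, skip_self=False):
    """For each query, count the references at or above the similarity threshold.

    The threshold default is an assumption for this sketch. Set skip_self=True
    when queries and references are the same list (the within-database case),
    so that a compound is not counted as its own neighbour.
    """
    counts = []
    for i, q in enumerate(queries):
        n = 0
        for j, r in enumerate(references):
            if skip_self and i == j:
                continue
            if tanimoto(q, r) >= threshold:
                n += 1
        counts.append(n)
    return counts


def pie_segments(counts, cap=10, cap_label="10+"):
    """Bin neighbour counts into pie segments, pooling everything at cap and over."""
    segments = Counter()
    for n in counts:
        segments[cap_label if n >= cap else str(n)] += 1
    return dict(segments)
```

The first plot corresponds to `neighbour_counts(supplier, registry)` and the second to `neighbour_counts(supplier, supplier, skip_self=True)`; either list of counts can then be passed to `pie_segments`.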
To ease use as this table gets bigger, or the plots are borrowed :-), the 10-and-over segment is labelled >10 in the plots comparing a database with glax and 10+ in the within-database plots. The within-database plots are in heavy type too.