How to Build a Really Big Catalog: The Modgraph Chemical Suppliers Catalog (CSC)

Craig A. James
Modgraph Consultants LTD


Many companies want a comprehensive catalogue of all chemicals available from all sources. The commercial databases currently available can suffer from incompleteness, inconsistent quality and/or out-of-date data, and can be prohibitively expensive when deployed widely. Many pharmaceutical companies have embarked on in-house efforts to build such a database, but the task is challenging and, even when successful, requires a high level of ongoing support. The combined data from all chemical suppliers represents several million structures from many dozens of vendors.

Such a database requires six elements to succeed:

  1. It must be current and comprehensive: all commercially available compounds of interest from all vendors must be in the database.
  2. Data from all vendors must be stored consistently; e.g., the tautomers, stereochemistry and salt forms supplied by different vendors must be represented uniformly.
  3. The cheminformatics engine must be fast enough to handle millions of compounds.
  4. It must have a loader that can handle data in the wide variety of formats used by diverse vendors and merge all catalogues into a single, uniform schema.
  5. It must have a user-friendly front end that serves both expert "power users" and casual users.
  6. It must be maintainable with a modest effort.
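
As a rough illustration of elements 2 and 4, the Python sketch below shows one way a loader might do two things. First, it maps each vendor's column names onto a single schema. Second, it groups the resulting offers under a shared structure key. The vendor names, field maps and the `canonicalize` hook are all hypothetical. In practice the hook would be a cheminformatics toolkit's normalization step (salt stripping, tautomer and stereochemistry handling, canonical SMILES), which is not shown here. This is a sketch under those assumptions, not the CSC loader itself.

```python
from dataclasses import dataclass, field

# Hypothetical per-vendor column mappings: each vendor names the same fields differently.
FIELD_MAPS = {
    "vendor_a": {"CatNo": "catalog_id", "SMILES": "smiles", "Price_USD": "price"},
    "vendor_b": {"item": "catalog_id", "structure": "smiles", "cost": "price"},
}


@dataclass
class Compound:
    smiles: str                                  # structure key after canonicalization
    offers: list = field(default_factory=list)   # (vendor, catalog_id, price) tuples


def normalize_record(vendor, raw):
    """Rename a vendor's fields to the uniform schema and coerce the price to float."""
    mapping = FIELD_MAPS[vendor]
    rec = {mapping[k]: v for k, v in raw.items() if k in mapping}
    rec["price"] = float(rec["price"])
    return rec


def merge_catalogues(feeds, canonicalize):
    """Merge all vendor feeds into one dict keyed by canonical structure.

    `canonicalize` is a hypothetical hook; a real loader would plug in a toolkit's
    structure standardization so that equivalent vendor entries share one key.
    """
    compounds = {}
    for vendor, records in feeds.items():
        for raw in records:
            rec = normalize_record(vendor, raw)
            key = canonicalize(rec["smiles"])
            comp = compounds.setdefault(key, Compound(smiles=key))
            comp.offers.append((vendor, rec["catalog_id"], rec["price"]))
    return compounds


feeds = {
    "vendor_a": [{"CatNo": "A-1", "SMILES": "CCO", "Price_USD": "12.50"}],
    "vendor_b": [
        {"item": "B-77", "structure": "CCO", "cost": "9.99"},
        {"item": "B-78", "structure": "c1ccccc1", "cost": "20"},
    ],
}

# Identity function used as a placeholder for real structure canonicalization.
catalog = merge_catalogues(feeds, canonicalize=lambda s: s)
print(len(catalog))             # 2
print(catalog["CCO"].offers)    # [('vendor_a', 'A-1', 12.5), ('vendor_b', 'B-77', 9.99)]
```

Keeping the per-vendor mappings as data, separate from the merge logic, means a new vendor format can be added by writing a new mapping rather than new code. That separation is one plausible way to keep the loader maintainable (element 6).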

We will show how these problems were solved for the Modgraph Chemical Suppliers Catalogue. Beyond the technical challenges, we will also present a unique solution to the legal (copyright) problems associated with such a database.

