MUG '05 -- 19th Daylight User Group Meeting -- 9-11 Mar 2005

Chemical information models

Dave Weininger
Daylight/Metaphorics

ABSTRACT

Information modeling is at the heart of informatics.
The molecular information model is cool.
Discuss problems/solutions.

MUG '05 : Chemical information models
How can one do anything useful with a computer?

One usually needs to represent real things in a computer program using digital representations.


MUG '05 : Chemical information models : Fundamentals : 1 of 8

What chemical entities are useful to represent?


MUG '05 : Chemical information models : Fundamentals : 2 of 8

What operations on chemical entities are useful?


MUG '05 : Chemical information models : Fundamentals : 3 of 8

How can we represent chemical entities?

Various representations have different advantages

For instance:

MUG '05 : Chemical information models : Fundamentals : 4 of 8

How can we associate properties with chemical entities?

  1. Direct properties of molecular structure(s)

    [These] are the [property] values of [these molecular structures].

    Simplest possible chemical information model: molecular structure is the identifier, properties are connected to it. Perfect for the clean, idealistic world of non-overlapping molecular chemistry. Very powerful but not comprehensive and not very useful IRL.

    Example: Primary tables in CRC Handbook, Chemist's Companion, etc.

  2. Direct properties of hierarchical molecular structure(s)

    [These] are the [property] values of [these specific kinds of] [generic molecular structures].

    Entities are identified by a molecular structure level-of-detail hierarchy. Properties are connected to entities at the appropriate level. Clean and idealistic yet more powerful and more useful than above.

    Example: MedChem MASTERFILE

  3. Properties of arbitrary molecular identifier
    The [entity with this registration number] has [this molecular structure] and has [these] [property] values.
    Common model for traditional chemical registries. All possible molecular entities are represented by registration numbers; properties are assigned to these entities. Requires "god-like" (omniscient) structure identification and discrimination methods ... which IRL become unstable over time when used by normal human beings. Other problems include poor behavior with incomplete structural knowledge and the need to develop a religious "group or split" dogma. OK for closed, static, short-term delivery of homogeneous data.

    Examples: CAS, MACCS, WDI, some registration systems

  4. Properties of arbitrary molecular set identifier
    The [entity with this registration number] contains [these molecular structures] and has [these] [property] values.

    Similar to the systems above, but for a multiplicity of molecules. The problem with god-like systems is even worse than for discrete entities.

    Example: Most USPTO Patents (with legal caveat)

  5. Property of arbitrary identifier
    The [entity with this registration number] has [these] [property] values.
    [These molecular identifiers] are associated with [this entity].

    Entities and property-associations are not necessarily molecular. There is usually no requirement for uniqueness.

    Example: ?

  6. Property of arbitrary set identifier
    The [entity with this registration number] contains [these molecular structures] and has [these] [property] values.

    God-like systems.

    Example: FDA Orange Book (NDAs)

  7. Arbitrary non-molecular super-identifier
    Spresi, Orange
    Examples: QSAR, Wombat
  8. Property of arbitrary non-molecular set identifier
    Spresi, Orange
  9. Associative identifier
    TCM
  10. Normal RDB model
  11. Special ORDBMS model
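
A minimal Python sketch of the contrast between models 1 and 3 in the list above. Everything in it is illustrative and not taken from any Daylight toolkit: the dictionaries, the REG-... numbers, the helper name registry_ids_for and the property values in the registry are made up, the boiling points are approximate, and the SMILES strings are simply assumed to be canonical already.

```python
# Model 1: the molecular structure itself is the identifier.
# Keys are SMILES strings, assumed here to be canonical already.
direct = {
    "CCO": {"name": "ethanol", "bp_C": 78.4},       # approximate value
    "c1ccccc1": {"name": "benzene", "bp_C": 80.1},  # approximate value
}

# Model 3: an arbitrary registration number is the identifier.
# The structure is just one more field on the record and may be missing.
# All registration numbers and property values below are hypothetical.
registry = {
    "REG-0001": {"smiles": "CCO", "props": {"bp_C": 78.4}},
    "REG-0002": {"smiles": "CCO", "props": {"purity_pct": 95.0}},
    "REG-0003": {"smiles": None, "props": {"mp_C": 112.0}},
}


def registry_ids_for(smiles):
    """Return every registration number recorded with this structure."""
    return sorted(
        reg_id for reg_id, rec in registry.items() if rec["smiles"] == smiles
    )


if __name__ == "__main__":
    # One structure, one record in the structure-keyed model ...
    print(direct["CCO"])
    # ... but possibly several records in the registry model.
    print(registry_ids_for("CCO"))
```

Asking the registry for CCO yields two registration numbers, which is the "group or split" question in miniature. The structure-keyed table cannot express that situation, nor an entry whose structure is unknown.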

Implementation is a detail ... an important detail ... but still a detail
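
For the hierarchical model (item 2 in the list above), one possible sketch in Python follows. It is an assumption about how "properties connected at the appropriate level" could be realized, not a description of how MedChem MASTERFILE stores data. The two-level hierarchy (stereo-specific structure under stereo-unspecified structure), the names parent, props and lookup, and the assay_value property are all hypothetical.

```python
# Child -> parent links in a level-of-detail hierarchy (hypothetical).
# Two stereo-specific lactic acid SMILES point at the stereo-unspecified form.
parent = {
    "C[C@H](O)C(=O)O": "CC(O)C(=O)O",
    "C[C@@H](O)C(=O)O": "CC(O)C(=O)O",
}

# Properties are attached at the level where they are actually known.
props = {
    "CC(O)C(=O)O": {"formula": "C3H6O3"},       # true at every level below
    "C[C@H](O)C(=O)O": {"assay_value": 1.0},     # placeholder, one isomer only
}


def lookup(smiles):
    """Collect properties from the given structure and all of its parents.

    More specific levels override more generic ones.
    """
    chain = []
    node = smiles
    while node is not None:
        chain.append(node)
        node = parent.get(node)
    merged = {}
    for level in reversed(chain):  # generic first, specific last
        merged.update(props.get(level, {}))
    return merged


if __name__ == "__main__":
    print(lookup("C[C@H](O)C(=O)O"))
    print(lookup("C[C@@H](O)C(=O)O"))
```

A property recorded on the generic structure is visible from every specific form beneath it, while a property recorded on one specific form stays with that form.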


MUG '05 : Chemical information models : Examples : 1 of 5

"Direct molecular property", e.g., primary tables in CRC Handbook


MUG '05 : Chemical information models : Examples : 1 of 5

"Hierarchical molecular property", e.g., Pomona Medchem MASTERFILE


MUG '05 : Chemical information models : Examples : 1 of 5

"Arbitrary, non-molecular super-identifier", e.g., SPRESI


MUG '05 : Chemical information models : Examples : 1 of 5

"Arbitrary, molecular super-identifier", e.g., WDI


MUG '05 : Chemical information models : Examples : 1 of 5

"Associative identifier", e.g., TCM


MUG '05 : Chemical information models : Examples : 1 of 5

Derived identifier


MUG '05 : Chemical information models : Examples : 1 of 5

Free identifier

"Molecules are not special", e.g., MSDS





Daylight Chemical Information Systems, Inc.
info@daylight.com