MUG '05
-- 19th Daylight User Group Meeting -- 9-11 Mar 2005
Chemical information models
Dave Weininger
Daylight/Metaphorics
Abstract
Information modeling is at the heart of informatics.
The molecular information model is cool.
Discuss problems/solutions.
How can one do anything useful with a computer?
One usually needs to represent real things
in a computer program using digital representations.
- Abstract things are easiest, e.g., numbers.
- Even simple numbers are non-trivial (odd ones too ;-)
Remember PL/1, with a dozen ways to represent a number?
- IBM made a fortune using numbers to represent money.
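To make the point about numbers concrete, here is a minimal sketch (Python is used for illustration only; it is not from the talk) comparing binary floating point with decimal arithmetic for money amounts. The variable names are hypothetical.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.10 or 0.20 exactly,
# so the sum does not compare equal to 0.30.
float_total = 0.10 + 0.20
float_is_exact = (float_total == 0.30)

# Decimal arithmetic keeps these amounts exact, which is what money needs.
decimal_total = Decimal("0.10") + Decimal("0.20")
decimal_is_exact = (decimal_total == Decimal("0.30"))

print(float_is_exact, decimal_is_exact)
```

The same quantity has two digital representations here, and only one of them behaves the way an accountant expects.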
Fundamental chemical informatics: 1 of 8
What chemical entities are useful to represent?
- Molecules (valence, LCAO, ab initio/quantum models)
- Reactions (same as molecular models, also dynamics)
- Substances (not necessarily molecular models)
- Mixtures (may or may not be molecular)
- Molecular patterns (theoretical, statistical, legal)
- other ... crystals, large molecules, polymers, alloys, catalysts
Fundamental chemical informatics: 2 of 8
What operations on chemical entities are useful?
- Storage/retrieval: basic input and output; efficiency
- Identity: are two entities the same or different?
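The identity question can be sketched for tiny molecules as a brute-force comparison of two connection tables. This is an illustration under stated assumptions, not the method used by any Daylight software: the function name and the (atoms, bonds) layout are hypothetical, hydrogens are left implicit, and the search over atom orderings is only practical for a handful of atoms.

```python
from itertools import permutations

def same_molecule(atoms_a, bonds_a, atoms_b, bonds_b):
    """Return True if two small connection tables describe the same graph.

    atoms_*: list of element symbols, indexed by atom number.
    bonds_*: set of (i, j, order) tuples.
    """
    if len(atoms_a) != len(atoms_b) or len(bonds_a) != len(bonds_b):
        return False
    if sorted(atoms_a) != sorted(atoms_b):
        return False
    # Normalize bonds so that (i, j) and (j, i) compare equal.
    norm_b = {(frozenset((i, j)), order) for i, j, order in bonds_b}
    n = len(atoms_a)
    for perm in permutations(range(n)):
        # perm maps an atom index in A to an atom index in B.
        if any(atoms_a[i] != atoms_b[perm[i]] for i in range(n)):
            continue
        mapped = {(frozenset((perm[i], perm[j])), order)
                  for i, j, order in bonds_a}
        if mapped == norm_b:
            return True
    return False

# Ethanol written in two atom orders, and dimethyl ether (same formula).
ethanol_1 = (["C", "C", "O"], {(0, 1, 1), (1, 2, 1)})
ethanol_2 = (["O", "C", "C"], {(0, 1, 1), (1, 2, 1)})
ether = (["C", "O", "C"], {(0, 1, 1), (1, 2, 1)})

print(same_molecule(*ethanol_1, *ethanol_2))
print(same_molecule(*ethanol_1, *ether))
```

Two tables that differ only in atom numbering are the same entity; two tables with the same atoms but different connectivity are not. Practical systems avoid the permutation search, for example by comparing canonical forms.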
Various representations have different advantages.
E.g., SMILES are semantically well-defined representations of
specific valence models for molecules and reactions.
Names are more useful for representing things without a
useful valence model, e.g., "Turpentine" or "Unknown 123".
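One way to picture this trade-off is a record in which names are always available but the SMILES is optional. The class and field names below are hypothetical, chosen only for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ChemicalEntity:
    # Names can describe anything, including ill-defined substances.
    names: List[str] = field(default_factory=list)
    # SMILES is present only when a useful valence model exists.
    smiles: Optional[str] = None

    def has_valence_model(self) -> bool:
        return self.smiles is not None

ethanol = ChemicalEntity(names=["ethanol", "ethyl alcohol"], smiles="CCO")
turpentine = ChemicalEntity(names=["Turpentine"])
unknown = ChemicalEntity(names=["Unknown 123"])

for entity in (ethanol, turpentine, unknown):
    print(entity.names[0], entity.has_valence_model())
```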
Fundamental chemical informatics: 3 of 8
How can we represent chemical entities in a computer?
- Name (IUPAC, index name, common name, etc.)
- Number (e.g., CAS Registry Number)
- Picture (e.g., WIMP)
- Properties (e.g., ECN)
- Connection table (e.g., SDF, MOL, MOL2, etc.)
- Linear notation (WLN, SMILES, ROSDAL)
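A registry number carries no structural information, but it is at least self-checking: the last digit of a CAS Registry Number is a check digit computed from the others. The sketch below validates that digit; the function name is hypothetical, and the code checks only the format and arithmetic, not whether the number has actually been assigned.

```python
def cas_check_digit_ok(cas_rn: str) -> bool:
    """Validate the format and check digit of a CAS Registry Number."""
    parts = cas_rn.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return False
    # Format: 2 to 7 digits, then 2 digits, then 1 check digit.
    if not (2 <= len(parts[0]) <= 7) or len(parts[1]) != 2 or len(parts[2]) != 1:
        return False
    body = parts[0] + parts[1]
    # Weight the digits 1, 2, 3, ... starting from the rightmost body digit.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == int(parts[2])

print(cas_check_digit_ok("64-17-5"))    # ethanol
print(cas_check_digit_ok("7732-18-5"))  # water
print(cas_check_digit_ok("64-17-6"))    # wrong check digit
```

Compare this with a linear notation such as SMILES, where the string itself encodes the valence model rather than pointing to a registry entry.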
0. How can one do anything useful with a computer?
1. What chemical entities are useful to represent?
2. What operations on chemical entities are useful?
3. How can we represent chemical entities in a computer?
4. How can we associate properties with chemical entities?
Fundamental chemical informatics: 4 of 8
How can we represent chemical entities in a computer?
- Name (IUPAC, index name, common name, etc.)
- Number (e.g., CAS Registry Number)
- Picture (e.g., WIMP)
- Connection table (e.g., SDF, MOL, MOL2, etc.)
- Linear notation (WLN, SMILES, ROSDAL)
Daylight Chemical Information Systems, Inc.
info@daylight.com