MUG '97 -- 11th Annual Daylight User Group Meeting -- 26 February 1997
Database building from text
GlaxoWellcome, Stevenage, Herts SG1 2NY, UK
- The database was built from "Handbook of Enzyme Inhibitors", H. Zollner, VCH, 1993.
- Use the EC number (EC#), which is a true classifier, like the Dewey decimal system for books.
- Use sources such as SWISSPROT and Brookhaven to find enzymes for which there is a 3D structure.
- Abstract the Brookhaven code.
- Experimented with storing GIFs of the binding site.
- Manually look up information on these EC numbers and create trees rooted in $INH<>.
- Load into thor, which merges all the trees correctly.
- Use nam2smi (contributed) to create trees $SMI<>$INH<>| and merge in thor.
- Caveat (posh name for bug): pre-4.5 thor does not merge on all ambiguous names. If there is a many-$SMI-to-one-$INH relationship, the merge needs to add the subtree to all the new $SMI roots.
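The merge and re-rooting steps above can be sketched in plain Python. This is only an illustration of the intended behaviour, not thor's or nam2smi's actual interface: the functions `merge_by_root` and `reroot_on_structure`, the dictionary layout, and the sample names, SMILES, and EC pairings are all assumptions made up for the example. The point it shows is the caveat: when one name resolves to several structures, the name's subtree has to be attached under every one of them.

```python
from collections import defaultdict


def merge_by_root(records):
    """Combine (root, data_items) records so that each root appears once."""
    merged = defaultdict(list)
    for root, items in records:
        bucket = merged[root]
        for item in items:
            if item not in bucket:  # drop duplicates, keep first-seen order
                bucket.append(item)
    return dict(merged)


def reroot_on_structure(name_trees, name_to_smiles):
    """Re-root name-keyed trees on SMILES wherever a structure is known.

    A name that maps to several SMILES has its subtree copied under each
    of them; a name with no SMILES stays rooted on the name.
    """
    smi_trees = defaultdict(dict)
    unresolved = {}
    for name, items in name_trees.items():
        structures = name_to_smiles.get(name, [])
        if not structures:
            unresolved[name] = list(items)
            continue
        for smiles in structures:
            smi_trees[smiles][name] = list(items)
    return dict(smi_trees), unresolved


# Illustrative values only: these name/EC pairings are invented for the
# sketch and are not taken from the Handbook.
records = [
    ("alanine", ["EC 1.1.1.1"]),
    ("alanine", ["EC 2.7.1.1"]),
    ("FATTY ACIDS", ["EC 3.1.1.3"]),
]
# Hypothetical name-to-structure lookup: an ambiguous name with two stereoisomers.
name_to_smiles = {"alanine": ["N[C@@H](C)C(=O)O", "N[C@H](C)C(=O)O"]}

name_trees = merge_by_root(records)
smi_trees, unresolved = reroot_on_structure(name_trees, name_to_smiles)
```

Here both stereoisomer roots end up carrying the full "alanine" subtree, while "FATTY ACIDS" has no structure and remains rooted on its name.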
- The resulting roots are now:
  - $SMI<> with a real structure
  - $INH<> because:
    - No structure defined
    - Generic structure, e.g. FATTY ACIDS
    - Can't find name
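The choice of root described above can be written as a small decision function. This is a minimal sketch under stated assumptions: `classify_root`, the two name sets, and the lookup table are hypothetical stand-ins for whatever lists and name-to-SMILES dictionary are actually available, and the entries are illustrative only.

```python
def classify_root(name, name_to_smiles, generic_names, undefined_names):
    """Return (root_type, value) for one inhibitor name.

    root_type is "$SMI" when a real structure is known, otherwise "$INH"
    together with the reason the name could not be given a structure.
    """
    if name in undefined_names:
        return ("$INH", "no structure defined")
    if name in generic_names:
        return ("$INH", "generic structure")
    smiles = name_to_smiles.get(name)
    if smiles is None:
        return ("$INH", "name not found")
    return ("$SMI", smiles)


# Hypothetical inputs for illustration.
name_to_smiles = {"ethanol": "CCO"}
generic_names = {"FATTY ACIDS"}
undefined_names = {"plant extract"}

roots = {
    name: classify_root(name, name_to_smiles, generic_names, undefined_names)
    for name in ["ethanol", "FATTY ACIDS", "plant extract", "compound X"]
}
```

Keeping the reason alongside each $INH root makes it easy to see which names might become $SMI roots later, for example once a larger synonym dictionary is available.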
- Other sources can be prohibitively expensive or impossible to use.
- Many online facilities, such as SciFinder, are not geared to handle lists.
- Even if they were, it would cost $5 per connection table.
- Could also add the reaction/transform as data about the enzyme.
- Virtual databases
- nam2smi would work better
- Effectively merge in-house data in a maintainable fashion.
- Information Extraction tools for building databases from literature sources.
- Large, preferably public, database of chemical synonyms.
Daylight Chemical Information Systems, Inc.