Daycart / Oracle Databases

We're converting our vendor databases into Oracle format and making them available. These are being updated on the regular update schedule with the Thor versions of these databases. Others will be made available as we do the conversions. What are available now are:

General Database Structure

Each database has the following common tables:

MAIN table    - The main table, linked to the other tables by
                the primary key REG_NO.  The REG_NO is defined by the vendor:
                for ACD it's the ACD Number, for WDI it's the Derwent 
                External Registry Name, for WOMBAT it's the WOMBAT ID.

DBINFO table  - A table which describes the databases.  This contains
                the creation date, version number, copyright message.

CLUSTER table - The table which contains the Jarvis-Patrick clustering
                information.  Linked with the main table by smi2cansmi(smi, 0).

Each database also has tables which are unique, based on the particular vendors content:

WDI041
-----------------------------------------------------------------------------

WDI_CAS                - CAS Registry number; linked to main table 
                         by reg_no (32483)
WDI_CLUSTER            - Jarpat cluster data (55700)
WDI_COMB_PREP          - Combination/prep; linked to main table by reg_no (34659)
WDI_COMMENT            - Comments; linked to main table by reg_no (28150)
WDI_KEYWORD            - Keywords; linked to main table by reg_no (721960)
WDI_KEYWORD_DEF        - Definition of Keyword codes; linked to wdi_keyword 
                         table by keyword_cd
WDI_MAIN               - Main table (75502)
WDI_NAME               - Names; linked to main table by reg_no (226102)
WDI_NAME_DEF           - Definition of Name codes; linked to wdi_name table 
                         by name_cd
WDI_REFERENCE          - References; linked to main table by reg_no (30829)
WDI_TRADE              - Trade names; linked to main table by reg_no (116583)

ACD041
-----------------------------------------------------------------------------
ACD_CAS                - CAS Registry number data (79222)
ACD_CATALOG            - Supplier catalog data  (1516569)
ACD_CLUSTER            - Jarpat cluster data (168415)
ACD_MAIN               - Main table (202768)
ACD_NAME               - Names (1263977)
ACD_SUPPLIER           - Supplier information (659)
ACD_SUPPLIER_ADDRESS   - Supplier contact information (5046)

WOMBAT041
-----------------------------------------------------------------------------
WOMBAT_ACTIVITY        - Activity table; linked to main table by reg_no and activity_id (143090) 
WOMBAT_ACT_PROTOCOL    - Activity protocol table; linked to wombat_activity by pro_id (8029) 
WOMBAT_CLUSTER         - Jarpat cluster data (61498)
WOMBAT_GNAM            - Generic name table; linked to main table by reg_no (6990)
WOMBAT_KEYWORD         - Keyword table; linked to main table by reg_no (90605)
WOMBAT_MAIN            - Main table (79165)
WOMBAT_PROP            - Properties table linked to main table by reg_no (79165)
WOMBAT_REFERENCE       - Reference table; linked to main table by reg_no (3039)

SPRESI95
-----------------------------------------------------------------------------
SPRESI_BOILINGPNT       -  Boiling Point table; linked to main table by reg_no (182697)
SPRESI_CLUSTER          - Jarvis-Patrick clustering results (2349778)
SPRESI_DECOMPOSITION    - Decomposition table; linked to main table by reg_no (15278)
SPRESI_DENSITY          - Density table; linked to main table by reg_no (26456)
SPRESI_DISSOCIATION     - Dissociation table; linked to main table by reg_no(2606)
SPRESI_JOURNAL          - Journal data table; linked to main table by reg_no (2960290)
SPRESI_KEYWORD          - Keyword table; linked to SPRESI_PATENT and SPRESI_JOURNAL
                          tables by reg_no and document_id (8391060)
SPRESI_MAIN             - Main data table; one record per reg_no (3177717)
SPRESI_MELTINGPNT       - Melting Point table; linked to main table by reg_no (961436)
SPRESI_MUTAROTATION     - Mutarotation table; linked to main table by reg_no(6)
SPRESI_PATENT           - Patent table; linked to main table by reg_no (1020361)
SPRESI_REFRACTIVE_INDEX - Refractive Index table; linked to main table by reg_no (70546)
SPRESI_ROTATION         - Optical Rotation table; linked to main table by reg_no (64049)
SPRESI_SUBLIMATION      - Sublimation table; linked to main table by reg_no (2577)

Database Structure - ACD Database Organization

Database Structure - WOMBAT Database Organization

Database Structure - WDI Database Organization

Database Structure - SPRESI Database Organization



In the WDI Database WDI_MAIN is linked to WDI_CAS, WDI_NAME, WDI_KEYWORD, WDI_TRADE, WDI_COMB_PREP, WDI_COMMENT, and WDI_REFERENCE by the unique identifier REG_NO. WDI_KEYWORD_DEF is linked to WDI_KEYWORD by the identifier keyword_cd and WDI_NAME_DEF is linked to WDI_NAME by the identifier name_cd. WDI_MAIN is linked to WDI_CLUSTER by using the smi2cansmi function on the column SMI.   


Comparison between WDI datatypes in THOR and ORACLE

In WDI the following THOR datatypes correspond to the following columns.

ORACLE

SQL> select * from wdi_main where preferred_name='CAFFEINE';

REG_NO          ISOMER                           PREFERRED_NAME
--------------  -------------------------------  ------------------
CAFFEINE        Cn1cnc2n(C)c(=O)n(C)c(=O)c12     CAFFEINE

D2D_COORDS
--------------------------------------------------------------------------------------------------------
1.48,1.35,1.22,0.57,1.70,-0.11,1.22,-0.77,0.44,-0.51,-0.28,-0.93,-0.28,-1.75,-0.99,-0.51,-1.71,-0.93....

AMW        MOLWEIGHT       UPDCODE          MOL_FORM
---------  --------------  ---------------  -------------------
194.192    194.192         1990.1-2001.4    C8H10N4O2

Thor

> thorlookup -INCLUDE_DATATYPES "DMF DPN AMW 2D DYQ" wdi041%@laluz:thor:thor%
Cn1cnc2n(C)c(=O)n(C)c(=O)c12
$SMI<Cn1cnc2n(C)c(=O)n(C)c(=O)c12>
$DXRN<CAFFEINE>
DPN<CAFFEINE>
2D<1.48,1.35,1.22,0.57,1.70,-0.11,1.22,-0.77,0.44,-0.51,-0.28,-0.93,-0.28,-1.75,-0.99,-0.51,-1.71,-0.93..;>
DMF<C8 H10 N4 O2>
DYQ<1990.1-2001.4>
AMW<194.192>
|

$DXRN is the Derwent external registry name which is the primary key REG_NO in the WDI_TABLES. Otherwise, the correspondance of columns to Thor datatypes is pretty clear. One exception is the molecular weight; in some cases the vendor supplies a molecular weight with their record; these are kept as a column in the MAIN table. We populate the AMW column with the result from smi2amw(isomer).

The other change worth understanding is how we've mapped names and keywords into the Oracle tables. For searching efficiency we've used tables with multiple type codes for wdi_name and wdi_keyword. The keyword types are described in the wdi_keyword_def table, and the name types are described in the wdi_name_def table.

SQL> select * from wdi_keyword_def;

KEY DEFINITION
--- ----------------------------------------
PT  ACTIVITY CLASS
SS  SUBSTRUCTURE KEYWORDS
IU  INDICATIONS AND USAGE
PW  PRECAUTIONS AND WARNINGS
CI  CONTRA-INDICATIONS
IA  INTERACTIONS
AE  ADVERSE EFFECTS
MA  MECHANISM OF ACTION

SQL> select * from wdi_name_def;

NAME_CD  DEFINITION
-------  ----------------------------------------
RN       DERWENT REGISTRY NAME
INN      INTERNATIONAL NON_PROPRIETARY NAME
USAN     UNITED STATED ADOPTED NAME
APP      APPROVED NAMES
OTHER    SYNONYMS OF UNKNOWN STATUS
TP       MANUFACTURERS CODE NUMBERS

ORACLE

SQL> select * from wdi_keyword where reg_no='CAFFEINE';

REG_NO            KEYWORD_CD   TEXT
----------------  -----------  ----------------------
CAFFEINE          SS           IMIDE,CYCLIC
CAFFEINE          SS           UREA,CYCLIC
CAFFEINE          SS           PURINE
CAFFEINE          IU           MISC: analgesia
CAFFEINE          IU           MISC: drowsiness
CAFFEINE          PW           hypertension
...

Thor

> thorlookup -INCLUDE_DATATYPES "SSK IU PW" wdi041%@laluz:thor:thor%
Cn1cnc2n(C)c(=O)n(C)c(=O)c12
$SMI<Cn1cnc2n(C)c(=O)n(C)c(=O)c12>
$DXRN<CAFFEINE>
SSK<IMIDE,CYCLIC; UREA,CYCLIC; PURINE>
IU<MISC: analgesia;MISC: drowsiness; ...>
PW<hypertension; ... >
...
|