Daylight Summer School

Lab: DayCart extensions to SQL -- WITH HINTS


This set of labs guides the user through the use of Daycart functionality in SQL and PL/SQL.


The first operation will be to load the demo tables into your local Oracle instance. The data is available from the ftp server. The filenames are: tcm01.dmp, demo.sql, acd_demo.sql, wdi_demo.sql, and wombat_demo.sql.

From Import files:

The file "tcm01.dmp" is an Oracle dump file. It was generated using the Oracle "export" program, a utility which allows one to dump data from an Oracle database into a more portable binary format. The "import" utility will load this binary data into the database.

$ imp mug/coffee
Import: Release - Production on Mon May 24 15:52:51 2004

(c) Copyright 1999 Oracle Corporation.  All rights reserved.

Connected to: Oracle8i Enterprise Edition Release - Production
With the Partitioning and Java options
PL/SQL Release - Production

Import file: expdat.dmp > tcm01.dmp

Enter insert buffer size (minimum is 8192) 30720> <CR>

Export file created by EXPORT:V08.01.05 via conventional path

Warning: the objects were exported by TCM, not by you

import done in US7ASCII character set and US7ASCII NCHAR character set
List contents of import file only (yes/no): no > <CR>

Ignore create error due to object existence (yes/no): no > <CR>

Import grants (yes/no): yes > <CR>

Import table data (yes/no): yes > <CR>

Import entire export file (yes/no): no > yes

. importing TCM's objects into MUG
. . importing table                    "TCM01_MOL"       6780 rows imported
. . importing table                  "TCM01_PLANT"       1541 rows imported
. . importing table              "TCM01_PLANT_MOL"      12102 rows imported
. . importing table                    "TCM01_REF"        656 rows imported
. . importing table                "TCM01_MOL_REF"       7917 rows imported
Import terminated successfully without warnings.

From SQL Scripts:

Regular SQL scripts can be executed directly from within the SQLPlus environment:

$ sqlplus mug/coffee
SQL> @demo

...  (lots of output)

SQL> @acd_demo

...  (lots of output)

SQL> @wdi_demo

...  (lots of output)

SQL> @wombat_demo

...  (lots of output)

The main table of interest is: demo. It has chemical structures and ID numbers. The other tables are subsets of the ACD, WDI, and Wombat databases.

You can find out the names of the tables which are owned by the current user with a SQL select statement (from the data dictionary):

SQL>  select table_name from user_tables;

You can get information about the columns in the tables with the 'describe' command. For example:

SQL>  desc demo

Simple SQL Operators

  1. Find Dopamine (NCCc1ccc(O)c(O)c1) in the "demo" table.

    The SQL operator used for an exact match search is:

    operator exact (a IN VARCHAR2_OR_CLOB, b IN VARCHAR2_OR_CLOB) => NUMBER

    The SQL operator used for a absolute-SMILES match match search is:

    operator asmiles (smiles IN VARCHAR2_OR_CLOB, query IN VARCHAR2_OR_CLOB) => NUMBER

    The exact and role operators expect the query to be canonicalized in the same fashion as the column being searched. Since the demo table has absolute canonical SMILES, then the query must include the absolute canonical SMILES.

  2. Find structures similar to Dopamine in the "demo" table. Impose a tanimoto cutoff of 0.7. Alternately, find the ten most similar structures to Dopamine in the "demo" table.

    The operators used for a tanimoto search is:

    operator tanimoto (fp_or_smi1 IN VARCHAR2_OR_CLOB, 
                       fp_or_smi2 IN VARCHAR2_OR_CLOB) => NUMBER
    operator tanimoto (fp_or_smi1 IN VARCHAR2_OR_CLOB,
                       fp_or_smi2 IN VARCHAR2_OR_CLOB, count IN NUMBER) => NUMBER

  3. Count the number of compounds in the demo table which have thiophenol (Sc1ccccc1) as a substructure

    The function used for a substructure search is:

    operator contains (smiles1 IN VARCHAR2_OR_CLOB,
                       smiles2 IN VARCHAR2_OR_CLOB) => NUMBER

  4. Perform a SMARTS query on the demo table.

    The function used for a SMARTS query is:

    operator matches (smiles IN VARCHAR2_OR_CLOB,
                      smarts IN VARCHAR2_OR_CLOB) => NUMBER

PL/SQL Functions

  1. Write a simple PL/SQL routine to compute the number of atoms and rings in naphthalene (c1cccc2c1cccc2).

    Use the ddprop package functions, not the SQL operators, within the PL/SQL block.

    Use the dbms_output package with "set serveroutput on" to write simple results to the screen.

  2. Create a PL/SQL function which implements a new (trivial) molecular property, the ring ratio, here defined as the fraction of atoms in a molecule which are in a ring.

    Use the match_count() function.

Oracle Data Manipulation

  1. Create a demo table as a subset of tcm01_mol. Only include the first 100 records from tcm01_mol.

    Use a variation the "CREATE TABLE" command ("create table ... as select ... ") and use the "rownum" pseudocolumn to limit the rows.

  2. Add a column for average molecular weight. Populate the new table with calculated values.

    Use the "ALTER TABLE" command to add the column, then use the Daycart function to compute the average molecular weight and update the new column.

  3. Validate the calculated molecular weight (new_mw) against the stored molecular weight (mw). Look for a discrepancy greater than 1.0.

    The SQL function "ABS()" returns the absolute value of a number.

  4. Add columns for rotatable bonds, hydrogen donors, and ClogP and populate the table.

Multi-Table Queries

  1. Find out how many structures are common between the demo table and the tcm01_mol table.

    Since the SMILES in the two tables have been canonicalized and are stored as simple VARCHAR2 data, a simple join query will work.

  2. Refine the query and find all structures in both demo and tcm01_mol that contain the phenyl ethylamine (NCCc1ccccc1) substructure.

    Extend the previous answer with the contains() query or use an INTERSECT statement to join two SQL queries.

Daylight Chemical Information Systems Inc.