This set of labs guides the user through the use of Daycart functionality in SQL and PL/SQL.
The first operation will be to load the demo tables into your local Oracle instance. The data is available from the ftp server. The filenames are: tcm01.dmp, demo.sql, acd_demo.sql, wdi_demo.sql, and wombat_demo.sql.
From Import files:
The file "tcm01.dmp" is an Oracle dump file. It was generated using the Oracle "export" program, a utility which allows one to dump data from an Oracle database into a more portable binary format. The "import" utility will load this binary data into the database.
$ imp mug/coffee Import: Release 8.1.5.0.0 - Production on Mon May 24 15:52:51 2004 (c) Copyright 1999 Oracle Corporation. All rights reserved. Connected to: Oracle8i Enterprise Edition Release 8.1.5.0.0 - Production With the Partitioning and Java options PL/SQL Release 8.1.5.0.0 - Production Import file: expdat.dmp > tcm01.dmp Enter insert buffer size (minimum is 8192) 30720> <CR> Export file created by EXPORT:V08.01.05 via conventional path Warning: the objects were exported by TCM, not by you import done in US7ASCII character set and US7ASCII NCHAR character set List contents of import file only (yes/no): no > <CR> Ignore create error due to object existence (yes/no): no > <CR> Import grants (yes/no): yes > <CR> Import table data (yes/no): yes > <CR> Import entire export file (yes/no): no > yes . importing TCM's objects into MUG . . importing table "TCM01_MOL" 6780 rows imported . . importing table "TCM01_PLANT" 1541 rows imported . . importing table "TCM01_PLANT_MOL" 12102 rows imported . . importing table "TCM01_REF" 656 rows imported . . importing table "TCM01_MOL_REF" 7917 rows imported Import terminated successfully without warnings.
From SQL Scripts:
Regular SQL scripts can be executed directly from within the SQLPlus environment:
$ sqlplus mug/coffee SQL> @demo ... (lots of output) SQL> @acd_demo ... (lots of output) SQL> @wdi_demo ... (lots of output) SQL> @wombat_demo ... (lots of output)
The main table of interest is: demo. It has chemical structures and ID numbers. The other tables are subsets of the ACD, WDI, and Wombat databases.
You can find out the names of the tables which are owned by the current user with a SQL select statement (from the data dictionary):
SQL> select table_name from user_tables;
You can get information about the columns in the tables with the 'describe' command. For example:
SQL> desc demo
The SQL operator used for an exact match search is:
operator exact (a IN VARCHAR2_OR_CLOB, b IN VARCHAR2_OR_CLOB) => NUMBER
The SQL operator used for a absolute-SMILES match match search is:
operator asmiles (smiles IN VARCHAR2_OR_CLOB, query IN VARCHAR2_OR_CLOB) => NUMBER
The exact and role operators expect the query to be canonicalized in the same fashion as the column being searched. Since the demo table has absolute canonical SMILES, then the query must include the absolute canonical SMILES.
There are several options here. Each will use a different index
to accomplish the same operation:
This uses the "ddexact" index. Since the exact query expects that the
strings being compared are canonicalized at the same level (either unique
or absolute SMILES), the query for Dopamine is canonicalized during the
call.
This uses the "ddgraph" index. The query and table data are normalized
at the absolute SMILES level as part of the comparision.
This uses the "ddrole" index. Like the exact query, the component search
assumes that the strings being compared are canonicalized at the same level
(either unique or absolute SMILES). Unlike the previous two solutions, this
will find cases where Dopamine is one component of a dot-disconnected
SMILES (eg. "NCCc1ccc(O)c(O)c1.NCCc1ccccc1").
select * from demo where exact(smi, smi2cansmi('NCCc1ccc(O)c(O)c1', 1)) = 1;
select * from demo where asmiles(smi, 'NCCc1ccc(O)c(O)c1') = 1;
select * from demo where component(smi, smi2cansmi('NCCc1ccc(O)c(O)c1', 1)) = 1;
The operators used for a tanimoto search is:
operator tanimoto (fp_or_smi1 IN VARCHAR2_OR_CLOB, fp_or_smi2 IN VARCHAR2_OR_CLOB) => NUMBER operator tanimoto (fp_or_smi1 IN VARCHAR2_OR_CLOB, fp_or_smi2 IN VARCHAR2_OR_CLOB, count IN NUMBER) => NUMBER
The first SQL command is:
select smi, tanimoto (smi, 'NCCc1ccc(O)c(O)c1') from demo where tanimoto(smi, 'NCCc1ccc(O)c(O)c1') > 0.7 order by tanimoto(smi, 'NCCc1ccc(O)c(O)c1') desc;
Note that this query calls the tanimoto() function three times. The tanimoto() function in the where clause is what's used to actually perform the selection; the where clause eliminates rows which don't meet the criteria of a similarity greater than 0.7. The tanimoto() function in the select computes the simliarity value for the output of the statement, and the tanimoto() in the order-by clause is used to sort the output in descending order.
The second option, finding the ten most similar structures, uses the optional third parameter to the tanimoto() function.
select smi, tanimoto (smi, 'NCCc1ccc(O)c(O)c1') from demo where tanimoto(smi, 'NCCc1ccc(O)c(O)c1', 10) > 0.0 order by tanimoto(smi, 'NCCc1ccc(O)c(O)c1') desc;
The function used for a substructure search is:
operator contains (smiles1 IN VARCHAR2_OR_CLOB,
smiles2 IN VARCHAR2_OR_CLOB) => NUMBER
The SQL command is:
select count(1) from demo where contains(smi,'Sc1ccccc1') = 1;
The function used for a SMARTS query is:
This query finds ortho-substitued benzaldehyde structures and returns the
SMILES strings.
operator matches (smiles IN VARCHAR2_OR_CLOB,
smarts IN VARCHAR2_OR_CLOB) => NUMBER
select smi from demo where matches(smi,'[N,O]ccC=O') = 1;
Use the ddprop package functions, not the SQL operators, within the PL/SQL
block.
Use the dbms_output package with "set serveroutput on" to write simple results
to the screen.
set serveroutput on
declare
smi varchar2(20);
natoms number;
nrings number;
begin
smi := 'c1cccc2c1cccc2';
natoms := c$dcischem.ddprop.fatom_count(smi);
nrings := c$dcischem.ddprop.fring_count(smi);
dbms_output.put_line('Number of atoms in naphthalene is: ' || natoms);
dbms_output.put_line('Number of rings in naphthalene is: ' || nrings);
end;
/
Use the match_count() function.
Then try it with:
create function ring_ratio(smi in varchar2) return number
is
natoms number;
nringatoms number;
begin
natoms := c$dcischem.ddprop.fatom_count(smi);
nringatoms := c$dcischem.ddprop.fmatch_count(smi, '[*r]');
return nringatoms/natoms;
end;
/
select ring_ratio('c1ccccc1') from dual;
select ring_ratio('CCCCCCc1ccccc1') from dual;
Use a variation the "CREATE TABLE" command ("create table ... as select ... ") and use the "rownum" pseudocolumn to limit the rows.
create table tcm_subset as
select * from tcm01_mol
where rownum <= 100;
Use the "ALTER TABLE" command to add the column, then use the
Daycart function to compute the average molecular weight and update the new
column.
To populate the columns:
alter table tcm_subset add new_mw number;
desc tcm_subset;
update tcm_subset
set new_mw=smi2amw(smi);
The SQL function "ABS()" returns the absolute value of a number.
Also note that we could do the validation by calculating the molecular weight
on the fly, without using a new column:
Note that the "mw" column in the table is actually a string (VARCHAR2). The
SQL engine performs an implicit conversion to a number for the comparison.
The same result is obtained if the explicit conversion, using the to_number()
function, is used:
select smi, mw, new_mw from tcm_subset
where abs(mw - new_mw) > 1.0;
select smi, mw, smi2amw(smi) from tcm_subset
where abs(mw - smi2amw(smi)) > 1.0;
select smi, mw, new_mw from tcm_subset
where abs(to_number(mw) - new_mw) > 1.0;
For clogp(), use the contrib script to create the fclogp() function and
clogp() operator. Then:
alter table tcm_subset add (rot_bond number, hdonor number, clogp number);
update tcm_subset set
rot_bond=rotbond_count(smi),
hdonor=hdonor_count(smi),
clogp=clogp(smi);
Since the SMILES in the two tables have been canonicalized and are stored as
simple VARCHAR2 data, a simple join query will work.
select count(1) from tcm01_mol, demo
where tcm01_mol.smi = demo.smi;
Extend the previous answer with the contains() query or use an INTERSECT
statement to join two SQL queries.
There are several ways to answer this query.
The above query finds all SMILES which match the contains() query, and are
also are common to both lists.
An intersection query will also work.
Note that the two queries above give different answers. In fact, the
INTERSECT set operation removes duplicates, so the second query has less
answers. The query below gives the same answers as the INTERSECT-based
query.
select demo.smi from demo, tcm01_mol
where contains(demo.smi, 'NCCc1ccccc1') = 1
and demo.smi = tcm01_mol.smi;
select smi from demo
where contains(smi, 'NCCc1ccccc1') = 1
intersect
select smi from tcm01_mol
select distinct(demo.smi) from demo, tcm01_mol
where contains(demo.smi, 'NCCc1ccccc1') = 1
and demo.smi = tcm01_mol.smi;