Daylight Chemistry Cartridge, Version 4.62.10f (beta)

DRAFT DOCUMENTATION

Daylight Chemistry Cartridge, Version 4.62.10f (beta)

Jack Delany

DAYLIGHT Chemical Information Systems, Inc. Mission Viejo, CA USA

TABLE OF CONTENTS:

Quick Start.
Installation.
Installation Troubleshooting.
Verification.
De-Installation.
PL/SQL Functions, SQL Operators.
- General Purpose Functions.
- Molecule / Reaction Functions.
- Fingerprint Functions.
- Comparison Functions.
- Program Object Functions.
Extensible Indexes.
Using Program Objects.
Optimizer Support.
Error Codes.
Tuning Hints.

This section gives a quick overview of the installation procedures for the cartridge. Please consult the detailed installation section for more information on each of these items.

Untar the cartridge distribution (tar xvf daycart47101.tar). This creates a directory 'dcischem' which contains the shared object library, and a number of SQL and wrapped SQL scripts (.PLB).

Create the Oracle user 'c$dcischem', identified by 'secret'. This user should have DBA privileges, and reasonable default tablespaces.

Edit the file create.sql. Make sure that the value for the path to 'c$dcischemlib' is correct. Run the script with: 'svrmgrl < create.sql'.

Grant the role 'daycart' to any non-privileged user who requires cartridge functionality.

Revoke privileges from user 'c$dcischem'. Change the password for 'c$dcischem'.

Configure and start the network listener. A listener must be configured to listen for IPC connections using key EXTPROC0. A network Service Name for 'EXTPROC_CONNECTION_DATA' must also be configured.

See the SQL scripts in the 'TEST' subdirectory for examples of use of the cartridge.

Installation:

This section gives detailed installation instructions.

Log into the system as user oracle. Download and unpack the cartridge distribution. The distribution is a compressed tar archive. The preferred location to unpack the distribution is in the Oracle database home directory ($ORACLE_HOME). As the cartridge is version specific, install the cartridge under the version-specific ORACLE_HOME directory.

  $ cd $ORACLE_HOME
  $ gunzip daycart47101.tar.gz    # creates daycart47101.tar
  $ tar xvf daycart47101.tar      # creates dcischem directory
    <...>
  $ cd dcischem

Create the user 'c$dcischem' and grant necessary privileges. The file 'create.sql' also contains these instructions. Log into Oracle with DBA privileges and execute the following commands.

  SQL> create user c$dcischem identified by secret;
  SQL> grant dba to c$dcischem;

Alternately, rather than granting DBA privilege to c$dcischem, you can grant the minimal required privileges:

  SQL> grant create operator to c$dcischem;
  SQL> grant create indextype to c$dcischem;
  SQL> grant create table to c$dcischem;
  SQL> grant create library to c$dcischem;
  SQL> grant create procedure to c$dcischem;
  SQL> grant create role to c$dcischem;
  SQL> grant create type to c$dcischem;
  SQL> grant create public synonym to c$dcischem;

It is recommended by Oracle that the system tablespace not be used for user code. Decide which user tablespace to use to store the cartridge definitions and create or enlarge that tablespace. The cartridge code only requires about 2MB of space, so size isn't a huge issue. For a developmental or test system you can skip these two and just use the system tablespace for the cartridge definitions.

 
  SQL> create tablespace user_ts  <...> # see the SQL reference manual and
                                      # DBA guide for full syntax.
  SQL> alter user c$dcischem default tablespace user_ts;

Now, edit the file 'create.sql'. Modify the line:

  create or replace library c$dcischemlib
    as '/oracle/o815/dcischem/ddlib.so';

and substitute the absolute path to your cartridge installation directory.

If you already have a license key, you can put the insert statement into the create.sql file. If not, you'll need to contact Daylight in order to get a license key. We'll need the hostid from your machine. You can either get it from the command line or from Oracle once the cartridge is installed.

  $ /usr/bin/hostid
  80ad7574

  $ sqlplus c$dcischem/secret
  SQL> select ddpackage.fhostid() from dual;
  
  DDPACKAGE.FHOSTID()
  ---------------------------------------------------------------
  80ad7574

E-mail the hostid value to Daylight, and we'll e-mail back the required license key. NOTE: The license key is not required in order to run the installation scripts. The license key can be added after installation is complete. The cartridge will not successfully run without the license key.

Execute the script 'create.sql'.

  $ svrmgrl < create.sql

The output from svrmgrl will consist of quite a few notes and messages. One should not see any error messages from svrmgrl. If one gets a message about the COMPATIBLE parameter being set incorrectly, see the troubleshooting section below before proceeding further. A sample of the output follows:

  $ svrmgrl < create.sql 

  Oracle Server Manager Release 3.1.5.0.0 - Production

  (c) Copyright 1997, Oracle Corporation.  All Rights Reserved.

  Oracle8i Enterprise Edition Release 8.1.5.0.0 - Production
  With the Partitioning and Java options
  PL/SQL Release 8.1.5.0.0 - Production

  SVRMGR> Connected.
  SVRMGR> Echo                            OFF
  SVRMGR> SVRMGR> SVRMGR> SVRMGR> SVRMGR> SVRMGR> SVRMGR> SVRMGR>
  SVRMGR> SVRMGR> SVRMGR> SVRMGR> SVRMGR> SVRMGR> SVRMGR> SVRMGR>
  SVRMGR> SVRMGR> SVRMGR> SVRMGR> SVRMGR> SVRMGR> SVRMGR> SVRMGR>
  SVRMGR> SVRMGR> SVRMGR> SVRMGR> SVRMGR> SVRMGR> SVRMGR> SVRMGR>
  SVRMGR> Connected.
  SVRMGR> SVRMGR> SVRMGR> SVRMGR> SVRMGR> Statement processed.
  <...>
  Statement processed.
  No errors for PACKAGE DDPACKAGE
  Statement processed.
  No errors for PACKAGE BODY DDPACKAGE
  Statement processed.
  <...>
  No errors for TYPE DDEXACT_IM
  Statement processed.
  No errors for TYPE BODY DDEXACT_IM
  <...>
  No errors for TYPE DDGRAPH_IM
  Statement processed.
  No errors for TYPE BODY DDGRAPH_IM
  <...>
  No errors for TYPE DDROLE_IM
  Statement processed.
  No errors for TYPE BODY DDROLE_IM
  <...>
  No errors for TYPE DDBLOB_IM
  Statement processed.
  No errors for TYPE BODY DDBLOB_IM
  <...>
  SVRMGR> SVRMGR> SVRMGR> SVRMGR> SVRMGR> SVRMGR> SVRMGR> SVRMGR>
  Server Manager complete.
  $

Next, revoke privileges from c$dcischem. The user c$dcischem owns all the cartridge code, however c$dcischem requires no privileges after installation for the cartridge to work. All cartridge functionality operates with the privileges of the current user, not the cartridge owner (see the SQL reference manual for 'CREATE FUNCTION ... AUTHID CURRENT_USER and invokers rights for more details). Also, change the password for c$dcischem to something non-obvious.

  SQL> revoke dba from c$dcischem;

  SQL> revoke create public synonym to c$dcischem;
  SQL> revoke create type to c$dcischem;
  SQL> revoke create role to c$dcischem;
  SQL> revoke create operator from c$dcischem;
  SQL> revoke create indextype from c$dcischem;
  SQL> revoke create table from c$dcischem;
  SQL> revoke create library from c$dcischem;
  SQL> revoke create procedure from c$dcischem;

Configure and start the network listener for EXTPROC. The cartridge uses the 'extproc' functionality on the server within Oracle. The Oracle server contacts the network listener when it wants to run external procedures like the Daylight cartridge. Hence, the listener must always be running in order to user the cartridge.

Note: The Oracle Installer will start and configure the listener if you do a full default Oracle installation, however the listener will not be configured to restart automatically after a reboot. It is best to verify the configuration and state of the listener at this point, however don't be suprised if the listener configuration is already correct.

There are two ways to modify the configuration for the listener. One can either edit the configuration files directly or use the program 'netasst', which provides a GUI front end to the configuration.

In the netasst window, one must configure the listener with a new address. Under 'listeners', select the current listener and select 'ADD ADDRESS'. The Protocol should be 'IPC', the Key should be 'EXTPROC0'. The Protocol Stack Support should be 'Net8 Clients'.

Also in the netasst window, one must add a new 'Net Service Name'. Select the '+' button to add a service name. You'll be prompted for items in the following order:

Net Service Name: EXTPROC_CONNECTION_DATA
Network Protocol: IPC (Local Database)
IPC Key Value: EXTPROC0
Service name: PLSExtProc

Then, save the configuration and exit. If the listener isn't running on the machine, it can be started with:

  $ $ORACLE_HOME/bin/lsnrctl
  
  LSNRCTL for Solaris: Version 8.1.5.0.0 - Production on 14-MAR-00 12:00:03

  (c) Copyright 1998 Oracle Corporation.  All rights reserved.

  Welcome to LSNRCTL, type "help" for information.

  LSNRCTL> start
  
  ...

  LSNRCTL> quit
  $

Alternately, one can edit the listener configuration files directly and add the required entries. A typical, correct network configuration will include the following entries in the files:

$ORACLE_HOME/network/admin/listener.ora

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC0))
      (PROTOCOL_STACK =
        (PRESENTATION = TTC)
        (SESSION = NS)
      )
    )

    ...

  )

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /oracle/o815)
      (PROGRAM = extproc)
    )

   ...

  )

and $ORACLE_HOME/network/admin/tnsnames.ora

EXTPROC_CONNECTION_DATA =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC0))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = PLSExtProc)
    )
  )
...

Verify that these sections are present in the two configuration files, then start (or restart) the listener.

Grant the daycart role to a non-privileged user to use the cartridge. For example, create the user 'mug/coffee' and grant daycart privilege. Execute a simple query to test the cartridge.

  SQL> connect sys/<pw>
  SQL> create user mug identified by coffee;
  SQL> grant daycart to mug;
  SQL> connect mug/coffee

  SQL> select ddpackage.fhostid() from dual;
  
  DDPACKAGE.FHOSTID()
  ---------------------------------------------------------------
  80ad7574

  SQL> select ddpackage.ftestlicense('daycart') from dual;

  DDPACKAGE.FTESTLICENSE('DAYCART')
  ---------------------------------
                                  1

  SQL> select smi2cansmi('NCC', 0) from dual;
  
  SMI2CANSMI('NCC',0)
  ---------------------------------------------------------------
  CCN

If the above three queries don't give the expected results, see the next section for troubleshooting.

Installation Troubleshooting:

There are a number of details which must be correct for the cartridge to work properly. For a new Oracle installation, these are typically already set correctly, however the list of issues is included here for diagnostic purposes:

In order to use the cartridge functionality, the server must be configured to support 8.1 compatability. The line: compatible = "8.1.0" must be present in your init.ora file. If it isn't, stop your server, edit the file, and restart the server. If, during the installation, you get the following message gazillions of times, it is because this parameter isn't set correctly:

  SQL> create or replace operator smi2cansmi binding (varchar2, number)
       *
  ORA-00406: COMPATIBLE parameter needs to be 8.1.0.0.0 or greater

If one gets the following error message:

  SQL> select ddpackage.fhostid() from dual;
                                *
  ERROR at line 1:
  ORA-06520: PL/SQL: Error loading external library
  ORA-06522: ld.so.1: extprocPLSExtProc: fatal: /oracle/o85/dcischem/ddlib.so:
  open failed: No such file or directory

the cause is the specification of the c$dcischemlib library. Recheck the lines in the file create.sql and verify that absolute path to the file 'ddlib.so' matches the specification in the statement:

  create or replace library c$dcischemlib
    as '/oracle/o815/dcischem/ddlib.so';

It will be necessary to completely remove and reinstall the cartridge in order to fix this problem. See the section on 'Cartridge De-installation'.

If a valid license has not been inserted into the c$dcischem.license table, then the ftestlicense function will return:

  SQL> select ddpackage.ftestlicense('daycart') from dual;

  ERROR at line 1:
  ORA-29400: data cartridge error
  C$DCISCHEM-003: FAILED: License not available

In this case, you must contact Daylight and get a valid license key for the cartridge. Provide the hostid to Daylight (see installation above) and we'll provide to you the insert statement which must be run. If you have inserted the license key, then perhaps the c$dcischem.license table is not visible to the current user. Try:

  SQL> select * from c$dcischem.license;

  PRODUCT   KEY                              EXPIRATION
  --------- -------------------------------- ----------
  daycart   8befd3d06296e9b89abfb158b431d278 01-JAN-01

If this table isn't visible, then perhaps the role 'daycart' hasn't been granted to the current user. Remember that each user must have the 'daycart' role granted to it in order to use the cartridge. The 'daycart' role grants select on c$dcischem.license and c$dcischem.progob, execute on the cartridge functions, and create table privilege.

If the network listener is not running or is misconfigured, then you'll get a message like:

  SQL> select smi2cansmi('NCC', 0) from dual
       *
  ERROR at line 1:
  ORA-28575: unable to open RPC connection to external procedure agent

when attempting to access any Daylight Cartridge functions. In this case, verify that the listener is running:

  $ lsnrctl
  LSNRCTL> status

  Connecting to (DESCRIPTION ...
  STATUS of the LISTENER
  ------------------------
  Alias                     LISTENER
  Version                   TNSLSNR for Solaris: Version 8.1.5.0.0 - Production
  Start Date                14-MAR-00 12:00:37
  Uptime                    0 days 0 hr. 17 min. 15 sec
  Trace Level               off
  Security                  OFF
  SNMP                      OFF
  Listener Parameter File   /oracle/o815/network/admin/listener.ora
  Listener Log File         /oracle/o815/network/log/listener.log
  Services Summary...
    PLSExtProc            has 1 service handler(s)
    dev           has 3 service handler(s)
  The command completed successfully

Note that the listener is running, and that there is a service listed for PLSExtProc. Also, verify that the files: $ORACLE_HOME/network/admin/listener.ora and $ORACLE_HOME/network/admin/tnsnames.ora have the appropriate entries. If necessary, restart the listener to make sure that any configuration file changes have taken effect.

In one case, we've seen a problem with the LD_LIBRARY_PATH not being set for the Oracle instance. This was an non-standard Oracle instance which was upgraded from 8.0.X, so it isn't clear where the problem arose. The cartridge requires the Oracle shared object:

libclntsh.so.8.0

This file is found in $ORACLE_HOME/lib/. It is preferred that the UNIX Oracle userid have the LD_LIBRARY_PATH set in the .profile or .cshrc to include $ORACLE_HOME. If this is not the case, define LD_LIBRARY_PATH and then restart the Oracle instance and listener.

Verification

The subdirectory 'dcischem/TEST' includes a number of stand-alone SQL scripts which test the cartridge installation. The shell script 'run_tests' will execute all the SQL scripts in turn and compare the output to reference output files. Any discrepencies will be reported. The script 'run_tests' does attempt to log in as 'c$dcischem/secret', so if you've changed the password and privileges on user 'c$dcischem' you'll need to edit run_tests to reflect this change.

The scripts do use the 'EXPLAIN PLAN' facility, which requires that the plan table be present for the test user. The script $ORACLE_HOME/rdbms/admin/utlxplan.sql creates the plan table; this should be executed before running the 'run_tests' script.

  $ run_tests
  Testing dd_basic_test ...
  Comparing output to reference file ...
  dd_basic_test test OK.

  Testing dd_exact_test ...
  Comparing output to reference file ...
  dd_exact_test test OK.
  < ... >
  All tests passed.

Also, note that the SQL scripts give useful examples of using the cartridge functionality in each category.

De-Installation

In order to cleanly de-install the cartridge, all dependent indexes must be dropped from the system first. That is, any indexes created using one of the Daylight indextypes (ddexact, ddrole, ddgraph, ddblob) should be dropped. If one does not drop all the indexes first, then they will be marked as invalid. If this occurs, the indexes and their associated index tables (<indexname> || '_DDT') must be dropped manually.

After any dependent indexes have been dropped, one can run the script 'clean.sql'. This script will remove all cartridge index definitions, functions, operators, packages, roles, and synonyms. The script 'clean.sql' will remove everything which is installed by the script 'create.sql' with the exception of the license and program object tables. These tables are preserved since it is convenient to not need to recreate and populate these tables. The user c$dcischem will require DBA privilege to be granted in order to run the script. Example output follows:

  $ svrmgrl < clean.sql  

  Oracle Server Manager Release 3.1.6.0.0 - Production

  Copyright (c) 1997, 1999, Oracle Corporation.  All Rights Reserved.

  Oracle8i Enterprise Edition Release 8.1.6.0.0 - Production
  With the Partitioning option
  JServer Release 8.1.6.0.0 - Production

  SVRMGR> Connected.
  SVRMGR> Echo                            OFF
  SVRMGR> SVRMGR> Statement processed.
  SVRMGR> SVRMGR> Connected.
  Echo                            OFF
  Statement processed.
  ...
  Server Manager complete.
  $

It is safe to repeatedly run the scripts create.sql and clean.sql. The create.sql script creates a bunch of objects, types, etc. The clean.sql script removes them. Neither script will modify any other objects, types, or tables owned by c$dcischem or any other user on the system. The only components created by the create.sql script which are not removed by the clean.sql script are the license and progob tables. These two tables are preserved by the clean.sql script for convenience.

When one runs create.sql after clean.sql, one will see warnings that the tables license and progob already exist, however these warnings don't impact the outcome of the cartridge re-installation.

After running clean.sql, the database will be left in its original pre-cartridge state. One can then delete the license and progob tables, if desired, to eliminate all remnants of the cartridge.

Note that the user c$dcischem will continue to exist after running clean.sql and any non-cartridge tables, objects, etc. owned by c$dcischem will be unaffected by either clean.sql or create.sql.

The other way to remove the cartridge is to drop the user c$dcischem. Note that this does not remove public synonyms or roles, so these should be dropped manually (or via clean.sql).

  SQL> drop user c$dcischem cascade;

This will remove the user c$dcischem and all dependent objects. It will be necessary to repeat the entire installation procedure if this method is chosen.

PL/SQL Functions, SQL Operators:

There are quite a few user-accessible PL/SQL functions and SQL operators implemented for the cartridge. All are stateless. All of the PL/SQL functions are contained in a PL/SQL package called 'ddpackage'. This package is owned by 'c$dcischem'. Privileges required to access these functions are granted by the 'daycart' role.

The default installation (create.sql) also creates public synonyms for the package 'ddpackage' and all of the operators described herein. Hence, from any Oracle user, one can access the functions and operators by the signature documented below (eg: select tanimoto(...) from table rather than select c$dcischem.tanimoto(...) from table).

The one place where synonyms aren't allowed is in indextype specifications; hence you must refer to indextypes by their full names when creating indexes as a user other than c$dcischem (eg. create index <name> on small(smi) indextype is c$dcischem.ddblob).

function ddpackage.fsetdebug ( value IN NUMBER ) => NUMBER: Controls the level of detail of messages logged to the file: "/tmp/extproc.log". Sets the new value of the logging level to 'value'. Returns the prior value. Valid values are integers the range of 0 - 9.

0 - No logging at all. 1 - Error messages only (the default value). 5 - Warnings and errors. 9 - Notes, warnings, and errors.

This value is set on a per-session basis. Other values are reserved for future use.
function ddpackage.ftestlicense ( product IN VARCHAR2 ) => NUMBER: Checks the cartridge license. Takes an input parameter, which is the license type to check. [ In the current release, the only recognized value is 'daycart' ]. Returns 1 if the Daylight cartridge has a valid license, and 0 otherwise.
The license is contained in a special table which must be created and populated at cartridge install time.
function ddpackage.fhostid ( ) => VARCHAR2: Returns the hardware hostid which is used for generation of the hardware- specific license key. You'll need to provide this value to Daylight for us to generate a license.
function ddpackage.fversion ( which in VARCHAR2 ) => VARCHAR2: Returns the string version number of the Daylight toolkit or cartridge shared-object library currently running. Valid input parameters are:
'toolkit' - Returns the current Daylight Toolkit version. 'daycart' - Returns the cartridge executable version.
Molecule / Reaction Functions:
function ddpackage.fsmi2cansmi ( smiles IN VARCHAR2, type IN NUMBER ) => VARCHAR2 operator smi2cansmi ( smiles IN VARCHAR2, type IN NUMBER ) => VARCHAR2: Canonicalize an input SMILES. Returns a canonical SMILES string. Type is either 0 or 1, for unique or absolute SMILES, respectively.
function ddpackage.fsmi2netch ( smiles IN VARCHAR2 ) => NUMBER operator smi2netch ( smiles IN VARCHAR2 ) => NUMBER: Return the net charge of the molecule or reaction.
function ddpackage.fsmi2hcount ( smiles IN VARCHAR2 ) => NUMBER operator smi2hcount ( smiles IN VARCHAR2 ) => NUMBER: Return the total hydrogen count for the molecule or reaction.
function ddpackage.fsmi2mf ( smiles IN VARCHAR2 ) => VARCHAR2 operator smi2mf ( smiles IN VARCHAR2 ) => VARCHAR2: Returns the molecular formula string for the molecule or reaction.
function ddpackage.fsmi2amw ( smiles IN VARCHAR2 ) => NUMBER operator smi2amw ( smiles IN VARCHAR2 ) => NUMBER: Return the average molecular weight for a molecule or reaction. The weight used for any atoms which do not have specified isotopes is the average atomic weight. The weight used for atoms with a specified isotope is the high precision molecular weight for that atom. For example, "c1ccccc1" returns 78.1184, the average molecular weight, while "[1H][12c]1[12c]([1H])[12c]([1H])[12c]([1H])[12c]([1H])[12c]1[1H]" returns 78.0469502, the high precision molecular weight for that particular isotope.
function ddpackage.fsmi2graph ( smiles IN VARCHAR2 ) => VARCHAR2 operator smi2graph ( smiles IN VARCHAR2 ) => VARCHAR2: Return the hydrogen- and charge-suppressed canonical graph string for a given molecule or reaction.
Fingerprint Functions:
function ddpackage.fsmi2fp ( smiles IN VARCHAR2, min IN NUMBER, max IN NUMBER, nbits IN NUMBER ) => VARCHAR2 operator smi2fp ( smiles IN VARCHAR2, min IN NUMBER, max IN NUMBER, nbits IN NUMBER ) => VARCHAR2: Returns the ASCII fingerprint for a given molecule or reaction. Min and max are the minimum and maximum pathlengths, respectively, and size is the number of bits in the fingerprint.
function ddpackage.fsmi2xfp ( smiles IN VARCHAR2, min IN NUMBER, max IN NUMBER, nbits IN NUMBER ) => VARCHAR2 operator smi2xfp ( smiles IN VARCHAR2, min IN NUMBER, max IN NUMBER, nbits IN NUMBER ) => VARCHAR2: Return the ASCII difference fingerprint for a given molecule or reaction. Min and max are the minimum and maximum pathlengths, respectively, and size is the nubmer of bits in the fingerprint.
function ddpackage.ffoldfp ( fpstr IN VARCHAR2, nbits IN NUMBER, dens IN NUMBER ) => VARCHAR2 operator foldfp ( fpstr IN VARCHAR2, nbits IN NUMBER, dens IN NUMBER ) => VARCHAR2: Fold the given fingerprint to the minimum appropriate size or density, whichever is limiting, and returns the new, folded fingerprint.
function ddpackage.fbitcount ( fpstr IN VARCHAR2 ) => NUMBER operator bitcount ( fpstr IN VARCHAR2 ) => NUMBER: Return the number of bits on in the fingerprint.
function ddpackage.fnbits ( fpstr IN VARCHAR2 ) => NUMBER operator nbits ( fpstr IN VARCHAR2 ) => NUMBER: Return the total size of the fingerprint, in bits. In the current Daylight toolkit this will always be a power of two.
function ddpackage.fisfp ( fpstr IN VARCHAR2 ) => NUMBER operator isfp ( fpstr IN VARCHAR2 ) => NUMBER: Return 1 if the string is a fingerprint, 0 otherwise. The syntax of a fingerprint can never be confused with a valid SMILES. This is the bit of cleverness which allows us to overload the searching functions.
function ddpackage.fatomnorm ( smiles IN VARCHAR2, list IN VARCHAR2, ntuple IN NUMBER, isotype IN NUMBER) => VARCHAR2 operator atomnorm ( smiles IN VARCHAR2, list IN VARCHAR2, ntuple IN NUMBER, isotype IN NUMBER) => VARCHAR2 function ddpackage.fbondnorm ( smiles IN VARCHAR2, list IN VARCHAR2, ntuple IN NUMBER, isotype IN NUMBER) => VARCHAR2 operator bondnorm ( smiles IN VARCHAR2, list IN VARCHAR2, ntuple IN NUMBER, isotype IN NUMBER) => VARCHAR2: Returns a potentially reordered N-tuple string for the given list VARCHAR2 input parameter. The list string is reordered based on the canonical atom or bond ordering of the input SMILES. ntuple is the number of comma-separated values per atom or bond, and isotype is 0 for unique SMILES canonicalization and 1 for absolute SMILES canonicalization.
Comparison Functions:
function ddpackage.fexact ( a IN VARCHAR2, b IN VARCHAR2) => NUMBER operator exact ( a IN VARCHAR2, b IN VARCHAR2) => NUMBER: Returns 1 if the two input strings are identical, 0 otherwise. The operator is optionally backed by the ddexact indextype.
function ddpackage.fgraph ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2 ) => NUMBER operator graph ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2 ) => NUMBER: Returns 1 if the two input SMILES share the same canonical graph, 0 otherwise. The operator is optionally backed by the ddgraph indextype.
function ddpackage.ftautomer ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2 ) => NUMBER operator tautomer ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2 ) => NUMBER: Returns 1 if the two input SMILES share the same canonical graph, net charge, and total hydrogen count, 0 otherwise. The operator is optionally backed by the ddgraph indextype.
function ddpackage.freactant ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2 ) => NUMBER operator reactant ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2 ) => NUMBER function ddpackage.fagent ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2 ) => NUMBER operator agent ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2 ) => NUMBER function ddpackage.fproduct ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2 ) => NUMBER operator product ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2 ) => NUMBER: Returns 1 if smiles2 (a molecule SMILES) is a component of smiles1 (a reaction SMILES) with the appropriate role, otherwise returns 0. The operators are optionally backed by the ddrole indextype.
function ddpackage.fcontains ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2) => NUMBER operator contains ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2) => NUMBER: Returns 1 if smiles1 contains smiles2; that is, smiles2, assuming opened valences for all hydrogens, is a substructure of smiles1. The operator is optionally backed by the ddblob indextype.
function ddpackage.fmatches ( smiles IN VARCHAR2, smarts IN VARCHAR2) => NUMBER operator matches ( smiles IN VARCHAR2, smarts IN VARCHAR2) => NUMBER: Returns 1 if the smarts expression matches the given SMILES, 0 otherwise. The operator is optionally backed by the ddblob indextype.
function ddpackage.feuclid ( fp_or_smi1 IN VARCHAR2, fp_or_smi2 IN VARCHAR2 ) => NUMBER operator euclid ( fp_or_smi1 IN VARCHAR2, fp_or_smi2 IN VARCHAR2 ) => NUMBER: Return the euclidean distance between two fingerprints or SMILES. If the fingerprints are not the same size (nbits()), then the larger will be folded automatically to match the size of the smaller before comparison. If either parameter is a SMILES, the fingerprint is generated automatically. The returned value is a floating point number between 0.0 and 1.0. This is optionally backed by the ddblob indextype.
function ddpackage.ftanimoto ( fp_or_smi1 IN VARCHAR2, fp_or_smi2 IN VARCHAR2 ) => NUMBER operator tanimoto ( fp_or_smi1 IN VARCHAR2, fp_or_smi2 IN VARCHAR2 ) => NUMBER: Return the tanimoto distance between two fingerprints or SMILES. If the fingerprints are not the same size (nbits()), then the larger will be folded automatically to match the size of the smaller before comparison. If either parameter is a SMILES, the fingerprint is generated automatically. The returned value is a floating point nubmer between 0.0 and 1.0. This is optionally backed by the ddblob indextype.
function ddpackage.ftversky ( fp_or_smi1 IN VARCHAR2, fp_or_smi2 IN VARCHAR2, alpha IN NUMBER, beta IN NUMBER ) => NUMBER operator tversky ( fp_or_smi1 IN VARCHAR2, fp_or_smi2 IN VARCHAR2, alpha IN NUMBER, beta IN NUMBER ) => NUMBER: Return the tversky distance between two fingerprints or SMILES. If the fingerprints are not the same size (nbits()), then the larger will be folded automatically to match the size of the smaller before comparison. If either parameter is a SMILES, the fingerprint is generated automatically. The parameters alpha and beta must be in the range of 0.0 - 1.0. The returned value is a floating point nubmer between 0.0 and 1.0. This is optionally backed by the ddblob indextype.
function ddpackage.ffingertest ( fp_or_smi1 IN VARCHAR2, fp_or_smi2 IN VARCHAR2 ) => NUMBER operator fingertest ( fp_or_smi1 IN VARCHAR2, fp_or_smi2 IN VARCHAR2 ) => NUMBER: Returns 1 if all of the bits in fp_or_smi2 are also present in fp_or_smi1. That is, the fingerprint from fp_or_smi2 represents a possible substructure of fp_or_smi1. If the fingerprints are not the same size (nbits()), then the larger will be folded automatically to match the size of the smaller before comparison. If either parameter is a SMILES, the fingerprint is generated automatically. The returned value is either 0 or 1. NOTE: the order of fingerprints for the comparison in this function is reversed from the order in the Daylight Toolkit. This is optionally backed by the ddblob indextype.
Program Object Functions:
There are two lower-level functions which invoke program objects. It is expected that they will almost always be called from a PL/SQL wrapper layer, which would be responsible for packaging the communication between Oracle and the program object in a meaningful way. Hence, the default cartridge installation does not create operators for these two functions.
function ddpackage.fprogob ( name IN VARCHAR2, message IN VARCHAR2) => VARCHAR2: Communicates with a program object running on the Oracle server. The parameter 'name' is the symbolic name of the program object. The table c$dcischem.progob contains the mappings of symbolic names to actual executable programs. The function fprogob() proceeds as follows:

Looks up the symbolic name in the table c$dcischem.progob,
If the executable is not already running, starts the program object,
Parses the 'message' parameter into a sequence of strings,
Sends the sequence of strings to the program object,
Takes the returned sequence of strings and converts it into a newline delimited VARCHAR2,
Returns the VARCHAR2 string.

Valid delimiters for the lines in the 'message' parameter include the normal line-termination characters for UNIX, Mac, and PC: '\n', '\r', and '\r\n'. The function properly handles all three termination cases. The returned VARCHAR2 string is delimited by UNIX line-termination '\n'.
function ddpackage.fprogob ( name IN VARCHAR2, message IN OUT CLOB, answer IN OUT CLOB) => NUMBER: Identical to the previous function in operation; the message to the program object and the response from the program object are passed as CLOB, rather than VARCHAR2. The parameter 'message' is a CLOB containing the message to be sent to the program object. The parameter 'answer' must be an empty CLOB, into which the response from the program object is copied.
Valid delimiters for the lines in the 'message' parameter include the normal line-termination characters for UNIX, Mac, and PC: '\n', '\r', and '\r\n'. The function properly handles all three termination cases. The returned VARCHAR2 string is delimited by UNIX line-termination '\n'.
The function returns success or failure.

Extensible Indexes:

The default installation (create.sql) creates public synonyms for the package 'ddpackage' and all of the operators. The one place where synonyms aren't allowed is in indextype specifications; indextypes must be refered to by their full names when creating indexes as a user other than c$dcischem (eg. create index <namegt; on small(smi) indextype is c$dcischem.ddblob).

All four of the extensible indexes create Oracle tables to maintain their required internal data. The ddexact, ddgraph, and ddrole indextypes use regular Oracle tables to store the data; the ddblob indextype uses an Oracle table with a single row containing BLOB data. These internal tables are maintained automatically by the index code; there is normally no need for a user or application developer to know about these auxillary tables.

There are two situations where the developer or DBA must pay attention to these auxillary tables. First, on index creation, the auxillary data tables are created in the users default tablespace. Second, if an index gets corrupted, the developer or DBA may need to delete the auxillary data table manually.

Exact lookup indextype (ddexact):

This indextype supports the exact() operator. This is the least interesting index, as it implements a simple string comparison between the indexed column and the query. In that sense, the built in BTree index within Oracle has the similar functionality. The BTree index even supports the "greater than" and "less than" operators, which this index doesn't.

The main advantage of the ddexact indextype is that it can index strings up to 4000 bytes (the maximum varchar2 length), while the internal BTree can only index strings up to approximately 40% of a block; for a database with a 2048 byte blocksize, this means that the largest indexable varchar2 string using the BTree is approximately 700 bytes. Similarly, in a future version of the cartridge this index will support the comparison of LOB datatypes; the BTree index does not.

This indextype uses an auxillary data table within Oracle to maintain its index data. This table is kept in sync with the base table in real time and strictly obeys the Oracle transaction model for all DML operations. The name of the auxillary data table is: <index name> || '_DDT'.

The index can be created on a VARCHAR2 column with the following statement:

  SQL> create index <index name> on <table> ( <column> )
         indextype is c$dcischem.ddexact;

The valid form for the operator which uses the index is:

  SQL> select * from table where exact(smiles, 'Oc1ccccc1') = 1;

This index makes no assumptions about the content of the indexed column. The applications are responsible for normalizing both the data in the indexed column and and queries to the exact() operator. This allows the ddexact indextype to properly handle both Unique and Absolute SMILES without complication. In the following example, the USMILES column is normalized using smi2cansmi(smiles, 0) and the ASMILES column is normalized using smi2cansmi(smiles, 1). Note also that when performing the exact() query on a column, the query is normalized using the same method as the column data.

  SQL> create table test (id number, usmiles varchar2(4000),
         asmiles varchar2(4000));

  SQL> desc test;
  Name                               Null?    Type
  --------------------------------- -------- ----------------------------
  ID                                         NUMBER
  USMILES                                    VARCHAR2(4000)
  ASMILES                                    VARCHAR2(4000)

  SQL> insert into test values (1234, smi2cansmi('Cl/C=C/Cl', 0),
                                      smi2cansmi('Cl/C=C/Cl', 1));

  SQL> select * from test;
  
        ID  USMILES   ASMILES
  --------  -------   -------
    1234    ClC=CCl   Cl/C=C/Cl

  SQL> create index usmi_index on test(usmiles) indextype 
         is c$dcischem.ddexact;
  SQL> create index asmi_index on test(asmiles) indextype 
         is c$dcischem.ddexact;

  SQL> select * from test 
         where exact(usmiles, smi2cansmi('Cl/C=C/Cl', 0)) = 1;

        ID  USMILES   ASMILES
  --------  -------   -------
    1234    ClC=CCl   Cl/C=C/Cl

  SQL> select * from test 
         where exact(asmiles, smi2cansmi('Cl/C=C/Cl', 1)) = 1;

        ID  USMILES   ASMILES
  --------  -------   -------
    1234    ClC=CCl   Cl/C=C/Cl

NOTE: Because this indextype accesses the base table during a index scan, operations which modify the base table based on the results of the index scan aren't allowed. This is similar to the case where a trigger on a column attempts to modify the column data. The message which one receives is the following:

  SQL> delete from <table> where exact(smiles, 'CCN') = 1;

  ERROR at line 1:
  ORA-29903: error in executing ODCIIndexFetch() routine
  ORA-29400: data cartridge error
  ERROR ORA-04091: table C$DCISCHEM.T1 is mutating, trigger/function may
  not see it
  C$DCISCHEM-101: CARTRIDGE ERROR
  ORA-06512: at "C$DCISCHEM.DDEXACT_IM", line 159
  ORA-06512: at line 1

In order to perform this function, one must use two steps. One option would be:

  SQL> create table <temptable> (zapcol) as
          select rowid from <table> where exact(smiles, 'CCN') = 1;
  SQL> delete from <table> where rowid in 
          (select zapcol from <temptable>)

Graph and Tautomer indextype (ddgraph):

This indextype supports the graph() and tautomer() operators. It creates a tabular index of the graph and tautomeric information (hydrogen count and net charge) for rapid retrieval. The indexed search only requires an index lookup based on the graph of the query, followed by comparison of the net charge and hydrogen count for a tautomer search.

This indextype uses a special data table within Oracle to maintain its index data. This table is kept in sync with the base table in real time and strictly obeys the Oracle transaction model for DML operations. The name of the auxillary data table is: <index name> || '_DDT'.

The index can be created on a VARCHAR2 column with the following statement:

  SQL> create index <index name> on <table> ( <column> )
         indextype is c$dcischem.ddgraph;

The valid forms for the operators to use the index are:

  SQL> select * from table where graph(smiles, 'Oc1ccccc1') = 1;
  SQL> select * from table where tautomer(smiles, 'Oc1ccccc1') = 1;

Unlike the ddexact index, the ddgraph indextype always interprets the column as SMILES. Invalid SMILES are rejected. There is no concern with the ddgraph indextype as to whether the SMILES are Unique or Absolute, as stereochemical and isomeric information are discarded for the graph() and tautomer() lookups.

NOTE: As with the ddexact indextype, this indextype accesses the base table during a index scan. See the previous section on the limitations to performing modifications of the base table based on the results of an index scan.

Reaction role index (ddrole):

This indextype supports component level searches for exact molecules within reactions by role. It creates a tabular index of each component in every reaction, by role. Component level searches are implemented simply by a index lookup of the component, followed by validation of the role.

This indextype uses a special data table within Oracle to maintain its index data. This table is kept in sync with the base table in real time and strictly obeys the Oracle transaction model for all DML operations. The name of the auxillary data table is: <index name> || '_DDT'.

The index can be created on a VARCHAR2 column with the following statement:

  SQL> create index <index name> on <table> ( <column> )
         indextype is c$dcischem.ddrole;

The valid forms for the operators to use the index are:

  SQL> select * from rxn where reactant(smiles, 'Oc1ccccc1') = 1;
  SQL> select * from rxn where agent(smiles, 'Oc1ccccc1') = 1;
  SQL> select * from rxn where product(smiles, 'Oc1ccccc1') = 1;

If a row value contains a molecule SMILES, that row is ignored for the ddrole index. Like the ddexact index, the ddrole index makes no assumptions about the canonicalization of the SMILES in the index column. Hence, the column must be normalized by the application to the appropriate SMILES and the query on the column must match that normalization (see the explanation re. ddexact above).

SMARTS and Similarity search index (ddblob):

This index caches the SMILES and fingerprints in a single binary large object for rapid searching. Substructure and similarity searches are implemented through this blob-based image of the structural data. The blob is persistant; it is automatically stored in the Oracle database and retrieved for processing. The blob is kept synchronized with the base table in real time and strictly obeys the Oracle transaction model for all DML operations. The name of the auxillary data table which contains the BLOBs is: <index name> || '_DDT'.

The index can be created on either a SMILES or fingerprint column, provided that the fingerprints are all the same size. If the index is created on a SMILES column, all six searches are implemented on that column: contains(), matches(), tanimoto(), tversky(), euclid(), and fingertest(). If the index is created on a fingerprint column, the four fingerprint-specific searches (tanimoto(), tversky(), euclid(), and fingertest()) are supported for that column.

The index creation command takes an optional parameter 'fpsize'. This value is the number of bits that the blob index will use for fingerprint creation.

When indexing a SMILES column, the blob will use the 'fpsize' parameter for the creation size for fingerprints within the internal blob index or a default value of 512 bits if the 'fpsize' is not specified.

When indexing a fingerprint column, all fingerprints within that column must match the 'fpsize' parameter. If the 'fpsize' parameter is not specified, the size of the first fingerprint encountered by the index code is used. Any fingerprints which do not match this size are rejected by the index.

The query for tanimoto(), tversky(), euclid(), and fingertest() can be either a fingerprint or a SMILES. If a SMILES, it will be used to create a normal fingerprint of the appropriate size for the index which will be used for comparison. If the query is a fingerprint, the size of the fingerprint must match the size of the fingerprint in the column (or in the blob for a SMILES index). The indexes and operators never perform automatic folding of fingerprints during searches.

The index can be created on a VARCHAR2 column with the following forms:

  -- Default internal fingerprint size of 512 bits

  SQL> create index <index name> on 
        <table> ( <SMILES column> ) indextype is c$dcischem.ddblob;

  -- Specify internal fingerprint size of 64 bits

  SQL> create index <index name> on 
        <table> ( <SMILES column> ) indextype is c$dcischem.ddblob
         parameters ('fpsize=64');

  -- fpsize taken from size of data in FP column

  SQL> create index <index name> on 
        <table> ( <FP column> ) indextype is c$dcischem.ddblob;

The valid forms for the operators to use the index are:

  SQL> select count(1) from table where contains(smiles, 'O=c1ccocc1') = 1;
  SQL> select count(1) from table where matches(smiles, 'O=c1ccocc1') = 1;

  SQL> select count(1) from medium where tanimoto(smiles, 'OC(=O)CS') > 0.8;
  SQL> select count(1) from medium 
    where tversky(smiles, 'OC(=O)CS', 0.5, 0.5) > 0.8;
  SQL> select count(1) from medium where euclid(smiles, 'OC(=O)CS') < 0.2;
  SQL> select count(1) from medium where fingertest(smiles, 'OC(=O)CS') = 1;

  -- Assume that the indexes have been created using 64-bit 
  -- fingerprints (for example).  A fingerprint can be used directly
  -- as a query for tanimoto(), euclid(), fingertest() and tversky()
  -- instead of a SMILES.

  SQL> select smi2fp('OC(=O)CS', 0, 7, 64) from dual;

  SMI2FP('OC(=O)CS',0,7,64)
  --------------------------
  IM9vA4w127g.2

  SQL> select count(1) from medium where tanimoto(smiles, 'IM9vA4w127g.2') > 0.8;
  SQL> select count(1) from medium where tanimoto(fp, 'OC(=O)CS') > 0.8;
  SQL> select count(1) from medium where tanimoto(fp, 'IM9vA4w127g.2') > 0.8;

To further illustrate the blob index, consider the following sequence:

  SQL> create table test (smiles varchar2(4000), fp varchar2(4000));

  SQL> insert into test values (smi2cansmi('CCN', 0), NULL);

  ... load bunches of SMILES...

  SQL> commit;

  SQL> update test
         set fp = smi2fp(smiles, 0, 7, 128);

  SQL> commit;

  -- We specified a preferred size for the internally generated fingerprints
  -- as 1024 bits.  These are internal to the blob and aren't visible.

  SQL> create index s1 on test(smiles) indextype is 
         c$dcischem.ddblob parameters ('fpsize=1024');

  -- This index, on the FP column, uses the size of the fingerprints in 
  -- the column (128 bits, in this case) in index creation.

  SQL> create index f1 on test(fp) indextype is c$dcischem.ddblob;

  SQL> commit;

  -- This screens and searches using the index s1; internally using 1024 bit
  -- fingerprints.

  SQL> select count(1) from test where contains(smiles, 'NCCc1ccccc1') = 1;

  -- This screens and searches using the index s1; internally using 1024 bit
  -- fingerprints.  The output is generated using the functional form, at 
  -- its default of 512 bits.

  SQL> select smiles, tanimoto(smiles, 'NCCc1ccccc1') from test
         where tanimoto(smiles, 'NCCc1ccccc1') > 0.8;

  -- This screens and searches using the index s1; internally using 1024 bit
  -- fingerprints.  The output is generated using the functional form; this
  -- time calculated at 1024 bits.

  SQL> select smiles, tanimoto(smiles, smi2fp('NCCc1ccccc1', 0, 7, 1024))
         from test where tanimoto(smiles, 'NCCc1ccccc1') > 0.8;

  -- This searches using the index f1, using the 128 bit fingerprints from
  -- the actual column for comparison.

  SQL> select smiles, tanimoto(fp, 'NCCc1ccccc1') from test
         where tanimoto(fp, 'NCCc1ccccc1') > 0.8;

  -- This does two searches, using both the s1 and f1 indexes, and
  -- merges the results.

  SQL> select * from test where contains(smiles, 'Oc1ccccc1') = 1 and
         tanimoto(fp, 'Oc1ccccc1F') > 0.5;

  -- Drops the table and both indexes.

  SQL> drop table test;

Using Program Objects:

This section describes the configuration and use of program objects from within Oracle.

Program objects are stand-alone executable programs which communicate via standard input and standard output using the "Pipetalk" protocol (See the Daylight Theory Manual for more details). The cartridge supports the ability to execute external program objects on the same machine as the Oracle server, and to communicate with the program objects in a robust, general way.

The table c$dcischem.progob is the dictionary of program objects available to the Oracle server. In order to use a program object, it must be defined in this dictionary table.

  SQL>  desc progob
   Name                                      Null?    Type
   --------------------------------------- -------- -------------------------
   NAME                                               VARCHAR2(60)
   PATH                                               VARCHAR2(4000)
   ARGS                                               VARCHAR2(4000)

For example, consider the clogp program object. The code for the clogp example is included in the CONTRIB subdirectory. It can be defined to the system with:

  SQL> insert into progob values
     ('clogp', '/oracle/o815/progob/clogptalk.sh', NULL);

This defines a program object with the symbolic name 'clogp'. It corresponds to the absolute path: '/oracle/o815/progob/clogptalk.sh', which must be an executable program object. In this case, the arguments to the program object (whitespace delimited) are NULL.

The file '/oracle/o815/progob/clogptalk.sh' looks like the following:

  #!/bin/sh

  DY_ROOT=/usr3/thor/day462/v462
  export DY_ROOT

  DY_LICENSEDATA=/usr/local/daylight/dy_license.dat
  export DY_LICENSEDATA

  LD_LIBRARY_PATH=/opt/SUNWspro/SC2.0:/usr/openwin/lib:/usr/lib:$DY_ROOT/lib
  export LD_LIBRARY_PATH

  $DY_ROOT/bin/clogptalk

When Oracle executes the program object, the environment is empty. Hence, all required environment variables must be defined in a shell wrapper before calling the actual program object.

Since program objects are initially executed under the Oracle userid, a strict security policy must be obeyed to prevent access to unauthorized Oracle privileges. The policy, checked and enforced by the cartridge code before executing any external program object, is the following:

If the program object is executable and SUID, then it is OK. (SUID requires privileges of the destination UID, so this means that whoever turned on the SUID bit has either root or owner privilege on the file.
If the program object is executable by and owned by Oracle, and the directory is not group or world writeable, then OK.
Otherwise, fail.

Running program objects are persistant on the server, and are executed on a per-user-session basis. The first time a program object is accessed within a user session it is started, and the program object will continue to run until the user session is terminated. If the program object crashes during the user session, the cartridge will restart it and retry the transaction. Because of the ability to restart program objects, program objects ideally should be stateless. All data required for a transaction should be passed to the program object within that transaction.

Program object communication through the ddpackage.fprogob() function is via VARCHAR2 or CLOB datatypes. The string types can contain one or more delimited lines of data. Because of this line-oriented data communication, most program objects will require a PL/SQL wrapper function to convert from Oracle datatypes to line-oriented data and to parse returned results. Continuing the clogp example, consider a simple function which will calculate clogp:

  create function fclogp (sosdata in varchar2) return number
   as
    v1 varchar2(4000);
    rc number;
    off1 number;
    off2 number;
   begin
    v1 := c$dcischem.ddpackage.fprogob('clogp', sosdata);
    off1 := instr(v1, ' ', 1, 1);
    off2 := instr(v1, ' ', 1, 2);
    rc := to_number(substr(v1, off1, off2 - off1));
    return rc;
   end;

Executing the fprogob() function directly using the clogptalk program object results in a full line of data being returned from the program object for each SMILES:

  SQL> select ddpackage.fprogob('clogp', 'c1ccccc1') from dual;

  DDPACKAGE.FPROGOB('CLOGP','C1CCCCC1')
  -----------------------------------------------------------------------------
  c1ccccc1 2.142 0 LogPstar: 2.13

The wrapper function fclogp() simply parses out the second field, converts it to a number, and returns it. The wrapper function hides the details from the user and allows the function to return the desired numeric value.

  SQL> select fclogp('c1ccccc1') from dual;

  FCLOGP('C1CCCCC1')
  ------------------
               2.142

More complex examples (eg. MOLFILE -> SMILES conversion) are included in the CONTRIB directory.

Optimizer Support.

The cartridge will provide estimates of the costs of extensible index queries to the Oracle optimizer. The Oracle optimizer uses these cost estimates for its decision making with respect to the execution plan for a given SQL query. The execution plan chosen by Oracle can have a dramatic effect on the time and resources required to perform a particular query. The cartridge has the ability to provide selectivity and resource estimates of varying precision to Oracle, depending upon the configuration of the database index.

The default cartridge installation includes the code for performing cost estimates, but does not enable the cartridge optimizer code. All costs for index queries default to zero. Any Oracle query will preferrentially execute the extensible index query first.

In order to activate the optimizer code for the Daylight cartridge, one must execute the following commands:

  SQL> connect c$dcischem/secret
  SQL> associate statistics with packages ddpackage using ddoptimizer;
  SQL> associate statistics with indextypes ddexact using ddoptimizer;
  SQL> associate statistics with indextypes ddgraph using ddoptimizer;
  SQL> associate statistics with indextypes ddrole using ddoptimizer;
  SQL> associate statistics with indextypes ddblob using ddoptimizer;

Once the optimizer code is enabled, one can view the cost estimates provided to Oracle and the execution plan chosen using the "explain plan" command in SQL.

The command "analyze index <indexname> compute statistics" will collect and store information about the size of the index table for the ddgraph, ddrole, and ddexact indextypes. This size information will then be used to refine the cost estimates provided to the Oracle optimizer. The ddblob indextype does not require a separate "analyze index" command; the blob index always knows its exact size.

Note that the cost and selectivity estimates are not going to be reasonable in all cases; developers may find that forcing a particular execution order will result in better query performance. Oracle SQL supports a number of hints which allow the developer to control the execution order. See the section on "Optimizer Hints" in the Oracle SQL Reference manual for more details.

It can also be useful to specify the functional form of a query in order to eliminate consideration of the index path for the execution plan. Consider the following query:

  SQL> select * from <table> where
          graph(smiles, 'c1ccccc1') = 1 and
          contains(smiles, 'C') = 1;

By default, (assuming that both required indexes exist) this query will use indexes for both the graph and contains search, and then will merge the results. In most cases, it would be much more efficient to perform the graph index search, then perform the functional version of contains() on the handful of hits returned from the graph search. The following query will force the contains() operation to occur functionally:

  SQL> select * from <table> where
          graph(smiles, 'c1ccccc1') = 1 and
          ddpackage.fcontains(smiles, 'C') = 1;

In order to disable the cartridge optimizer code, any computed statistics must be dropped with "analyze index <indexname> delete statistics". Then, The cartridge optimizer code can be disabled with the following commands:

  SQL> connect c$dcischem/secret
  SQL> disassociate statistics with indextypes ddblob using ddoptimizer;
  SQL> disassociate statistics with indextypes ddgraph using ddoptimizer;
  SQL> disassociate statistics with indextypes ddrole using ddoptimizer;
  SQL> disassociate statistics with indextypes ddexact using ddoptimizer;
  SQL> disassociate statistics with packages ddpackage using ddoptimizer;

Once the cartridge optimizer code is disabled, cost estimates revert to their default values of zero cost.

Error Codes:

This section lists the possible error codes returned from the cartridge and diagnositic actions:

General Errors

C$DCISCHEM-001: Unable to Get Session MEMORY

Internal error. Try quitting the user session and restarting. Report this problem to Daylight if it reoccurs.

C$DCISCHEM-002: Debug level out of range: 0 - 9

The parameter passed to the fsetdebug() function was out of range.

C$DCISCHEM-003: FAILED: License not available

A cartridge function failed because of a license error. Check that the user has the daycart role granted to him, that the user has select privilege on the c$dcischem.license table, and that the license table has a valid cartridge license.

Invalid Arguments to cartridge functions

C$DCISCHEM-021: Not a valid QUERY: %.*s

The query string was not a valid SMARTS, SMILES, or fingerprint, as required.

C$DCISCHEM-022: Not a valid input SMILES: %.*s

The SMILES parameter was not parseable as a valid molecule or reaction.

C$DCISCHEM-023: Not a valid fingerprint: %.*s

The fingerprint parameter was not parseable as a valid molecule or reaction.

C$DCISCHEM-024: Incorrect predicate for index operator

The comparison operation is not supported for this index. For example, the predicate "contains(smiles, query) = 0" is not recognized by the index code. See the documentation for each indextype for the supported predicate forms.

Internal index errors

C$DCISCHEM-041: Unable to Get Session Handle

Internal error. Try quitting the user session and restarting. Report this problem to Daylight if it reoccurs.

C$DCISCHEM-042: Unable to QUERY table

Possibly corrupt index. Try quitting the user session and restarting. Also, drop and rebuild the index. Report this problem to Daylight if it reoccurs.

C$DCISCHEM-043: Unable to Insert Row, ROWID = %s

Possibly corrupt index. Try quitting the user session and restarting. Also, drop and rebuild the index. Report this problem to Daylight if it reoccurs.

C$DCISCHEM-044: Unable to Delete Row

Possibly corrupt index. Try quitting the user session and restarting. Also, drop and rebuild the index. Report this problem to Daylight if it reoccurs.

C$DCISCHEM-045: Refresh number corrupt

Possibly corrupt index. Try quitting the user session and restarting. Also, drop and rebuild the index. Report this problem to Daylight if it reoccurs.

C$DCISCHEM-046: Unable to Convert Start parameter

Internal error. Try quitting the user session and restarting. Report this problem to Daylight if it reoccurs.

C$DCISCHEM-047: Unable to Convert Stop parameter

Internal error. Try quitting the user session and restarting. Report this problem to Daylight if it reoccurs.

C$DCISCHEM-048: Cannot Generate KEY

Internal error. Try quitting the user session and restarting. Report this problem to Daylight if it reoccurs.

C$DCISCHEM-049: Cannot Set Context Value

Internal error. Try quitting the user session and restarting. Report this problem to Daylight if it reoccurs.

C$DCISCHEM-050: Cannot Assign value to Self

Internal error. Try quitting the user session and restarting. Report this problem to Daylight if it reoccurs.

C$DCISCHEM-051: Unable to get KEY value

Internal error. Try quitting the user session and restarting. Report this problem to Daylight if it reoccurs.

C$DCISCHEM-052: Unable to get max number of rows for fetch

Internal error. Try quitting the user session and restarting. Report this problem to Daylight if it reoccurs.

C$DCISCHEM-053: Unable to append DUMMY rowid

Internal error. Try quitting the user session and restarting. Report this problem to Daylight if it reoccurs.

C$DCISCHEM-054: Unable to append rowid

Internal error. Try quitting the user session and restarting. Report this problem to Daylight if it reoccurs.

C$DCISCHEM-055: Query of base table failed

Possibly corrupt index or base table. Try quitting the user session and restarting. Report this problem to Daylight if it reoccurs.

C$DCISCHEM-056: Fetch of base table failed

Possibly corrupt index or base table. Try quitting the user session and restarting. Report this problem to Daylight if it reoccurs.

C$DCISCHEM-057: Access of blob data failed

Possibly corrupt index. Try quitting the user session and restarting. Also, drop and rebuild the index. Report this problem to Daylight if it reoccurs.

Program object errors

C$DCISCHEM-071: Couldn't stat() program object

The program object was not found. Check that the value of "path" in the table c$dcischem.progob matches the absolute path to the desired program object.

C$DCISCHEM-072: Couldn't stat() directory

The directory which supposedly contains the program object was not found. Check that the value of "path" in the table c$dcischem.progob matches the absolute path to the desired program object.

C$DCISCHEM-073: Program object is not executable

The program object does not have executable privilege set.

C$DCISCHEM-074: Program object directory must be owned by Oracle

For security reasons, the program object must be owned by Oracle or be SUID. Check the file permissions for the program object.

C$DCISCHEM-075: Program object directory group or world writable

For security reasons, the directory containing the program object must not be group or world writable. Check the file permissions for the directory.

C$DCISCHEM-076: Access of c$dcischem.progob table failed

The table c$dcischem.progob was not readable by the current user. The user must have select privilege on the table. Check that the role daycart has been granted to the user.

C$DCISCHEM-077: Program object not found in table c$dcischem.progob

The symbolic name for the program object (the first argument to fprogob()) was not found in the c$dcischem.progob table. Every program object requires a row in the c$dcischem.progob table specifying the absolute path to the program object and any arguments to the program object. The "name" parameter must match the first argument.

C$DCISCHEM-078: dt_alloc_program() failed

The program object didn't start. Verify that the desired program object runs properly from the command line before retrying. Also, make sure that any required environment variables are set by the program object itself.

C$DCISCHEM-079: dt_alloc_program() failed for restart

The program object crashed, and Oracle wasn't able to restart it. Try quitting the user session and restarting.

C$DCISCHEM-080: Progob failed dt_converse

The program object didn't respond properly to a message from Oracle. Verify that the desired program object runs properly from the command line before retrying.

Generic errors

C$DCISCHEM-101: CARTRIDGE ERROR

This is a generic error. Typically, there was an Oracle error during a cartridge function. The error stack should provide additional information.

C$DCISCHEM-102: CARTRIDGE WARNING

This is a generic warning. Typically, there was an Oracle warning during a cartridge function. The error stack should provide additional information.

C$DCISCHEM-103: CARTRIDGE INFO

This is a generic informational message. Typically, there was an Oracle informational message during a cartridge function. The error stack should provide additional information.

Tuning Hints:

This section contains notes on Cartridge Tuning. Currently they aren't particularly well organized.

INIT.ORA:

There are several INIT.ORA parameters which affect query performance for the cartridge. These primarily impact the ddblob index, which is the most resource-intensive of the four indextypes.

db_file_multiblock_read_count: The number of blocks read per I/O request in a sequential scan. Larger is better, up to a value of 32.

db_block_buffers: Number of buffers in the buffer cache. The blob index uses the buffer cache, so if the buffer cache is larger than the blob index, then no physical disk I/O is required for a search. The size of the blob data can be obtained for an index by examining the file '/tmp/extproc.log' for a search at the debug_level of 9 (select c$dcischem.ddpackage.fsetdebug(9) from dual;). An entry of the form "Data LOB size for search: 9726184" gives the LOB size in bytes.

log_buffer: The size of the redo log buffer. Index creation is a long transaction which benefits from a large log_buffer. Also, see the section on logfile sizes.

LOB caching:

The default behavior of the ddblob indextype is to enable LOB caching for all ddblob indexes created in the system. This means that the Oracle Server will attempt to cache BLOB data in memory like normal tabular data. Provided that the SGA is sufficiently large (db_block_buffers), searches over ddblob indexes will not require any physical disk I/O.

The downside of LOB caching is that BLOB data will displace other tabular data from the buffer cache during a search. On a system with small memory availability, this can cause all queries to slow down substantially as the BLOB data can fill the buffer cache for every search. In this scenario, it is preferred to disable BLOB caching on a per-index basis.

Each ddblob index has an auxillary table associated with it. That auxillary table has two BLOB columns. The following SQL commands will alter the CACHE status for the auxillary table:

  SQL> create index test_index on table(smiles) indextype is c$dcischem.ddblob;

  Index created.

  -- the auxillary table is named <index_name> || '_DAYIDX'

  SQL> desc test_index_dayidx;

  Name                   Null?    Type
 ---------------------- -------- --------------
  HASH                           BLOB
  DAT                            BLOB

  SQL> alter table test_index_dayidx modify lob (dat) (nocache);

  Table altered.

One can turn caching back on with "alter ... modify lob (dat) (cache)". Note that the HASH BLOB is used for inserts, deletes, and updates. It is small, compared to the DAT BLOB, so should remain cached for best overall performance. The DAT BLOB is larger and is used for searching, so select its cache behavior based on overall throughtput and tuning requirements.

LOGFILES:

Index creation time for the ddblob index is sensitive to logfile availability as it does a lot of I/O which needs to be logged. I added a total of four logfile groups to minimize checkpoint blocking. If you see entries in the instance logfile admin/<sid>/bdump/alert_<sid>.log like:

    Current log# 1 seq# 15 mem# 0: /oracle/db/oradata/dev/redo02.log
  Thread 1 cannot allocate new log, sequence 16
  Checkpoint not complete
    Current log# 1 seq# 15 mem# 0: /oracle/db/oradata/dev/redo02.log

during index creation, then increasing the number and size of your redo logs will improve index creation and modification performance substantially.

  SQL> select * from v$logfile;

      GROUP# STATUS    MEMBER
  ---------- ------- ---------------------------------
           1         /oracle/db/oradata/dev/redo01.log
           2         /oracle/db/oradata/dev/redo02.log

Something like the following sequence will work. This will give you a total of 16 MB of log file space spread over four separate

  -- Create two new logfiles

  SQL> alter database add logfile group 3
         ('/oracle/db/oradata/dev/redo03.log') size 4000k;
  SQL> alter database add logfile group 4
         ('/oracle/db/oradata/dev/redo04.log') size 4000k;

  -- Advance the current log file into the new ones

  SQL> alter system switch logfile;
  SQL> alter system switch logfile;

  -- Get rid of the original two logfiles

  SQL> alter database drop logfile group 1;
  SQL> alter database drop logfile group 2;

  -- Remove the files redo01.log and redo02.log 
     from the system and re-create them.

  SQL> quit
  $ rm /oracle/db/oradata/dev/redo01.log
  $ rm /oracle/db/oradata/dev/redo02.log

  $ sqlplus sys/pw

  SQL> alter database add logfile group 1
         ('/oracle/db/oradata/dev/redo01.log') size 4000k;
  SQL> alter database add logfile group 2
         ('/oracle/db/oradata/dev/redo02.log') size 4000k;

  SQL> select * from v$logfile;

      GROUP# STATUS    MEMBER
  ---------- ------- ---------------------------------
           1         /oracle/db/oradata/dev/redo01.log
           2         /oracle/db/oradata/dev/redo02.log
           3         /oracle/db/oradata/dev/redo03.log
           4         /oracle/db/oradata/dev/redo04.log