Daylight Summer School 2001, June 5-7, Santa Fe, NM

Daylight Worksheet - Building a database with scripts -- WITH HINTS

By using scripts to build a database, you can automate the process, making it easier and more repeatable. The script is thus both a tool and a record of the steps taken.
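
A minimal sketch of that idea, assuming a POSIX sh script (MakeDB itself may be written for another shell, such as csh). A hypothetical run helper appends each command to a log file before executing it and notes any failure, so the log becomes a record of what was done.

```shell
#!/bin/sh
# Hypothetical helper, not part of the Daylight software.
# BUILD_LOG names the log file (default: build.log in the current directory).
BUILD_LOG=${BUILD_LOG:-build.log}

# run CMD [ARGS...]: append the command to the log, execute it, and
# log its exit status if it fails. Returns the command's exit status.
run() {
    printf '%s\n' "+ $*" >> "$BUILD_LOG"
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        printf '%s\n' "FAILED (exit $status): $*" >> "$BUILD_LOG"
    fi
    return "$status"
}
```

Inside a script each step would then read, for example, run thormake ... || exit 1. A pipeline such as cat $DBNAME.tdt | fingerprint > TMP/$DBNAME.fp.tdt can be passed whole to sh -c; in sh, a pipeline's exit status is that of its last command.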

  1. Locate the script MakeDB in your admin directory. It is a sample script that includes a typical sequence of steps, from converting data from a non-Daylight format to creating the database and loading the data. The fingerprint program is used to add fingerprints, which are necessary for Merlin searching. The clogp program is used to add computed ClogP data. Inspect the script and study how these tasks are accomplished.

    DestroyDB simply destroys the database and its auxiliary databases safely and conveniently.
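
    One way to begin the inspection is to list each line of MakeDB that calls one of the programs involved. This is a sketch using standard grep. The helper name show_steps is hypothetical, and the list of program names (including thorload, the loader mentioned in step 2) can be extended as needed.

```shell
#!/bin/sh
# Hypothetical helper: print the numbered lines of a build script that
# call mol2smi, fingerprint, clogp, thormake or thorload.
# The character classes keep e.g. "myclogp" or "clogp_old" from matching.
show_steps() {
    script=${1:-MakeDB}
    grep -nE '(^|[^[:alnum:]_])(mol2smi|fingerprint|clogp|thormake|thorload)([^[:alnum:]_]|$)' "$script"
}
```

    Running show_steps MakeDB prints each matching line prefixed with its line number in the script.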

  2. Copy MakeDB to MakeDB-original and then modify MakeDB to double the primary hash table size and cross-reference hash table size. Also add a command to thorload a second data file generated from test2.mol.
    thormake \
            -DATATYPES_DATABASE $DBNAME_datatypes \
            -INDIRECT_DATABASE $DBNAME_indirect \
            \$DY_THORDB/$DBNAME%@$HOST::$USER% \
            2000 4000
    ...should become...

    thormake \
            -DATATYPES_DATABASE $DBNAME_datatypes \
            -INDIRECT_DATABASE $DBNAME_indirect \
            \$DY_THORDB/$DBNAME%@$HOST::$USER% \
            4000 8000

    To incorporate test2.mol, add...

    cat RAW/test2.mol \
     | mol2smi -OUTPUT_FORMAT TDT > test2.tdt
    ...and change...

    cat $DBNAME.tdt \
      | fingerprint > TMP/$DBNAME.fp.tdt

    ...to...

    cat $DBNAME.tdt test2.tdt \
      | fingerprint > TMP/$DBNAME.fp.tdt
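
    Editing the two sizes by hand makes it easy to change one occurrence and miss another. The sketch below assumes a POSIX sh script (MakeDB may well be written for csh). It keeps the sizes in variables and scales both by one factor. All variable names are hypothetical, and it assumes the first thormake size is the primary hash table size and the second is the cross-reference size. It also builds the auxiliary database names as ${DBNAME}_datatypes. In both sh and csh, $DBNAME_datatypes refers to a variable named DBNAME_datatypes, so unless MakeDB defines one, the braces are needed to append the suffix to $DBNAME.

```shell
#!/bin/sh
# Hypothetical excerpt: compute the thormake arguments from variables.
DBNAME=${DBNAME:-test}

# Sizes used by the original MakeDB, and one factor that scales both.
PRIMARY_HASH=2000
XREF_HASH=4000
HASH_SCALE=${HASH_SCALE:-2}

PRIMARY_HASH=$((PRIMARY_HASH * HASH_SCALE))
XREF_HASH=$((XREF_HASH * HASH_SCALE))

# Braces end the variable name before the suffix is appended.
DATATYPES_DB="${DBNAME}_datatypes"
INDIRECT_DB="${DBNAME}_indirect"

# Without braces the shell looks up a variable named DBNAME_datatypes,
# which expands to nothing here because nothing defines it.
echo "unbraced: '$DBNAME_datatypes'"
echo "thormake sizes: $PRIMARY_HASH $XREF_HASH"
echo "auxiliary databases: $DATATYPES_DB $INDIRECT_DB"
```

    The thormake call would then end with $PRIMARY_HASH $XREF_HASH in place of the literal sizes and pass $DATATYPES_DB and $INDIRECT_DB to the two database options.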

  3. (OPTIONAL) Other possible enhancements to the script:
    1. Cluster the compounds using the Clustering Package (nearneighbors, etc.); add this cluster data to the database.

      nearneighbors -NUM_PROCESSES 2 \
                    -RECORD_COUNT 2000 \
                    TMP/$DBNAME.fp.tdt > TMP/$DBNAME.nn.tdt
      jpscan -RECORD_COUNT 2000 TMP/$DBNAME.nn.tdt >
      jarpat -RECORD_COUNT 2000 TMP/$DBNAME.nn.tdt > TMP/$
      listclusters -v -m 2000 TMP/$ > TMP/$DBNAME.lclus.tdt

    2. Add CMR data as well as ClogP data.

      cat TMP/$DBNAME.fp.tdt \
        | clogp | cmr > TMP/$DBNAME.fp.pcm.tdt

    3. Add molecular formulae using the contrib program addf (source file addf.c).

      cat TMP/$DBNAME.fp.pcm.tdt \
        | addf > TMP/$
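
    The clustering commands above pass -RECORD_COUNT 2000, and bringing in the test2 records in step 2 changes how many records the fingerprinted file holds. A quick check is to count the records. This sketch assumes that each TDT record ends with a line containing only the | character and counts those terminator lines. The function name tdt_count is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count TDT records in one or more files by counting
# lines that consist only of "|". Assumes every record terminator sits on
# a line of its own; n + 0 prints 0 when no records are found.
tdt_count() {
    awk '$0 == "|" { n++ } END { print n + 0 }' "$@"
}
```

    For example, if fingerprint writes one output record per input record, tdt_count TMP/$DBNAME.fp.tdt should equal tdt_count $DBNAME.tdt test2.tdt, and either figure can be compared with the -RECORD_COUNT value.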

  4. Run your edited script and verify that no errors occur and that the database is usable with Thor and Merlin.

    % MakeDB
    % xvmerlin
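
    The check in step 4 can rely less on watching the terminal if the build's output is captured in a log and its exit status is tested. This is a minimal sketch. The function name build_and_check and the log file name are hypothetical, and the sketch assumes MakeDB exits with a non-zero status when something goes wrong. A script that keeps going after a failed command may still exit with status 0, so the log is worth reading either way.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with all of its output (stdout and
# stderr) captured in a log file, then report success or failure based on
# its exit status.
# Usage: build_and_check LOGFILE CMD [ARGS...]
build_and_check() {
    logfile=$1
    shift
    "$@" > "$logfile" 2>&1
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "OK: $* (log: $logfile)"
    else
        echo "FAILED with exit status $status: $* (log: $logfile)" >&2
    fi
    return "$status"
}
```

    For example: build_and_check MakeDB.log MakeDB && xvmerlin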

Daylight Chemical Information Systems Inc.