Daylight Summer School 1999, June 15-17, St. John's College, Santa Fe, NM

Daylight Worksheet - Designing a database -- WITH HINTS

Designing a database generally amounts to specifying its datatypes. A datatype may hold identifier or non-identifier data and may have one or several datafields. Fields may be indirect or not, and their contents may be numeric, text, ASCII-encoded binary, or chemical (a SMILES).

So this exercise consists of designing a datatype and incorporating it into our test database. This does not encompass all the issues of database design, but is a typical and illustrative task.

  1. In this case, we'll start with some new data which we have in TDT format. The data are for solvents. Here is a sample:


    We see three fields: the SMILES for the solvent, the name of the solvent, and the solubility. Your task is to compose a datatype TDT which defines this datatype in a reasonable way. The SMILES should be recognized as such, the name should be normalized for reliable searching, and the solubility should be recognized as a real number.

    Look at the examples in test_datatypes.tdt. Write a file, sol_dtype.tdt, containing your one datatype.

    Here's one possibility...

    _V<"Solvent; Solvent/Ref">
    _S<Solvent in SMILES notation>
    _O<Test datatype>

  2. Thorload your datatype into the datatypes database:

    % thorload test_datatypes < sol_dtype.tdt

  3. Now let's load the data. The file sols.tdt contains TDTs of solvent data rooted by their associated SMILES. By loading in merge mode, these data will be added to the appropriate datatrees.

    % thorload test < sols.tdt

  4. If load errors occur, they may be due to a faulty datatype. If so, fix the datatype and reload. Once the data load with no errors, verify your result by examining the datatrees with xvthor and xvmerlin.
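
When a load fails, it can help to look at the data file itself before blaming the datatype. The following is a minimal Python sketch of such a pre-check. It is not part of the Daylight software and does not use the Daylight toolkit. It assumes only the general TDT layout: data items written as TAG<data>, fields within an item separated by semicolons, and each datatree terminated by a vertical bar. The tag name SOLV, the field order, the sample records, and the normalization rule (trim, collapse whitespace, uppercase) are all hypothetical stand-ins; substitute whatever your own datatype defines. Quoted data and reaction SMILES, which contain ">", are not handled.

```python
import re

# One TDT data item of the form TAG<data>. Quoted data is not handled.
ITEM = re.compile(r'(\$?\w+)<([^<>]*)>')

# Hypothetical sample data; the tag SOLV and its field order are assumptions.
SAMPLE = """$SMI<CCO>
SOLV<CCO;ethyl  alcohol;0.79>
|
$SMI<O>
SOLV<O;water;n/a>
|
"""

def split_datatrees(text):
    """Split TDT text into datatrees, each a list of (tag, data) pairs."""
    trees = []
    for chunk in text.split('|'):      # '|' terminates a datatree
        items = ITEM.findall(chunk)
        if items:
            trees.append(items)
    return trees

def normalize_name(name):
    """Illustrative normalization only, not Daylight's actual rule."""
    return ' '.join(name.split()).upper()

def check_tree(items, tag, number_field):
    """Return a list of problems found in one datatree."""
    problems = []
    if not items[0][0].startswith('$'):
        problems.append('datatree does not start with an identifier')
    for item_tag, data in items:
        if item_tag != tag:
            continue
        fields = [f.strip() for f in data.split(';')]
        if number_field >= len(fields):
            problems.append(f'{tag}: expected a field at index {number_field}')
            continue
        try:
            float(fields[number_field])
        except ValueError:
            problems.append(
                f'{tag}: {fields[number_field]!r} is not a real number')
    return problems

if __name__ == '__main__':
    for number, tree in enumerate(split_datatrees(SAMPLE), start=1):
        for problem in check_tree(tree, 'SOLV', 2):
            print(f'datatree {number}: {problem}')
```

To check a real file, read its contents in place of SAMPLE and pass the tag and field index that match your datatype. A datatree that passes this check can still fail to load, since thorload applies the actual datatype definition.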

Daylight Chemical Information Systems Inc.