The TDT datafield quoting convention
If a datafield contains characters which are used in the Thor
datatree syntax ( $ < ; > | " ), the
whole field must be enclosed in doublequotes.
Doublequotes in data are indicated by doubling the doublequotes,
e.g., the datafield for the name 4',4"-PCB would appear as "4',4""-PCB".
The quoting convention for TDT datafields hasn't changed --
but in version 4.51, it's uniformly enforced.
Previous versions of the software made exceptions, e.g.,
for datatype definitions.
The syntax of all datatrees is now unified;
no exceptional cases remain.
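The quoting rule described above can be illustrated with a small Python sketch. This is not part of the Daylight toolkit; the function names quote_datafield and unquote_datafield are hypothetical, and the sketch assumes that the reserved characters are exactly the six listed above.

```python
# Reserved characters of the Thor datatree syntax, as listed in the text.
RESERVED = set('$<;>|"')


def quote_datafield(value):
    """Return the lexical (TDT) form of a single datafield value.

    Hypothetical helper: the field is quoted only if it contains a
    reserved character, and embedded doublequotes are doubled.
    """
    if not any(ch in RESERVED for ch in value):
        return value
    return '"' + value.replace('"', '""') + '"'


def unquote_datafield(field):
    """Inverse of quote_datafield for a single, well-formed field."""
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        return field[1:-1].replace('""', '"')
    return field


if __name__ == "__main__":
    print(quote_datafield("4',4\"-PCB"))
    print(quote_datafield("$SRNO"))
    print(quote_datafield("Ave molecular weight"))
```

Since there is no harm in quoting every field, a variant that always quotes would also produce valid input; the conditional form is used here only to mirror the rule as stated.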
What hasn't changed?
This change shouldn't affect toolkit programs, since the lexical
(external) form of datatrees is not visible at the object (toolkit) level.
Unquoted datafield values are set via
and obtained via
Conversion between (quoted) lexical datatrees and thor datatree objects
is done via
Datatype definitions of identifiers
The "datatype definitions" ($D datatype) of identifiers always
require quoting now because the tag value contains '$',
e.g., the SPRESI Reaction Registry Number, $SRNO:
_V<SPRESI Reaction Registry Number>
_B<SPRESI Reaction ID>
_M<Name, Lookup, Common, System>
Datatype definitions of multi-field datatypes
Datatype definitions of multi-field identifiers also require quoting
because the data values contain ';',
e.g., the Fingerprint definition, FP:
_V<"Fingerprint;Orig size;Obits on;Size;Bits on;Type;Run ID">
Some SMILES data need to be quoted now, since reactions contain
the reserved `>' character.
CIT<STEREOCHEMISTRY OF THE DIELS-ALDER REACTION OF BUTADIENE WITH CYCLOPROPENE;
BALDWIN JOHN E., REDDY V. PRAKASH;;J. ORG. CHEM., 54,(1989) N2, C. 5264-5267;
Although only datafields containing reserved characters need to
be quoted, there is no harm in quoting all datafields on input.
The easiest way to "fix up" an old datatypes tdt file is to quote all
fields, e.g., the two datatrees below are synonymous:
_V<Ave molecular weight>
_S<Average molecular weight>
_D<Average natural molecular weight>
_M<System, Medchem, Calculated>
_V<"Ave molecular weight">
_S<"Average molecular weight">
_D<"Average natural molecular weight">
_M<"System, Medchem, Calculated">
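The "quote all fields" fix-up shown above might be sketched in Python as follows. This is only an illustration under stated assumptions: each dataitem sits on its own line in the form TAG<content>, and the whole content is treated as a single field (which matches the datatype-definition examples above, but would wrongly merge a dataitem whose fields are genuinely separate). The helper name quote_whole_item is hypothetical, and the sketch is not a substitute for the supplied conversion tools.

```python
import re

# One dataitem per line: a tag, then the content in angle brackets.
_ITEM = re.compile(r'^([^<\s]+)<(.*)>$')


def quote_whole_item(line):
    """Quote the entire content of a one-line dataitem as a single field.

    Assumption: the content is one field. Lines that are not of the
    form TAG<content>, or whose content is already quoted, are
    returned unchanged.
    """
    m = _ITEM.match(line.strip())
    if m is None:
        return line
    tag, content = m.groups()
    if len(content) >= 2 and content.startswith('"') and content.endswith('"'):
        return line  # already quoted
    # Embedded doublequotes are doubled, per the quoting convention.
    return '%s<"%s">' % (tag, content.replace('"', '""'))


if __name__ == "__main__":
    for old in ["_V<Ave molecular weight>",
                "_M<System, Medchem, Calculated>"]:
        print(quote_whole_item(old))
```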
A practical consequence of the above-described change is that all extant
Thor databases must be reloaded. Although it is not difficult to do so
with the tools supplied in the Daylight distribution, the program
thordbfix451 is supplied to do the job. This is a simple and robust
shell script which "leads you by the hand" through the process.
Its use is strongly recommended.
All databases supplied by Daylight with v4.51 (dated 1997 or later)
are already in the updated form (of course!)
and do not need to be thordbfix-ed.
It is sometimes difficult to tell which databases have been created and
loaded by a particular version of the Daylight software
(e.g., when you are halfway through converting your databases).
As of version 4.51, the database .THOR file contains a "version" line.
To find out what software version created a database, check the .THOR
file. If it contains a line like "version: 4.51", you've got it.
If not, it was created by version 4.42 software (or earlier).
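The check described above could be scripted along these lines. This is a rough sketch: the exact layout of the .THOR file is not specified here, so the code assumes only that the version appears on a line of its own beginning with "version:", as in the example. The function name thor_version is hypothetical.

```python
import re


def thor_version(text):
    """Return the version recorded in .THOR file text, or None.

    Assumption: the version appears on a line like "version: 4.51".
    A result of None suggests the database was created by version
    4.42 software or earlier.
    """
    for line in text.splitlines():
        m = re.match(r'\s*version:\s*(\S+)', line)
        if m:
            return m.group(1)
    return None


if __name__ == "__main__":
    sample = "some other line\nversion: 4.51\n"
    print(thor_version(sample))
```

To use it on a real database, read the contents of the database's .THOR file into a string and pass that string to the function.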
Daylight Chemical Information Systems, Inc.