•By using the unique SMILES of a molecule as the molecule's primary identifier (the TDT's "main topic"), Thor eliminates searching during data retrieval: all data are looked up directly in a hash table.
•Hashing begins with a hash function, h(K,N), which takes a string of characters, K, and converts it (via a pseudo-randomizing algorithm) into an integer between 0 and N-1.
•Using a hash function h(K,N), data records on the computer's disk can be accessed directly: the hash value is used to index a hash table, which contains the location of the desired record in the data file. Except in the case of hash collisions, only two disk accesses are required, one to read the hash-table entry and one to read the record (only one if the hash table is cached).
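A minimal sketch of a function with the h(K,N) shape, written in Python. The source does not say which pseudo-randomizing algorithm Thor uses; this example assumes 32-bit FNV-1a as a stand-in and reduces its result modulo N so the output falls between 0 and N-1.

```python
def h(key: str, n: int) -> int:
    """Map a string key K to a bucket number in the range 0..n-1.

    Stand-in hash (32-bit FNV-1a), assumed for illustration;
    not Thor's actual algorithm.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    value = 0x811C9DC5  # FNV-1a 32-bit offset basis
    for byte in key.encode("utf-8"):
        value ^= byte                                # mix in one byte
        value = (value * 0x01000193) & 0xFFFFFFFF    # multiply by FNV prime, keep 32 bits
    return value % n                                 # reduce to 0..n-1


for smiles in ["CCO", "c1ccccc1", "OC(=O)C"]:
    print(smiles, h(smiles, 1009))
```

Any deterministic function that spreads keys evenly over 0..N-1 would serve; FNV-1a is used here only because it is short and well known. What matters for direct lookup is that the same unique SMILES always maps to the same bucket.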
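The direct-access scheme (hash table pointing into a data file) can be modeled in memory to show where the "two disk accesses" come from. This is a toy sketch under stated assumptions, not Thor's file format: the class name `HashedStore` is hypothetical, `zlib.crc32` stands in for h(K,N), collisions are resolved by linear probing (the source does not say how Thor handles them), and each hash-table slot read and each data-file read is counted as one "disk access".

```python
import io
import zlib


class HashedStore:
    """Hypothetical toy model: a fixed-size hash table of (key, offset,
    length) slots plus an append-only data file held in a BytesIO.

    Assumptions: zlib.crc32 as the hash function, linear probing for
    collisions, one counted "disk access" per slot read and per record read.
    """

    def __init__(self, n_slots: int):
        self.n = n_slots
        self.table = [None] * n_slots  # stands in for the on-disk hash table
        self.data = io.BytesIO()       # stands in for the data file
        self.disk_reads = 0

    def _h(self, key: str) -> int:
        return zlib.crc32(key.encode("utf-8")) % self.n

    def put(self, key: str, record: str) -> None:
        payload = record.encode("utf-8")
        self.data.seek(0, io.SEEK_END)   # append the record to the data file
        offset = self.data.tell()
        self.data.write(payload)
        slot = self._h(key)
        for _ in range(self.n):
            entry = self.table[slot]
            if entry is None or entry[0] == key:
                self.table[slot] = (key, offset, len(payload))
                return
            slot = (slot + 1) % self.n   # collision: probe the next slot
        raise RuntimeError("hash table full")

    def get(self, key: str):
        slot = self._h(key)
        for _ in range(self.n):
            self.disk_reads += 1         # access 1: read a hash-table slot
            entry = self.table[slot]
            if entry is None:
                return None              # empty slot: key not stored
            if entry[0] == key:
                _, offset, length = entry
                self.disk_reads += 1     # access 2: read the record itself
                self.data.seek(offset)
                return self.data.read(length).decode("utf-8")
            slot = (slot + 1) % self.n   # collision: keep probing
        return None
```

Storing one record and then calling `store.get("CCO")` costs exactly two counted reads; each collision along the probe sequence adds one more slot read. Keeping `table` in memory (the cached case) would leave only the data-file read.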