Daylight v4.9
Release Date: 1 February 2008


thorlookup - look up specific TDTs in a Thor database

Unix Synopsis

thorlookup [options] database [[infile] outfile]

Description

thorlookup retrieves Thor Datatrees (TDTs) from a Thor database (via a Thor server) and writes them to outfile.

The TDTs to be retrieved are specified by infile; if it is not specified, standard input is used. The retrieved TDTs are written to outfile; if it is not specified, standard output is used.

If the input is a SMILES file, each line of the input file is taken to be a SMILES string, and the TDT with that SMILES as its root is retrieved. If the input is a TDT file, TDTs are read from it and the root identifier of each TDT is extracted; the TDT with that root identifier and datatype is retrieved. Note that TDTs without a SMILES root (i.e. those for which no structure is known) can only be retrieved using the TDT input format. Also, for a TDT input file, ONLY the root identifier is significant; all other data in the input TDTs are ignored.
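For a TDT input file only the root identifier of each input TDT matters. The following is a minimal Python sketch of that extraction step, not the program's actual parser. It assumes that each dataitem occupies exactly one line in the form TAG<value>, that each TDT ends with a line holding a single "|", and that the first dataitem is the root. The function name root_identifiers is hypothetical, and multi-field or continued dataitems are not handled.

```python
import re

# Assumed lexical form: one dataitem per line as TAG<value>; a TDT ends with "|".
# The greedy (.*) keeps any '>' inside the value, e.g. in a reaction SMILES.
DATAITEM = re.compile(r"^(\$?[A-Za-z0-9_]+)<(.*)>$")


def root_identifiers(lines):
    """Yield (tag, value) for the root (first) dataitem of each TDT.

    All other dataitems are ignored, mirroring the rule that only the
    root identifier of an input TDT is significant.
    """
    root = None
    for raw in lines:
        line = raw.strip()
        if line == "|":               # end of one TDT
            if root is not None:
                yield root
            root = None
        elif root is None:            # first dataitem seen in this TDT
            m = DATAITEM.match(line)
            if m:
                root = (m.group(1), m.group(2))


tdt_input = [
    "$SMI<CCO>",
    "PCN<ethanol>",
    "|",
    "$CAS<64-17-5>",
    "|",
]

print(list(root_identifiers(tdt_input)))
# [('$SMI', 'CCO'), ('$CAS', '64-17-5')]
```

The tag of each root ($SMI, $CAS) is kept alongside the value because retrieval uses both the root identifier and its datatype.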

If the option INPUT_FORMAT is not specified (see below), the input format (SMILES or TDT) is determined from the input file's suffix (.smi or .tdt, respectively) where possible. If the suffix is unrecognized, or if input is read from standard input, the input is assumed to be a SMILES file.
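The input-format rule above can be sketched in Python as follows. This is an illustration only: the function name input_format and its input_format_option parameter (standing in for an explicit INPUT_FORMAT setting) are hypothetical, and the exact, case-sensitive suffix match is an assumption.

```python
import os


def input_format(infile=None, input_format_option=None):
    """Decide whether input is read as "TDT" or "SMI".

    input_format_option is a hypothetical stand-in for INPUT_FORMAT.
    """
    if input_format_option is not None:   # an explicit setting always wins
        return input_format_option
    if infile is None:                     # standard input: assume SMILES
        return "SMI"
    suffix = os.path.splitext(infile)[1]
    if suffix == ".tdt":
        return "TDT"
    return "SMI"                           # ".smi" or an unrecognized suffix


for name in ["compounds.tdt", "compounds.smi", "compounds.txt", None]:
    print(name, input_format(name))
```

Note that SMI is the fallback in every uncertain case, which is why TDT input must carry a .tdt suffix or be requested explicitly.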

Retrieval proceeds as follows. If the identifier is a SMILES, it is looked up directly; if found, its TDT is printed. If the identifier is not a SMILES, it is looked up as a cross-reference to find its SMILES; if a SMILES cross-reference is found, the SMILES is looked up and its TDT is printed. If there is no cross-reference SMILES, then the identifier is looked up as a non-SMILES-rooted datatree; if the non-SMILES-rooted TDT is found, it is printed. If there is more than one SMILES for a non-SMILES identifier, then the option RETRIEVE_ALL (described below) determines whether one or all are printed.
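The retrieval cascade can be modeled with plain dictionaries. In this hedged Python sketch, tdts and xrefs are hypothetical in-memory stand-ins for the Thor database and its cross-references, the caller states whether an identifier is a SMILES, and identifier normalization is not modeled. "The first one" under RETRIEVE_ALL FALSE is simply the first entry in the list here, which is an arbitrary choice as far as the program's behaviour is concerned.

```python
def lookup(identifier, is_smiles, tdts, xrefs, retrieve_all=False):
    """Return the TDTs that would be printed for one input identifier.

    tdts:  hypothetical stand-in for the database, mapping a root
           identifier (SMILES or non-SMILES) to its TDT text.
    xrefs: hypothetical cross-reference table, mapping a non-SMILES
           identifier to the list of SMILES it refers to.
    """
    # 1. A SMILES is looked up directly.
    if is_smiles:
        return [tdts[identifier]] if identifier in tdts else []

    # 2. Otherwise, look for SMILES cross-references.
    smiles_list = xrefs.get(identifier, [])
    if smiles_list:
        if not retrieve_all:              # RETRIEVE_ALL FALSE: print only one
            smiles_list = smiles_list[:1]
        return [tdts[s] for s in smiles_list if s in tdts]

    # 3. No cross-reference: try a non-SMILES-rooted datatree.
    return [tdts[identifier]] if identifier in tdts else []


tdts = {
    "CCO": "TDT rooted at CCO",
    "CC(=O)O": "TDT rooted at CC(=O)O",
    "SAMPLE-17": "TDT rooted at SAMPLE-17 (no structure)",
}
xrefs = {
    "64-17-5": ["CCO"],
    "AMBIG-1": ["CCO", "CC(=O)O"],
}

print(lookup("64-17-5", False, tdts, xrefs))
print(lookup("AMBIG-1", False, tdts, xrefs, retrieve_all=True))
```

The identifiers and TDT strings above are placeholders; only the order of the three lookups follows the description.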

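If the option OUTPUT_FORMAT is not specified, the output format is determined from the output file's suffix (.smi, .tdt, or .fdt) where possible; otherwise, including when output goes to standard output, TDT is used. The three output formats are: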

SMILES output format

Each TDT is retrieved and its root SMILES is printed. This might be used for a "name lookup" service; for example, a shell script might be written that prints the compound's name when a user enters a CAS number.

TDT output format

Each TDT is retrieved and printed in its lexical form (one line per dataitem). This form is useful for transmitting information between programs, but is not very readable.

Formatted Datatree (FDT) output format

Each TDT is retrieved and printed as tab-delimited text. Each dataitem is printed on a separate line, preceded by tabs indicating its depth in the datatree. Tags (e.g. "PCN") are replaced by their verbose labels (e.g. "Local Name"), and fields within each dataitem are separated by tabs.
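The Formatted Datatree layout can be illustrated with a short recursive Python sketch. The (tag, fields, children) tuple representation, the function name fdt_lines, the root being printed at depth zero, and the fallback to the raw tag when no verbose label is known are all assumptions; the two labels used ("Local Name" for PCN, "CAS Number" for $CAS) are the ones named in this page.

```python
def fdt_lines(item, labels, depth=0):
    """Yield formatted-datatree lines for item and its descendants.

    item is a hypothetical (tag, fields, children) tuple; labels maps a
    tag to its verbose label, falling back to the tag itself.
    """
    tag, fields, children = item
    label = labels.get(tag, tag)
    # Leading tabs give the depth; tabs also separate label and fields.
    yield "\t" * depth + "\t".join([label, *fields])
    for child in children:
        yield from fdt_lines(child, labels, depth + 1)


labels = {"PCN": "Local Name", "$CAS": "CAS Number"}
tree = ("$SMI", ["CCO"], [
    ("PCN", ["ethanol"], []),
    ("$CAS", ["64-17-5"], []),
])

print("\n".join(fdt_lines(tree, labels)))
```

Because both depth and field boundaries are tabs, the output loads cleanly into a spreadsheet, at the cost of being harder to parse back into a datatree than the lexical TDT form.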

Options

INPUT_FORMAT specifies whether the input file is a SMILES file or a TDT file (see the discussion above). Default is SMI unless the input file name's suffix indicates otherwise.
OUTPUT_FORMAT specifies whether the output is in SMILES, TDT, or Formatted Datatree (FDT) form (see the discussion above). Default is TDT unless the output file name's suffix indicates otherwise.
INCLUDE_DATATYPES and EXCLUDE_DATATYPES select which datatypes are included in the output. Each takes a list of datatype tags (e.g. "$SMI $CAS PCN P") or the keyword "ALL", indicating that all datatypes are included. Tags in a list may be separated by spaces, bars (|), or commas (,). INCLUDE_DATATYPES is processed before EXCLUDE_DATATYPES: starting with no datatypes selected, the included datatypes are added, then the excluded datatypes are removed.

The default for INCLUDE_DATATYPES is "ALL", and for EXCLUDE_DATATYPES is "NONE". The default values result in selection of all datatypes. Ancestors of included dataitems are included regardless of datatype.

Tags in the tag lists must exactly match the internal tag, e.g. "$CAS" for CAS Number (punctuation and case count).

If RETRIEVE_ALL is TRUE, all TDTs for an "ambiguous" identifier (a non-SMILES identifier that cross-references more than one SMILES) are printed; if FALSE, only the first one (chosen arbitrarily) is printed. Default is FALSE.
If TRUE, indirect references are replaced with their expansions in the output TDTs. If FALSE, the original unexpanded indirect references (i.e. the original data) are kept in the output TDTs. Default is TRUE.
If TRUE, the input identifier is not normalized before lookup. Raw lookup only works for root identifiers, not for cross-references. Raw lookup is also not compatible with FDT output or the INCLUDE_DATATYPES or EXCLUDE_DATATYPES options.
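The datatype selection performed by INCLUDE_DATATYPES and EXCLUDE_DATATYPES, together with the rule that ancestors of included dataitems are kept, can be sketched in Python. This is a hypothetical model: the function names, the (tag, fields, children) tuple representation, and the treatment of "ALL" and "NONE" as whole-list keywords for both options are assumptions, not the program's documented internals.

```python
import re


def parse_tags(spec):
    """Split a tag list separated by spaces, bars (|), or commas (,)."""
    return {t for t in re.split(r"[ |,]+", spec.strip()) if t}


def resolve(spec, all_tags):
    """Expand the keywords ALL and NONE (assumed behaviour)."""
    tags = parse_tags(spec)
    if tags == {"ALL"}:
        return set(all_tags)
    if tags == {"NONE"}:
        return set()
    return tags


def select_datatypes(all_tags, include="ALL", exclude="NONE"):
    selected = set()                          # start with nothing selected
    selected |= resolve(include, all_tags)    # INCLUDE_DATATYPES first
    selected -= resolve(exclude, all_tags)    # then EXCLUDE_DATATYPES
    return selected


def prune(item, selected):
    """Keep dataitems whose tag is selected, plus their ancestors.

    item is a hypothetical (tag, fields, children) tuple; returns None
    when neither the item nor any descendant is selected.
    """
    tag, fields, children = item
    kept = [c for c in (prune(ch, selected) for ch in children) if c is not None]
    if tag in selected or kept:
        return (tag, fields, kept)
    return None


all_tags = ["$SMI", "$CAS", "PCN"]
tree = ("$SMI", ["CCO"], [("PCN", ["ethanol"], []), ("$CAS", ["64-17-5"], [])])

only_cas = select_datatypes(all_tags, exclude="$SMI|PCN")
print(prune(tree, only_cas))
```

Here $SMI is excluded, yet the $SMI root survives pruning because it is the ancestor of the selected $CAS dataitem.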


The following options are common to most or all "thorfilter" programs. They are described in more detail in thorfilters(1).


If TRUE, passwords are not accepted on the command line and must be entered interactively. Default: TRUE.
Names the default TCP/IP service or "port" of the Thor server. Default: thor.

Return Value

Return status is zero if the lookup succeeds, or one if a problem is detected. Failure to find a TDT is not considered to be a problem.

Examples

thorlookup mydb@dbserver < input.smi
Opens the database "mydb" on the server "dbserver" with read permission and prints any TDT with a $SMI root whose SMILES occurs in the file "input.smi". Because input comes from standard input, it is treated as SMILES; because output goes to standard output, the output format defaults to TDT.



Daylight License

programs: thor

Related Topics

dayevict(1) daymessage(1) merlindbping(1) merlinload(1) merlinls(1) merlinping(1) merlinwho(1) thorchange(1) thorcrunch(1) thordbping(1) thordelete(1) thordestroy(1) thordiff(1) thordump(1) thorlist(1) thorload(1) thorlookup(1) thorls(1) thormake(1) thorping(1) thorwho(1)

sthorman(1) thorserver(1) merlinserver(1) licensing(5)

Daylight Theory Manual, Daylight System Administration Manual

Bugs

None known.