10. Fingerprint Toolkit

10.1 Introduction

Fingerprints, their uses and the history of their development in the Daylight Toolkit (tm) are described in detail in the Daylight Theory Manual, chapter on Fingerprints.

The Daylight Fingerprint Toolkit provides a set of tools for rapidly screening very large databases of chemical structures for substructure searching, and for computing the structural similarity between molecules.

Those who have seen or used Daylight's Merlin program will immediately recognize that the Fingerprint Toolkit is part of the foundation of Merlin. Merlin, however, offers far more than the functionality available via the Fingerprint Toolkit; fingerprinting is only the base on which a much larger set of capabilities is built.

The Fingerprint Toolkit, unlike other Daylight Toolkits, is not recommended for most programming projects. It is intended for a few special situations where customers have existing database-searching capabilities and wish to enhance performance or add similarity metrics. If you are contemplating building a chemical information system, we strongly recommend that you consider using Merlin and THOR rather than starting with the Fingerprint Toolkit.

10.2 Fingerprint Functions

The Daylight Fingerprint Toolkit uses Fingerprint Objects to represent fingerprints. Fingerprints have the following properties:

Fingerprint Properties

    Property             Description
    bitmap               the fingerprint itself
    number of bits       the number of bits in the fingerprint (its length in bits)
    orig number of bits  the number of bits in the original fingerprint (before folding)
    number of bits set   the number of 1's in the fingerprint's bitmap
    orig num. bits set   the number of 1's in the original fingerprint's bitmap (before folding)
    version              the version of the Daylight Toolkit used to create the fingerprint

10.2.1 Global Settings

In versions prior to 4.42, there were three global toolkit values which controlled fingerprint creation size and folding. These are no longer needed.

10.2.2 Creating Fingerprints

There are two functions to create fingerprints. You can allocate a "blank" fingerprint, then fill it in later with data from an external source (see Fingerprint Bit Operations, below), or you can create a fingerprint directly from a molecule.

dt_fp_allocfp(integer size) => Handle fingerprint
Allocate an empty fingerprint. The fingerprint's size will be the given "size" value. It is an error to apply any function that returns a property of the fingerprint unless that property has been explicitly set.

dt_fp_generatefp(Handle ob, integer minstep, integer maxstep, integer size) => Handle fingerprint
Allocate a fingerprint object of the given size, fill it with a fingerprint generated from the object ob, then set the object's "original size", "original bits set", "size" and "bits set" properties (see dt_fp_obitcount(), dt_fp_obits(), dt_fp_bitcount() and dt_fp_nbits()).

The object ob can be any object for which dt_stream(ob, TYP_ATOM) and dt_stream(ob, TYP_BOND) will return a stream of atoms and bonds, respectively. Typically ob is a molecule object, but one can fingerprint various substructures using path, pathset, substructure, cycle, atom, or bond objects. For example, one can produce a "ring-system fingerprint" using a substructure object that contains all of the atoms and bonds in all cycles of a molecule.

The parameters "minstep" and "maxstep" control the fingerprint generation. "minstep" sets the minimum-length path to be included in the fingerprint; "maxstep" sets the maximum-length path included.
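As a rough illustration of how "minstep" and "maxstep" bound the paths that contribute to a fingerprint, the following Python sketch enumerates linear paths in a toy molecular graph and hashes each one to a bit. It is not the Daylight algorithm. The function names, the atom-label path encoding, the CRC-32 hash, and the reading of path length as a count of bonds are all assumptions made for this example, and bond types, branch paths and cycle paths are ignored.

```python
import zlib


def linear_paths(adjacency, labels, minstep, maxstep):
    """Collect linear paths whose bond count lies in [minstep, maxstep].

    Hypothetical helper: 'adjacency' maps an atom index to its neighbours,
    'labels' maps an atom index to an element symbol.
    """
    found = set()

    def walk(path):
        steps = len(path) - 1  # assumption: length is counted in bonds
        if steps >= minstep:
            forward = "-".join(labels[a] for a in path)
            backward = "-".join(labels[a] for a in reversed(path))
            # A path and its reverse describe the same pattern; keep one spelling.
            found.add(min(forward, backward))
        if steps < maxstep:
            for nxt in adjacency[path[-1]]:
                if nxt not in path:  # never revisit an atom
                    walk(path + [nxt])

    for start in adjacency:
        walk([start])
    return found


def toy_fingerprint(adjacency, labels, minstep, maxstep, size):
    """Hash each path to one bit of a 'size'-bit fingerprint held as an int."""
    bits = 0
    for text in linear_paths(adjacency, labels, minstep, maxstep):
        bits |= 1 << (zlib.crc32(text.encode()) % size)
    return bits


# Heavy atoms of ethanol: C0 - C1 - O2
adjacency = {0: [1], 1: [0, 2], 2: [1]}
labels = {0: "C", 1: "C", 2: "O"}

paths = linear_paths(adjacency, labels, 0, 2)
print(sorted(paths))  # ['C', 'C-C', 'C-C-O', 'C-O', 'O']

fp = toy_fingerprint(adjacency, labels, 0, 2, 64)
```

Raising "minstep" drops the shortest patterns (with minstep 1 the single-atom paths disappear), while lowering "maxstep" drops the longest ones.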

dt_fp_partfp(Handle part, Handle ob, integer minstep, integer maxstep, integer size) => Handle fingerprint
Like dt_fp_generatefp(), except that it only sets the fingerprint bits for paths which include the object 'part', which may be an atom or bond. This function performs the full path enumeration over 'ob', but only sets bits in the resulting fingerprint for paths containing 'part'.

This function is a supported version of the function previously included in the contrib/stigmata directory. Note that the results using this function will be slightly different, because this version correctly includes branch and cycle paths containing the object 'part'. The contributed version only considered the straight-chain paths containing 'part'.

10.2.3 Properties

dt_fp_nbits(Handle fp) => integer nbits
Return the fingerprint's size (number of bits in the bitmap).

dt_fp_obits(Handle fp) => integer obits
Return the fingerprint's original size (before folding). This is the value of "size" which was provided when the fingerprint was created (see above).

dt_fp_bitcount(Handle fp) => integer bitcount
Return the number of 1's (bits set) in the fingerprint's bitmap.

dt_fp_obitcount(Handle fp) => integer obitcount
Return the original bitcount (before folding).

dt_fp_setobitcount(Handle fp, integer obc) => boolean ok
Set the original bitcount. This is intended to be used only with fingerprint objects created via dt_fp_allocfp() and filled manually.

dt_fp_setobits(Handle fp, integer ob) => boolean ok
Set the original number of bits. As with dt_fp_setobitcount(), only for use with fingerprint objects created via dt_fp_allocfp().

10.2.4 Fingerprint Bit Operations

These operations allow the user to manipulate the individual bit-values of the fingerprint. They are useful for creation of custom fingerprints (e.g. bitscreens, 3-D or spectral fingerprints), combining multiple fingerprints, or writing specialized comparison functions.

Note that dt_stringvalue() and dt_setstringvalue() can be used to get and set the entire binary value of a fingerprint. The functions described here are useful for manipulating individual bits or ranges of bits within a fingerprint.

dt_fp_bitvalue(Handle fp, integer bitno) => integer value
Returns the current value of a bit in the fingerprint.

dt_fp_setbitvalue(Handle fp, integer bitno, integer value) => boolean ok
Sets the current value of a bit in the fingerprint to "value".
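The effect of reading and writing a single bit can be pictured with a plain byte array. The Python sketch below is only an analogy: the helper names are hypothetical, and the bit ordering within each byte (least significant bit first) is an assumption, since the toolkit's internal layout is not described here.

```python
def bit_value(bitmap, bitno):
    """Return 0 or 1 for bit 'bitno' of a bytes-like bitmap."""
    return (bitmap[bitno // 8] >> (bitno % 8)) & 1


def set_bit_value(bitmap, bitno, value):
    """Set bit 'bitno' of a bytearray to 0 or 1, in place."""
    mask = 1 << (bitno % 8)
    if value:
        bitmap[bitno // 8] |= mask
    else:
        bitmap[bitno // 8] &= ~mask & 0xFF  # clear the bit, stay within one byte


bitmap = bytearray(4)         # a 32-bit bitmap, all bits zero
set_bit_value(bitmap, 10, 1)  # turn on bit 10
print(bit_value(bitmap, 10))  # 1
print(bit_value(bitmap, 9))   # 0
```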

dt_fp_range(Handle fp, integer offset, integer nbits, integer *soffset) => string range
Gets a range of bits from a fingerprint. The range is returned as a string of binary data, starting "soffset" bits from the beginning of the string. The requested range begins at bit "offset" and runs for "nbits" bits, or to the end of the fingerprint.

dt_fp_setrange(Handle fp, integer offset, integer nbits, integer slen, string string, integer soffset, integer operation) => boolean ok
Sets the values of a range of bits in the fingerprint. The range of bits to set is given by "offset" for "nbits" bits. The "operation" and the given string, starting "soffset" bits from the beginning of the string, control how the bits are set. Legal operations are:

    DX_FP_SET  sets each bit in the range to the source (string) value.
    DX_FP_NOT  sets each bit in the range to the inverse of the source (string) value.
    DX_FP_OR   sets each bit in the range to the logical-"or" of the source (string) value and current bit value.
    DX_FP_AND  sets each bit in the range to the logical-"and" of the source (string) value and current bit value.
    DX_FP_XOR  sets each bit in the range to the logical-"xor" of the source (string) value and current bit value.
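The five operations can be summarized as functions of the current bit and the source bit. The following Python sketch models them on lists of 0/1 values; the string keys stand in for the DX_FP_* constants, and set_range() is a hypothetical helper written for this example, not the toolkit function.

```python
import operator

# Each operation maps (current bit, source bit) to the new bit value.
OPERATIONS = {
    "SET": lambda cur, src: src,
    "NOT": lambda cur, src: 1 - src,
    "OR": operator.or_,
    "AND": operator.and_,
    "XOR": operator.xor,
}


def set_range(fp_bits, offset, nbits, source_bits, soffset, operation):
    """Combine 'nbits' source bits (from 'soffset') into fp_bits (from 'offset')."""
    combine = OPERATIONS[operation]
    for i in range(nbits):
        fp_bits[offset + i] = combine(fp_bits[offset + i], source_bits[soffset + i])


fp = [1, 1, 0, 0, 1, 0, 1, 0]
src = [0, 1, 0, 1]
set_range(fp, 2, 4, src, 0, "XOR")  # bits 2..5 become [0,0,1,0] xor [0,1,0,1]
print(fp)  # [1, 1, 0, 1, 1, 1, 1, 0]
```

Combining two whole fingerprints of equal size is then a matter of using offset 0 and the full length with the OR or AND operation.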

10.2.5 Comparisons

dt_fp_fingertest(Handle patfp, Handle molfp) => boolean sub
Returns TRUE if all of the bits that are set (1) in patfp are also set in molfp; that is, returns the value of the logical expression
     patfp == (patfp AND molfp).
In other words, return TRUE if the molecule that generated patfp could be a substructure of the molecule that generated molfp.

Returns FALSE if any set bit in patfp is not also set in molfp, or if the two fingerprints are not compatible (e.g. different sizes), or if either object is not a fingerprint.
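The screening test amounts to a single bitwise comparison. A minimal Python sketch follows, assuming fingerprints are held as (size, integer bitmap) pairs; that representation and the function name are inventions for this example, not the toolkit's fingerprint objects.

```python
def finger_test(patfp, molfp):
    """Return True if every bit set in the pattern is also set in the molecule."""
    pat_size, pat_bits = patfp
    mol_size, mol_bits = molfp
    if pat_size != mol_size:
        return False  # incompatible fingerprints never pass the screen
    return pat_bits == (pat_bits & mol_bits)


pattern = (8, 0b00100100)
molecule = (8, 0b10110101)  # contains both pattern bits
other = (8, 0b10010001)     # lacks both pattern bits

print(finger_test(pattern, molecule))  # True
print(finger_test(pattern, other))     # False
```

A TRUE result only says the pattern could be a substructure, so candidates that pass the screen still need a full substructure match; a FALSE result between compatible fingerprints rules the candidate out.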

dt_fp_euclid(Handle fp1, Handle fp2) => float dist
Returns the Euclidean distance between fp1 and fp2, or -1.0 if an error is detected (i.e. the fingerprints are not compatible or are not fingerprint objects).

dt_fp_tanimoto(Handle fp1, Handle fp2) => float tan_coeff
Returns the Tanimoto coefficient between fp1 and fp2, or -1.0 if an error is detected (i.e. the fingerprints are not compatible or are not fingerprint objects).
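For orientation, both measures can be computed from bit counts alone. The Python sketch below uses the standard definitions: the Tanimoto coefficient is the number of bits set in both fingerprints divided by the number set in either, and the Euclidean distance between two bit vectors is the square root of the number of differing bits. Whether the toolkit normalizes the Euclidean distance by fingerprint length is not stated in this section, so the unnormalized form here is an assumption, as are the (size, integer bitmap) representation, the function names, and the value returned for two empty fingerprints.

```python
import math


def popcount(bits):
    """Number of 1 bits in a non-negative integer."""
    return bin(bits).count("1")


def tanimoto(fp1, fp2):
    """Tanimoto coefficient of two (size, bitmap) pairs, or -1.0 on a size mismatch."""
    (size1, bits1), (size2, bits2) = fp1, fp2
    if size1 != size2:
        return -1.0
    common = popcount(bits1 & bits2)
    either = popcount(bits1 | bits2)
    return common / either if either else 0.0  # 0.0 for two empty bitmaps is arbitrary


def euclid(fp1, fp2):
    """Unnormalized Euclidean distance, or -1.0 on a size mismatch."""
    (size1, bits1), (size2, bits2) = fp1, fp2
    if size1 != size2:
        return -1.0
    return math.sqrt(popcount(bits1 ^ bits2))


a = (8, 0b10110100)
b = (8, 0b10010110)
print(tanimoto(a, b))  # 3 bits in common, 5 set in either: 0.6
print(euclid(a, b))    # 2 bits differ: sqrt(2)
```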

dt_fp_foldfp(Handle fp, integer minsize, float mindensity) => boolean ok
Fold the fingerprint. Returns TRUE if no errors are detected. The fingerprint is folded zero or more times until the "minsize" or "mindensity" value is reached, so folding may not actually occur.
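The folding loop can be sketched in a few lines of Python. This example assumes that one fold halves the fingerprint by OR-ing its upper half onto its lower half, and that folding stops as soon as either limit is reached; the function name and the (size, integer bitmap) representation are hypothetical.

```python
def fold(size, bits, minsize, mindensity):
    """Fold a 'size'-bit bitmap until it reaches minsize or mindensity."""
    while (size > minsize
           and size % 2 == 0
           and bin(bits).count("1") / size < mindensity):
        half = size // 2
        # OR the upper half onto the lower half, halving the length.
        bits = (bits >> half) | (bits & ((1 << half) - 1))
        size = half
    return size, bits


# Two bits set out of 16 (density 0.125): folds twice, stopping at minsize 4.
folded = fold(16, 0b0000000100000100, 4, 0.3)
print(folded)  # (4, 5)

# With a lower density target, one fold (density 0.25) is enough.
print(fold(16, 0b0000000100000100, 4, 0.2))  # (8, 5)
```

Because two set bits can land on the same position after a fold, the bit count of a folded fingerprint can be lower than its original bit count, which is why the two values are tracked as separate properties.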