•Starting with each atom, traverse all paths, branches, and ring closures up to a certain
depth (typically 8). For each substructure, derive a hash-like number from unique,
relatively prime contributions of each atom and bond type. The critical properties of this
number are that it is reproducible (each substructure always produces the same number) and
that its value is not correlated with the graph it was derived from (a linear congruential
generator is used to ensure this).
•Map each resulting number into a large range (typically 2K-64K) to produce a redundant,
large-scale, binary representation of the substructural elements. The resultant
"fingerprint" contains a large amount of information at a low density.
•Iteratively "fold" the fingerprint by OR-ing its two halves together, halving its length
each time, until the bit density reaches a minimum required value or until the fingerprint
reaches a minimum allowable length. The resulting fingerprint now has a high information
density with a minimal (and controllable) information loss.
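
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual implementation: the molecule is a toy graph (a list of atom symbols plus a dict of bonds), only simple linear paths are enumerated (branches and ring closures are omitted), the depth is read as at most 7 bonds, and zlib.crc32 stands in for the prime-contribution and linear congruential generator scheme described above. All function and parameter names (enumerate_paths, path_hash, fingerprint, fold, min_density, min_length) are hypothetical.

```python
import zlib


def enumerate_paths(atoms, bonds, max_bonds=7):
    """Yield every simple linear path as a tuple of atom/bond tokens."""
    # Adjacency list: atom index -> [(neighbour index, bond symbol), ...]
    adj = {i: [] for i in range(len(atoms))}
    for (a, b), sym in bonds.items():
        adj[a].append((b, sym))
        adj[b].append((a, sym))

    def walk(path_atoms, tokens):
        yield tuple(tokens)
        if len(path_atoms) - 1 == max_bonds:
            return  # depth limit reached
        for nbr, sym in adj[path_atoms[-1]]:
            if nbr not in path_atoms:  # never revisit an atom
                yield from walk(path_atoms + [nbr], tokens + [sym, atoms[nbr]])

    for start in range(len(atoms)):
        yield from walk([start], [atoms[start]])


def path_hash(tokens):
    """Reproducible hash-like number for one path (CRC-32 as a stand-in)."""
    # Pick one direction so a path and its reverse give the same number.
    canon = min(tokens, tokens[::-1])
    return zlib.crc32("|".join(canon).encode("ascii"))


def fingerprint(atoms, bonds, n_bits=2048, max_bonds=7):
    """Set one bit per path; n_bits is assumed to be a power of two."""
    bits = [0] * n_bits
    for tokens in enumerate_paths(atoms, bonds, max_bonds):
        bits[path_hash(tokens) % n_bits] = 1
    return bits


def fold(bits, min_density=0.3, min_length=64):
    """OR the two halves together until dense enough or short enough."""
    while len(bits) // 2 >= min_length and sum(bits) / len(bits) < min_density:
        half = len(bits) // 2
        bits = [a | b for a, b in zip(bits[:half], bits[half:])]
    return bits


# Ethanol as a toy graph: C-C-O with two single bonds.
atoms = ["C", "C", "O"]
bonds = {(0, 1): "-", (1, 2): "-"}
fp = fingerprint(atoms, bonds)
folded = fold(fp)
print(len(fp), sum(fp), len(folded), sum(folded))
```

Because each fold ORs the upper half onto the lower half, a bit set at position i in the unfolded fingerprint is always set at position i modulo the folded length. Folding can therefore merge bits, but it never clears one, which is the sense in which the information loss is controlled.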