Fingerprints are bit arrays (also known as "bitmaps") used for high-speed structural screening and similarity comparison.
OC=CN would generate the following patterns:
fingerprint [-b minbits] [-c maxsize] [-d dens] [-id fpid] [-t TAG] [-x] [-z]
            [-s minstep/maxstep] [-m [-mb minbits] [-mt TAG] [-md dens] [-mz]]
            [ in.tdt [ out.tdt ] ]

  in.tdt ....... .tdt file containing $SMI data (default: stdin)
  out.tdt ...... .tdt file with $FPG and FP data added (default: stdout)

standard options:
  -b minbits .. minimum fingerprint size allowed, bits (default: 64)
  -c maxbits .. creation size of fingerprint, bits (default: 2048)
  -d dens ..... density below which fingerprints are folded (default: 0.3)
  -id fpid .... identify this run by `fpid'
  -t TAG ...... use `TAG' instead of `FP' for fingerprint dataitems
  -x .......... generate difference fingerprints (XFP<>)
  -z .......... zap existing FP and $FPG data
  -s min/max .. compute bits for pathlength in this range (default: 0/7)

options for mixtures:
  -m .......... generate fingerprints for mixture components ("parts")
  -mb minbits . minimum fingerprint size allowed, bits (default: 64)
  -mt TAG ..... use `TAG' instead of `FPP' for mixture fingerprints
  -md dens .... density below which fingerprints are folded (default: 0.3)
  -mz ......... zap existing FPP data from TDT stream

produces:
  FP<fp;obits;oset;nbits;nset;ver;fpid>
  FPP<part-ntuple;fpid>
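The interplay of the creation size, the minimum size and the folding density can be illustrated with a small sketch. It is not the program's actual implementation: it assumes that folding means OR-ing the upper half of the bitmap onto the lower half, repeated while the bit density stays below the threshold and the halved size would not drop under the minimum. The function name `fold` and its parameters are hypothetical; only the default values (64 bits, 0.3) are taken from the usage text.

```python
def fold(fp: int, nbits: int, minbits: int = 64, dens: float = 0.3):
    """Fold a fingerprint (held as a Python int) until it is dense enough.

    Assumption: one folding step halves the size by OR-ing the upper
    half onto the lower half.
    """
    while nbits // 2 >= minbits and bin(fp).count("1") / nbits < dens:
        half = nbits // 2
        low = fp & ((1 << half) - 1)   # lower half of the bitmap
        high = fp >> half              # upper half of the bitmap
        fp = low | high
        nbits = half
    return fp, nbits


# A sparse 2048-bit fingerprint with three bits set is folded down to the minimum size.
sparse = (1 << 5) | (1 << 1500) | (1 << 2000)
folded, folded_size = fold(sparse, 2048)

# A 128-bit fingerprint at density 0.5 is already above the threshold and is left alone.
dense = (1 << 64) - 1
unfolded, unfolded_size = fold(dense, 128)
```

Folding trades resolution for compactness: distinct bits can collide after a fold, so the density threshold limits how much information is lost.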
The paths generated for the molecules would be as follows:
|Enumerated Fingerprint Paths:|
|Path Length:||Reactant (count/path):||Product (count/path):|
|0||1 I, 1 Na, 3 C, 1 Br||1 I, 1 Na, 3 C, 1 Br|
|1||1 C=C, 1 C-C, 1 C-Br||1 C=C, 1 C-C, 1 C-I|
|2||1 C=C-C, 1 C-C-Br||1 C=C-C, 1 C-C-I|
|3||1 C=C-C-Br||1 C=C-C-I|
|Path Length:||Difference (count/path):|
|0||0 I, 0 Na, 0 C, 0 Br|
|1||0 C=C, 0 C-C, 1 C-Br, 1 C-I|
|2||0 C=C-C, 1 C-C-Br, 1 C-C-I|
|3||1 C=C-C-Br, 1 C=C-C-I|
After generating the difference in counts, we only use the six paths with non-zero differences to set bits in the difference fingerprint. These are the paths which walk through bonds that change during the reaction. By considering only these paths, we get a fingerprint which reflects the overall bond changes in the reaction.
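The counting step can be reproduced with a few lines of Python. This is only a sketch: the path counts are copied by hand from the tables rather than enumerated from the structures, and the helper name `difference_paths` is hypothetical.

```python
from collections import Counter

# Path counts copied from the reactant and product columns of the table.
reactant_paths = Counter({
    "I": 1, "Na": 1, "C": 3, "Br": 1,
    "C=C": 1, "C-C": 1, "C-Br": 1,
    "C=C-C": 1, "C-C-Br": 1,
    "C=C-C-Br": 1,
})
product_paths = Counter({
    "I": 1, "Na": 1, "C": 3, "Br": 1,
    "C=C": 1, "C-C": 1, "C-I": 1,
    "C=C-C": 1, "C-C-I": 1,
    "C=C-C-I": 1,
})


def difference_paths(reactant, product):
    """Return the paths whose counts differ, with the absolute difference."""
    all_paths = set(reactant) | set(product)
    # A Counter returns 0 for a path that is missing on one side.
    diff = {p: abs(reactant[p] - product[p]) for p in all_paths}
    return {p: n for p, n in diff.items() if n != 0}


changed = difference_paths(reactant_paths, product_paths)
print(sorted(changed))
```

Only the paths in `changed` would be used to set bits in the difference fingerprint; the unchanged atoms and the C=C, C-C and C=C-C paths drop out.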
_V<"Component fingerprints;Component fingerprints/ID">
_O<Daylight Chemical Information Systems Inc.>
|bits(F)||A function that returns the number of "1" bits in a bitmap|
|BT||The total number of bits (the fingerprint's size); a constant|
|B1||bits(F1)||The number of 1's in F1|
|B2||bits(F2)||The number of 1's in F2|
|BC||bits( F1 AND F2 )||The number of 1's in common between F1 and F2|
|BI||bits(F1 XOR (NOT F2))||The number of identical bits (1's and 0's) between F1 and F2|
|BU1||bits(F1 AND (NOT F2))||The number of unique bits (1's) in F1|
|BU2||bits(F2 AND (NOT F1))||The number of unique bits (1's) in F2|
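As an illustration, the quantities in this table can be computed with Python integers standing in for bitmaps. The function `bit_counts` and the two 8-bit example fingerprints are made up for this sketch. Because Python integers are unbounded, NOT is masked to BT bits.

```python
def bits(f: int) -> int:
    """Number of 1 bits in a bitmap held as a non-negative int."""
    return bin(f).count("1")


def bit_counts(f1: int, f2: int, bt: int) -> dict:
    """Compute the quantities from the table for two bt-bit fingerprints."""
    mask = (1 << bt) - 1  # keeps NOT results within bt bits
    return {
        "BT": bt,
        "B1": bits(f1),
        "B2": bits(f2),
        "BC": bits(f1 & f2),
        "BI": bits((f1 ^ ~f2) & mask),
        "BU1": bits(f1 & ~f2 & mask),
        "BU2": bits(f2 & ~f1 & mask),
    }


counts = bit_counts(0b11001010, 0b10011010, 8)
print(counts)
```

Two identities are useful as sanity checks: B1 = BC + BU1 and B2 = BC + BU2, and BI = BT - BU1 - BU2.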
The distance-as-substructure metric is:
Tversky similarity compares features in a given structure (the "prototype") to features in database structures (the "variants"), with user-specified weighting for each set of features.
|TS = BC / ( α·BU1 + β·BU2 + BC )|
Example: Setting the weighting of prototype features to 100% and variant features to 100%, i.e., α=1, β=1, produces a symmetrical similarity metric identical to the Tanimoto metric.
Example: Setting the weighting of prototype and variant features asymmetrically produces a similarity metric in a more-substructural or more-superstructural sense. Setting the weighting of prototype features to 100% (α=1) and variant features to 0% (β=0) means that only the prototype features are important, i.e., this produces a "superstructure-likeness" metric. In this case, a Tversky similarity value of 1.0 means that all prototype features are represented in the variant, 0.0 that none are.
Example: Setting the weights to 0% prototype (α=0) / 100% variant (β=1) produces a "substructure-likeness" metric, where completely embedded structures have a 1.0 value and "near-substructures" have values near 1.0.
Tversky metrics where the two weightings add up to 100% (1.0) are of special interest (e.g., the 50/50 metric is known as the Dice index).
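The special cases can be checked numerically with a small sketch. It assumes the weighted form TS = BC / (α·BU1 + β·BU2 + BC), with α weighting the bits unique to the prototype (BU1) and β weighting the bits unique to the variant (BU2). The function name `tversky`, the example counts, and the choice to return 0.0 for an empty denominator are illustrative assumptions.

```python
def tversky(bc: int, bu1: int, bu2: int, alpha: float, beta: float) -> float:
    """Tversky similarity from common and unique bit counts.

    alpha weights the prototype's unique bits, beta the variant's.
    """
    denom = alpha * bu1 + beta * bu2 + bc
    # Assumption: treat an empty denominator (no relevant bits) as 0.0.
    return bc / denom if denom else 0.0


# Hypothetical counts: 3 bits in common, 1 unique to the prototype, 2 unique to the variant.
bc, bu1, bu2 = 3, 1, 2

tanimoto = tversky(bc, bu1, bu2, 1.0, 1.0)          # symmetric, BC / (BU1 + BU2 + BC)
dice = tversky(bc, bu1, bu2, 0.5, 0.5)              # equals 2*BC / (B1 + B2)
prototype_only = tversky(bc, bu1, bu2, 1.0, 0.0)    # BC / B1
variant_only = tversky(bc, bu1, bu2, 0.0, 1.0)      # BC / B2

print(tanimoto, dice, prototype_only, variant_only)
```

With α = β = 0.5 the denominator becomes (B1 + B2) / 2, which is why the 50/50 metric coincides with the Dice index.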