Efficiently finding similar proteins (via sequence similarity) requires specialized algorithms that work at the sequence (string) level. However, there are a number of interesting parallels between the methods used for small-molecule substructure searching and those used for sequence similarity searching:
| | Substructure searching | Sequence similarity |
|---|---|---|
| Topology | 2D graphs | 1D strings |
| Screening | Fingerprints | Local identities |
| Similarity measure | Tanimoto, etc. | Scoring matrices |
| Matching | Graph matching | Dynamic programming |
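
To make two of these rows concrete, the following minimal Python sketch puts one technique from each column side by side: a Tanimoto coefficient computed over fingerprints, and a dynamic-programming local alignment score (in the style of Smith-Waterman) computed over strings. The function names, the representation of a fingerprint as a set of "on" bit positions, and the match/mismatch/gap values are all assumptions made for illustration. A real sequence search would use a proper scoring matrix rather than this flat match/mismatch scheme.

```python
from __future__ import annotations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention assumed here for two empty fingerprints
    return len(fp_a & fp_b) / union


def local_alignment_score(
    a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2
) -> int:
    """Best local alignment score of a and b via dynamic programming.

    The scoring values are toy numbers (an assumption), standing in for
    a real substitution matrix and gap penalty.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for ca in a:
        curr = [0]  # first column is always 0 for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Clamping at 0 lets an alignment restart anywhere (local, not global).
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    print(tanimoto({1, 2, 3}, {2, 3, 4}))
    print(local_alignment_score("XXACGTXX", "YYACGTYY"))
```

In both cases the score only ranks candidates. The cheap screening step (fingerprints or local identities) narrows the search before the more expensive matching step is run.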