Some SMILES/SMARTS Searching Subtleties

In the Daylight system, SMILES searching and SMARTS searching refer, respectively, to explicit and general structural searches over a dataset. Typically, these searches are performed with the Merlinserver via a Merlin client program such as XVMerlin. "Searching" is distinguished from "lookup", a Thor function that accesses a single record (TDT) by exact specification of an identifier.

SMILES Searching

A "SMILES search" is a search across a dataset of structures for an exact substructure, specified by a SMILES. The Merlinserver uses Daylight "fingerprints" and in-memory techniques to speed up SMILES searches. These optimizations affect only speed; they have no bearing on the resulting hitlist, i.e., on the search logic. The fingerprints serve as a high-speed screen that discards structures in the dataset that cannot possibly match. An authoritative check is then performed on the remaining candidates.
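The screen-then-verify idea can be illustrated with a minimal Python sketch. This is not Daylight's fingerprint algorithm or the Merlinserver implementation: the path-based fingerprint, the 64-bit width, the brute-force matcher, and all function names are assumptions made for illustration, and molecules are reduced to element symbols joined by unlabeled bonds.

```python
from itertools import permutations
import zlib

FP_BITS = 64  # toy fingerprint width (assumption; real fingerprints are much larger)

def paths(atoms, bonds, max_len=3):
    """Collect direction-independent labels of linear atom paths up to max_len atoms."""
    adj = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    found = set()

    def walk(path):
        fwd = "-".join(atoms[i] for i in path)
        rev = "-".join(atoms[i] for i in reversed(path))
        found.add(min(fwd, rev))
        if len(path) < max_len:
            for n in adj[path[-1]]:
                if n not in path:
                    walk(path + [n])

    for start in range(len(atoms)):
        walk([start])
    return found

def fingerprint(atoms, bonds):
    """Hash every path label into one bit of an integer bitmask."""
    fp = 0
    for label in paths(atoms, bonds):
        fp |= 1 << (zlib.crc32(label.encode()) % FP_BITS)
    return fp

def may_contain(query_fp, mol_fp):
    """Screen: every bit set in the query must also be set in the molecule."""
    return query_fp & ~mol_fp == 0

def is_substructure(query, mol):
    """Authoritative check: brute-force search for an atom mapping (toy sizes only)."""
    (q_atoms, q_bonds), (m_atoms, m_bonds) = query, mol
    m_edges = {frozenset(b) for b in m_bonds}
    for cand in permutations(range(len(m_atoms)), len(q_atoms)):
        if all(q_atoms[i] == m_atoms[j] for i, j in enumerate(cand)) and all(
            frozenset((cand[a], cand[b])) in m_edges for a, b in q_bonds
        ):
            return True
    return False

def search(query, dataset):
    """Screen with fingerprints first, then verify only the survivors."""
    q_fp = fingerprint(*query)
    hits = []
    for name, mol in dataset.items():
        if not may_contain(q_fp, fingerprint(*mol)):
            continue  # impossible: discarded without the expensive check
        if is_substructure(query, mol):
            hits.append(name)
    return hits

dataset = {
    "ethanol": (["C", "C", "O"], [(0, 1), (1, 2)]),
    "propane": (["C", "C", "C"], [(0, 1), (1, 2)]),
    "dimethyl_ether": (["C", "O", "C"], [(0, 1), (1, 2)]),
}
c_o = (["C", "O"], [(0, 1)])
print(search(c_o, dataset))
```

Because every path in a query also occurs in any structure that contains the query, the screen can pass a non-match (through hash collisions) but never rejects a true match. The hitlist is therefore the same as it would be with no screen at all.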

Although the SMILES language used for search targets is identical to that used for structure specification, the meaning of a search-SMILES differs subtly from that of a structure-SMILES. The biggest difference is that the implicit hydrogens of the search-SMILES are ignored. For example, the SMILES for cyclohexane, C1CCCCC1, will match any six aliphatic carbons in a ring, whether or not those carbons carry substituents in place of hydrogens.
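The effect of ignoring implicit hydrogens can be sketched in Python with a toy matcher. Everything here is an assumption for illustration, not Daylight code: atoms are (symbol, hydrogen count) pairs, a lowercase symbol stands for an aromatic atom, bond orders are not modeled, and the `ignore_implicit_h` flag is a hypothetical switch between search-SMILES and structure-SMILES semantics.

```python
from itertools import permutations

def matches(query, mol, ignore_implicit_h=True):
    """Brute-force substructure test; hydrogen counts are compared only on request."""
    q_atoms, q_bonds = query
    m_atoms, m_bonds = mol
    m_edges = {frozenset(b) for b in m_bonds}
    for cand in permutations(range(len(m_atoms)), len(q_atoms)):
        atoms_ok = all(
            q_atoms[i][0] == m_atoms[j][0]
            and (ignore_implicit_h or q_atoms[i][1] == m_atoms[j][1])
            for i, j in enumerate(cand)
        )
        if atoms_ok and all(
            frozenset((cand[a], cand[b])) in m_edges for a, b in q_bonds
        ):
            return True
    return False

ring = [(i, (i + 1) % 6) for i in range(6)]

# C1CCCCC1 read as a structure: six CH2 groups in a ring.
cyclohexane = ([("C", 2)] * 6, ring)
# Ring carbon 0 carries a methyl group (atom 6), so it has one hydrogen.
methylcyclohexane = ([("C", 1)] + [("C", 2)] * 5 + [("C", 3)], ring + [(0, 6)])
hexane = ([("C", 3)] + [("C", 2)] * 4 + [("C", 3)], [(i, i + 1) for i in range(5)])
benzene = ([("c", 1)] * 6, ring)

print(matches(cyclohexane, methylcyclohexane))                           # search semantics
print(matches(cyclohexane, methylcyclohexane, ignore_implicit_h=False))  # strict semantics
print(matches(cyclohexane, hexane), matches(cyclohexane, benzene))
```

With hydrogens ignored, the cyclohexane query hits methylcyclohexane. If the implied hydrogen counts were enforced, it would not, because one ring carbon has only one hydrogen. Hexane (no ring) and benzene (aromatic carbons) fail in either mode.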

To be fingerprinted, the search-SMILES must be interpretable as a molecule. Thus, "cOc" is not a valid search-SMILES (though it is a valid SMARTS).

SMARTS Searching

A "SMARTS search" is a search across a dataset of structures (specified by their SMILES) for a structural pattern, specified by a SMARTS. A SMARTS may be less restrictive or more restrictive than a SMILES. For example, "[#6]" means any carbon (aliphatic or aromatic) and "[C,N,O]" means an aliphatic carbon, nitrogen, or oxygen, whereas "[C;H0]" means an aliphatic carbon with no hydrogens attached.
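The three atom expressions above can be checked with a small Python evaluator. This is a sketch covering only a tiny subset of SMARTS (atomic number `#n`, aliphatic and aromatic element symbols, hydrogen count `Hn`, `,` for OR, and `;` for low-precedence AND); it is not a SMARTS parser, and the atom dictionaries and function names are hypothetical.

```python
ATOMIC_NUM = {"C": 6, "N": 7, "O": 8}  # only the elements used in this example

def match_primitive(prim, atom):
    """Test one SMARTS atom primitive against an atom record."""
    if prim.startswith("#"):                        # atomic number, any aromaticity
        return ATOMIC_NUM[atom["element"]] == int(prim[1:])
    if prim.startswith("H") and prim[1:].isdigit():  # attached hydrogen count
        return atom["h"] == int(prim[1:])
    if prim in ATOMIC_NUM:                           # uppercase symbol: aliphatic
        return atom["element"] == prim and not atom["aromatic"]
    if prim.upper() in ATOMIC_NUM:                   # lowercase symbol: aromatic
        return atom["element"] == prim.upper() and atom["aromatic"]
    raise ValueError(f"unsupported primitive: {prim!r}")

def match_atom(expr, atom):
    """Evaluate '[a,b;c]' as (a OR b) AND c; ';' binds more loosely than ','."""
    body = expr.strip()[1:-1]
    return all(
        any(match_primitive(p, atom) for p in group.split(","))
        for group in body.split(";")
    )

benzene_c = {"element": "C", "aromatic": True, "h": 1}
methyl_c = {"element": "C", "aromatic": False, "h": 3}
quaternary_c = {"element": "C", "aromatic": False, "h": 0}
amine_n = {"element": "N", "aromatic": False, "h": 2}

for name, atom in [("benzene C", benzene_c), ("methyl C", methyl_c),
                   ("quaternary C", quaternary_c), ("amine N", amine_n)]:
    print(name, [match_atom(e, atom) for e in ("[#6]", "[C,N,O]", "[C;H0]")])
```

In this sketch the aromatic carbon matches only "[#6]", the methyl carbon matches "[#6]" and "[C,N,O]", and only the quaternary carbon also satisfies "[C;H0]". This shows how a SMARTS can be either broader or narrower than a plain SMILES atom.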

SMARTS searches are generally slower than SMILES searches. As of release 4.4, however, fingerprint screening is used to some extent, essentially to screen on the explicit portion of the SMARTS.
