SMARTS Tutorial

Table of Contents

1. Introduction
2. Properties of Atoms
3. Bonds
4. Logical Operators
5. Recursive SMARTS
6. Component-Level Grouping
7. Reaction SMARTS

1. Introduction

SMARTS...

...means SMiles ARbitrary Target Specification
...is a language used for describing molecular patterns and properties
...rules are straightforward extensions of SMILES
       - All SMILES symbols and properties are legal in SMARTS.
       - SMARTS includes logical operators and additional molecular descriptors
...can describe structural patterns with varying degrees of specificity and generality:
       - SMILES for methane:     C or [CH4]
       - High specificity SMARTS describing a pattern consistent with methane:     [CH4]
          Only matches aliphatic carbon atoms that have 4 hydrogens.
          Won't match ethane, ethene, or cyclopentane.
       - Low specificity SMARTS describing a pattern consistent with methane:     C
          Matches aliphatic carbon atoms that have any number of hydrogens.
          Will match ethane, ethene, and cyclopentane.
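
The difference in specificity can be sketched with a toy model in Python. This is not a SMARTS parser: the atom tuples, the molecule lists, and the function names are assumptions made up for illustration, and each query is hand-coded as a predicate on one atom.

```python
# Toy model (assumption): an atom is (element symbol, is_aromatic, hydrogen count).
METHANE = [("C", False, 4)]
ETHANE = [("C", False, 3), ("C", False, 3)]
ETHENE = [("C", False, 2), ("C", False, 2)]
CYCLOPENTANE = [("C", False, 2)] * 5


def matches_C(atom):
    """Low specificity query C: any aliphatic carbon, any hydrogen count."""
    element, aromatic, _ = atom
    return element == "C" and not aromatic


def matches_CH4(atom):
    """High specificity query [CH4]: aliphatic carbon with exactly 4 hydrogens."""
    return matches_C(atom) and atom[2] == 4


def hits(query, molecule):
    """A query hits a molecule if at least one atom satisfies it."""
    return any(query(atom) for atom in molecule)


for name, molecule in [("methane", METHANE), ("ethane", ETHANE),
                       ("ethene", ETHENE), ("cyclopentane", CYCLOPENTANE)]:
    print(name, "C:", hits(matches_C, molecule), "[CH4]:", hits(matches_CH4, molecule))
```

In this sketch C hits all four molecules, while [CH4] hits only methane.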

2. Properties of Atoms

Each entry lists the SMARTS, what it hits in SMILES targets, and a note.

[+1]  

Atoms that have a plus one charge
All SMILES atomic properties are valid in SMARTS; this includes charge, hydrogen count, isotopic specifications, bond symbols, and chirality specification. + is +1, ++ is +2, etc.  

[a]  

Atoms that are aromatic
"a" is any aromatic atom.  

[A]  

Atoms that are aliphatic
"A" is any aliphatic atom.  

[#6]  

Atoms that have an atomic number of 6 (c or C)
"#<number>" defines an atom that has an atomic number of <number>. Hits both aliphatic and aromatic atoms.

[R2]  

Atoms that are in 2 rings
"R<number>"   defines an atom that is in <number> rings. Default (R) is any ring atom.

[r5]  

Atoms that are in a ring that has 5 members
"r<number>"   defines an atom that is in a ring that has <number> members. Default (r) is any ring atom.

[v4]  

Atoms that are four-valent
"v<number>" defines an atom whose total bond order is <number>; a double bond (=) counts as 2 and a triple bond (#) counts as 3.

[X2]  

Atoms that are connected to two other atoms
"X<number>" defines an atom that is connected to <number> other atoms (including all hydrogens)  

[H]  

Hydrogen Atoms
A hydrogen atom (often called an "explicit hydrogen") has special properties ([H+],[2H], [H][H] etc). [H+] and [2H] behave similarly.  

[H1]

Atoms that have one attached hydrogen
"H<number>" defines an atom that has <number> attached hydrogens ("implicit" or "explicit", i.e. H property or H atom). The default, [*H], is 1 for a non-hydrogen atom.

*  

Any Atom
In SMARTS, the wildcard atom, "*", matches all atoms. It won't hit hydrogens that are merely properties of heavy atoms.
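
The H, X, and v properties are easy to confuse, so here is a small worked example in Python. The hand-built graph for acetaldehyde (CC=O) and all function names are assumptions for illustration; hydrogens are stored as counts on the heavy atoms, as described for "*" above.

```python
# Hypothetical hand-built graph for acetaldehyde, CC=O.
# atoms: index -> (element, attached hydrogen count)
ATOMS = {0: ("C", 3), 1: ("C", 1), 2: ("O", 0)}
# bonds between heavy atoms: (atom index, atom index, bond order)
BONDS = [(0, 1, 1), (1, 2, 2)]


def hydrogen_count(idx):
    """H<n>: number of attached hydrogens."""
    return ATOMS[idx][1]


def connectivity(idx):
    """X<n>: number of connected atoms, including attached hydrogens."""
    heavy_neighbors = sum(1 for a, b, _ in BONDS if idx in (a, b))
    return heavy_neighbors + hydrogen_count(idx)


def valence(idx):
    """v<n>: total bond order, counting 1 for each attached hydrogen."""
    heavy_order = sum(order for a, b, order in BONDS if idx in (a, b))
    return heavy_order + hydrogen_count(idx)


for idx, (element, _) in ATOMS.items():
    print(element, f"H{hydrogen_count(idx)}", f"X{connectivity(idx)}", f"v{valence(idx)}")
# prints:
# C H3 X4 v4
# C H1 X3 v4
# O H0 X1 v2
```

The carbonyl carbon shows the distinction: it is connected to three atoms (X3) but has a total bond order of four (v4), because the double bond counts twice.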

3. Bonds

Each entry lists the SMARTS, what it hits in SMILES targets, and a note.

CC  

Molecules where an aliphatic carbon is SINGLE BONDED to another aliphatic carbon
All SMILES bond properties are valid in SMARTS; this includes implicit single bonds, explicit single bonds (-), double bonds (=), triple bonds (#), and aromatic bonds (:). CC WON'T match double bonds or triple bonds (for example C=C and C#C).
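
The bond symbols covered in this section can be summarized as a lookup in Python. This is a sketch under stated assumptions: the function name and its parameters are hypothetical, a target bond is reduced to an order plus two flags, and an omitted bond symbol is treated as "single or aromatic" (which amounts to a single bond between two aliphatic atoms, as in CC).

```python
def bond_matches(symbol, order=1, aromatic=False, in_ring=False):
    """Return True if the query bond symbol accepts the described target bond.

    symbol   -- "", "-", "=", "#", ":", "~" or "@" ("" means no symbol written)
    order    -- 1, 2 or 3 for a non-aromatic target bond
    aromatic -- True if the target bond is aromatic (order is then ignored)
    in_ring  -- True if the target bond is a ring bond
    """
    if symbol == "~":          # wildcard: any bond
        return True
    if symbol == "@":          # any ring bond
        return in_ring
    if symbol == ":":          # aromatic bond
        return aromatic
    if symbol == "":           # implicit bond: single or aromatic (assumption above)
        return aromatic or order == 1
    if aromatic:               # "-", "=", "#" do not accept aromatic bonds
        return False
    return {"-": 1, "=": 2, "#": 3}[symbol] == order


print(bond_matches("", order=1))   # True:  CC hits a C-C single bond
print(bond_matches("", order=2))   # False: CC does not hit C=C
print(bond_matches("~", order=3))  # True:  [#6]~[#6] hits C#C
```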

[#6]~[#6]  

Molecules where two carbons are connected by any bond (includes single bonds, double bonds, triple bonds, and aromatic bonds)
"~" means any bond (wildcard bond).  

[#6]@[#6]

Molecules where two carbons are connected by a ring bond
"@" is a bond between two atoms that are within the same ring.  

F/?[#6]=C/Cl  

Molecules where a carbon (connected to a fluorine by an "up or unspecified" directional bond) is double bonded to another aliphatic carbon (connected to a chlorine by an "up" bond), e.g. F/C=C/Cl and FC=C/Cl. This excludes molecules where the bond to the fluorine is "down" and the bond to the chlorine is "up" (e.g. F\C=C/Cl).
"?" means "OR unspecified". "?" may also be used with chirality specification (@ and @@).  

4. Logical Operators

Each entry lists the SMARTS, what it hits in SMILES targets, and a note.

[!c]  

Atoms that are NOT aromatic carbons
"!" means "not".  

[N,#8]  

Atoms that are an aliphatic Nitrogen OR an Oxygen (aromatic or aliphatic)
"," means OR. OR has higher precedence than low precedence "and" (;), but lower precedence than high precedence "and" (&).

[#7,C&+0,+1]
or
[#7,C+0,+1]

Atoms that (are Nitrogens) or (are neutral aliphatic Carbons) or (are positively charged)
"&" is "and" (high precedence). High precedence "and" is the default logical operator and may be omitted.  

[#7,C;+0,+1]  

Atoms that (are Nitrogens or are aliphatic Carbons) and (are neutral or positively charged)
";" is "and" (low precedence).  

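The three precedence levels can be sketched as nested splits in Python: ";" is split first (loosest), then ",", then "&" or simple adjacency (tightest). This toy evaluator is an assumption for illustration, not a SMARTS implementation; it understands only "#<n>", "+<n>", one-letter element symbols, and a leading "!", and the atom dictionaries are hypothetical.

```python
import re

# One primitive: optional "!", then "#<n>", "+<n>", or a one-letter element symbol.
PRIMITIVE = re.compile(r"!?(?:#\d+|\+\d*|[A-Za-z])")


def primitive_holds(token, atom):
    negate = token.startswith("!")
    body = token[1:] if negate else token
    if body.startswith("#"):                       # atomic number
        result = atom["number"] == int(body[1:])
    elif body.startswith("+"):                     # charge; bare "+" means +1
        result = atom["charge"] == int(body[1:] or 1)
    else:                                          # upper case aliphatic, lower case aromatic
        result = atom["symbol"] == body.upper() and atom["aromatic"] == body.islower()
    return result != negate


def expression_matches(expr, atom):
    """Evaluate the inside of a bracket atom, e.g. '#7,C&+0,+1'."""
    return all(                                    # ";"  low precedence and
        any(                                       # ","  or
            all(primitive_holds(tok, atom)         # "&" or adjacency: high precedence and
                for tok in PRIMITIVE.findall(and_chain))
            for and_chain in or_group.split(","))
        for or_group in expr.split(";"))


def make_atom(symbol, number, charge, aromatic=False):
    return {"symbol": symbol, "number": number, "charge": charge, "aromatic": aromatic}


amide_n = make_atom("N", 7, -1)     # negatively charged nitrogen
oxonium_o = make_atom("O", 8, +1)   # positively charged oxygen

for expr in ("#7,C&+0,+1", "#7,C+0,+1", "#7,C;+0,+1"):
    print(expr, expression_matches(expr, amide_n), expression_matches(expr, oxonium_o))
```

Both test atoms satisfy [#7,C&+0,+1] (one is a nitrogen, the other is positively charged), but neither satisfies [#7,C;+0,+1], which requires the charge condition and the element condition to hold together.
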
5. Recursive SMARTS

Each entry lists the SMARTS, what it hits in SMILES targets, and a note.

[$(*O);$(*CC)]

Atoms that are in an environment where (the atom is connected to an aliphatic oxygen) and where (the atom is connected to two sequential aliphatic carbons)
Any SMARTS expression may be used to define an atomic environment by writing a SMARTS starting with the atom of interest in this form: $(<SMARTS>)  
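
One way to read [$(*O);$(*CC)] is as two yes/no environment tests applied to the same atom. The Python sketch below hand-codes both tests for a hypothetical target, propan-1-ol (CCCO); the graph layout and function names are assumptions for illustration, and every atom in the toy target is aliphatic.

```python
# Hypothetical target: propan-1-ol, CCCO, as an element list plus adjacency.
ELEMENTS = ["C", "C", "C", "O"]
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}


def env_star_O(atom):
    """$(*O): the atom is connected to an oxygen."""
    return any(ELEMENTS[n] == "O" for n in NEIGHBORS[atom])


def env_star_CC(atom):
    """$(*CC): the atom starts a path atom-C-C over three distinct atoms."""
    return any(
        ELEMENTS[first] == "C" and ELEMENTS[second] == "C" and second != atom
        for first in NEIGHBORS[atom]
        for second in NEIGHBORS[first]
    )


def in_both_environments(atom):
    """[$(*O);$(*CC)]: both environments must hold for the same atom."""
    return env_star_O(atom) and env_star_CC(atom)


matched = [i for i in range(len(ELEMENTS)) if in_both_environments(i)]
print(matched)  # [2]: only the carbon bearing the oxygen
```

Only the atom of interest (the first atom inside each $(...)) is reported; the oxygen and the chain carbons serve as context.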

[$([CX3]=[OX1]),$([CX3+]-[OX1-])]

Carbon atoms of a Carbonyl group (either resonance structure)
 

[$([#6]aaO);$([#6]aaaN)]  

Carbon (aliphatic or aromatic, since [#6] matches both) that is ortho to an O and meta to an N
 

6. Component-Level Grouping

Each entry lists the SMARTS, what it hits in SMILES targets, and a note.

[#8].[#8]  

Molecules that contain two oxygens (e.g. O=O, OCCO and O.CCO)
"." (dot) in SMARTS means "not necessarily connected".  

([#8].[#8])  

Molecules that contain two oxygens that are within the same component (e.g. O=O and OCCO but NOT O.CCO)
A single set of parentheses may surround any legal SMARTS expression. Here the parentheses indicate that the contents are within the same component of the target SMILES.

([#8]).([#8])  

Molecules or mixtures that contain two oxygens that are within different components (e.g. O.CCO but NOT O=O or OCCO)
Separate Component-Level Groupings may be specified. Here the parentheses indicate that the respective contents are within different components of the target SMILES.
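
The three groupings above can be checked against the example targets with a short Python sketch. It is deliberately crude and rests on a stated assumption: oxygens are counted as "O" characters per dot-separated component, which only works for simple SMILES such as the three used here. The function names are hypothetical.

```python
def oxygens_per_component(smiles):
    """Count oxygens in each dot-separated component (simple SMILES only)."""
    return [component.count("O") for component in smiles.split(".")]


def two_oxygens_anywhere(smiles):
    """[#8].[#8]: two oxygens, not necessarily connected."""
    return sum(oxygens_per_component(smiles)) >= 2


def two_oxygens_same_component(smiles):
    """([#8].[#8]): two oxygens within one component."""
    return any(count >= 2 for count in oxygens_per_component(smiles))


def two_oxygens_different_components(smiles):
    """([#8]).([#8]): two oxygens in two different components."""
    return sum(1 for count in oxygens_per_component(smiles) if count >= 1) >= 2


for target in ("O=O", "OCCO", "O.CCO"):
    print(target,
          two_oxygens_anywhere(target),
          two_oxygens_same_component(target),
          two_oxygens_different_components(target))
# prints:
# O=O True True False
# OCCO True True False
# O.CCO True False True
```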

7. Reaction SMARTS

Each entry lists the SMARTS, what it hits in SMILES targets, and a note.

[#6]=,:[#6]  

Carbons connected by a (double or aromatic) bond.
Molecule SMARTS (SMARTS without ">" characters) can match anywhere in a Reaction SMILES target (reactant, agent, or product).  

>>[#6]=,:[#6]  

Product Carbons connected by a (double or aromatic) bond.
Reaction SMARTS (SMARTS with ">" characters) never match molecule targets.  

[C:1]>>[C:1]  

Mapped reacting carbons.
Mapped SMARTS atomic queries never match unmapped target atoms. Mapped SMARTS reaction queries never hit unmapped reaction targets.  

[C:1]>>C  

Reacting carbons.
Unpaired maps in the query are ignored.  

[C:1][C:1]>>[C:1]  

Multiple mapped reacting carbons.
SMARTS map classes inter-relate reactants to products but don't intra-relate reactants or products. (Although query reactants have the same class, they can match target reactants of different classes.)  
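
Some of the bookkeeping behind these rules can be sketched in Python. No matching is performed here: the sketch only separates a reaction query into its reactant, agent, and product parts and finds which map classes appear on both sides. The function names and the regular expression are assumptions for illustration.

```python
import re


def is_reaction_query(smarts):
    """A SMARTS containing ">" is a reaction SMARTS; otherwise a molecule SMARTS."""
    return ">" in smarts


def split_roles(reaction):
    """Split 'reactants>agents>products'; raises ValueError if there are not two '>'."""
    reactants, agents, products = reaction.split(">")
    return {"reactants": reactants, "agents": agents, "products": products}


def map_classes(part):
    """Collect atom map classes such as the 1 in [C:1]."""
    return {int(n) for n in re.findall(r":(\d+)\]", part)}


def paired_classes(query):
    """Map classes present on both the reactant and product side of the query."""
    roles = split_roles(query)
    return map_classes(roles["reactants"]) & map_classes(roles["products"])


print(is_reaction_query("[#6]=,:[#6]"))       # False: molecule SMARTS
print(split_roles(">>[#6]=,:[#6]"))           # only the product part is non-empty
print(paired_classes("[C:1]>>[C:1]"))         # {1}
print(paired_classes("[C:1]>>C"))             # set(): the unpaired map is ignored
print(paired_classes("[C:1][C:1]>>[C:1]"))    # {1}
```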

More Information

Theory Manual
SMARTS Examples
SMARTS Practice