SMILES Tutorial: Conventions
Hydrogen atoms do not normally need to be specified
when writing SMILES for most organic structures.
The presence of hydrogens may be specified in three ways:
- Implicitly: for atoms specified without brackets, from normal valence assumptions.
- Explicitly by count: inside brackets, by the hydrogen count supplied; zero if unspecified.
- As explicit atoms: i.e., as explicit [H] atoms.
There is no distinction between "organic" and "inorganic" SMILES nomenclature.
One may specify the number of attached hydrogens for any atom in any SMILES.
For example, ethane may be written as
CC or [CH3][CH3] or [H]C([H])([H])C([H])([H])[H].
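The implicit and bracket rules can be sketched in a few lines of Python. This is a minimal sketch, assuming the usual organic-subset "normal valences" (B 3; C 4; N 3 or 5; O 2; P 3 or 5; S 2, 4 or 6; halogens 1) and ignoring aromatic atoms and charges; the function names are hypothetical, chosen only for this illustration.

```python
# "Normal" valences of the organic-subset elements, lowest first.
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(element, bond_order_sum):
    """H count for an unbracketed (organic-subset) atom.

    Picks the lowest normal valence that accommodates the explicit bonds
    and fills the remainder with hydrogens. Aromatic atoms and charges
    are out of scope for this sketch.
    """
    for valence in NORMAL_VALENCES[element]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # more bonds than any normal valence: no implicit hydrogens


def bracket_hydrogens(hcount=None):
    """A bracket atom has exactly the H count written; zero if none is given."""
    return 0 if hcount is None else hcount


# CC: each carbon has one single bond, so 4 - 1 = 3 implicit hydrogens.
print(implicit_hydrogens("C", 1))  # 3
# [H]C([H])([H])C...: each unbracketed C already has 4 bonds, so no implicit H.
print(implicit_hydrogens("C", 4))  # 0
# [CH3][CH3]: the count is stated inside the brackets.
print(bracket_hydrogens(3))        # 3
# [C]: no count given, so zero hydrogens.
print(bracket_hydrogens())         # 0
# CS(=O)C: sulfur's bond-order sum is 4, so valence 4 applies and S gets 0 H.
print(implicit_hydrogens("S", 4))  # 0
```

The lowest-valence-first search is what makes CS (methanethiol) get one hydrogen on sulfur while CS(=O)C gets none.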
There are four situations where explicit hydrogen specification is required:
- charged hydrogen, i.e., a proton, [H+]
- hydrogens connected to other hydrogens, e.g., molecular hydrogen, [H][H]
- hydrogens connected to other than one other atom, e.g., bridging hydrogens
- isotopic hydrogen specifications, e.g., in heavy water, [2H]O[2H]
Most of the confusion in using SMILES arises from the SMILES definition
of aromaticity. That's a shame, because in virtually all cases,
one can simply (and safely) ignore aromaticity.
When should I specify a structure as aromatic?
You never need to do so.
If you find yourself typing in SMILES, it's a bit easier to type
"c1ccccc1" for benzene instead of "C1=CC=CC=C1" cyclohexatriene,
but it's just a matter of convenience,
since they mean exactly the same thing.
What does "aromatic" mean, anyway?
"Aromatic" means "it smells nice".
No kidding, that's the only defensible definition.
There is no single rigorous definition of aromaticity in chemistry.
To a synthetic chemist, aromaticity implies something about reactivity;
to a thermodynamicist, about heat of formation;
to a spectroscopist, about NMR ring current;
to a molecular modeler, about geometrical planarity;
to a cosmetic chemist, it probably means "smells nice".
The SMILES definition of aromaticity has nothing to do with the other
definitions, except that we'd all agree that benzene is "aromatic".
Why does SMILES provide an "aromatic" concept at all?
The SMILES language was specifically designed to be "uniquifiable",
i.e., not only to provide an unambiguous chemical nomenclature,
but also to be able to express a single, unique SMILES for every structure
in the same language.
This implies a fundamental requirement to express the symmetry of a
molecule correctly. Consider the problem of generating a unique
SMILES for ortho-fluorophenol, Oc1ccccc1F, but without aromatic bonds.
There are two ways to write it: OC1=CC=CC=C1F
(with the substituted carbons joined by a single bond)
and OC1=C(F)C=CC=C1
(with the substituted carbons joined by a double bond).
These are two different molecular graphs:
the SMILES for these will always differ.
For purposes of unique nomenclature, it's not OK to have two
different "unique SMILES" for the same molecule.
SMILES language provides an "aromatic" concept to avoid this conundrum.
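The argument can be checked mechanically. The Python sketch below encodes the two Kekule forms of ortho-fluorophenol, OC1=CC=CC=C1F and OC1=C(F)C=CC=C1, as small labeled graphs and compares them with a brute-force canonical form (try every atom relabeling, keep the smallest). This is a stand-in for illustration only, not the SMILES canonicalization algorithm; the graph encoding, the "ar" bond label, and the helper names canonical_form and aromatize are hypothetical.

```python
from itertools import permutations


def canonical_form(elements, bonds):
    """Brute-force canonical form of a labeled graph (tiny molecules only).

    elements: list of element symbols indexed by atom number.
    bonds: list of (i, j, order) tuples; order is 1, 2, or "ar".
    Returns the smallest (atom labels, edge list) over all relabelings,
    so two inputs give equal results exactly when they are isomorphic.
    """
    best = None
    for perm in permutations(range(len(elements))):
        # perm[k] is the original atom placed at new position k
        new_index = {old: new for new, old in enumerate(perm)}
        atoms = tuple(elements[old] for old in perm)
        edges = tuple(sorted(
            (min(new_index[i], new_index[j]), max(new_index[i], new_index[j]), str(order))
            for i, j, order in bonds
        ))
        form = (atoms, edges)
        if best is None or form < best:
            best = form
    return best


def aromatize(bonds, ring_atoms):
    """Mark every bond between two ring atoms as aromatic ("ar").

    Assumption for this sketch: there is a single benzene-like ring, so
    "both ends in the ring" is the same as "ring bond".
    """
    return [(i, j, "ar" if i in ring_atoms and j in ring_atoms else order)
            for i, j, order in bonds]


# OC1=CC=CC=C1F: substituted carbons (atoms 1 and 6) joined by a single bond
elems_a = ["O", "C", "C", "C", "C", "C", "C", "F"]
bonds_a = [(0, 1, 1), (1, 2, 2), (2, 3, 1), (3, 4, 2),
           (4, 5, 1), (5, 6, 2), (6, 7, 1), (6, 1, 1)]

# OC1=C(F)C=CC=C1: substituted carbons (atoms 1 and 2) joined by a double bond
elems_b = ["O", "C", "C", "F", "C", "C", "C", "C"]
bonds_b = [(0, 1, 1), (1, 2, 2), (2, 3, 1), (2, 4, 1),
           (4, 5, 2), (5, 6, 1), (6, 7, 2), (7, 1, 1)]

kekule_same = canonical_form(elems_a, bonds_a) == canonical_form(elems_b, bonds_b)
arom_same = (canonical_form(elems_a, aromatize(bonds_a, {1, 2, 3, 4, 5, 6}))
             == canonical_form(elems_b, aromatize(bonds_b, {1, 2, 4, 5, 6, 7})))
print(kekule_same, arom_same)  # False True
```

With explicit single and double bonds the two graphs differ (the bond between the O- and F-bearing carbons is single in one, double in the other); once the ring bonds are all marked aromatic they become the same graph, which is exactly why a unique SMILES needs the aromatic concept.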
How does SMILES determine "aromaticity"?
Unfortunately it's not as trivial as "alternating single and double
bonds", but it's not rocket science, either.
The SMILES algorithm uses an extended version of Hueckel's rule
to identify aromatic molecules and ions.
To qualify as aromatic, all atoms in a ring must be sp2 hybridized
and the number of available "shared" p-electrons must satisfy Hueckel's
4N+2 rule.
For example, an sp2 carbon shares one pi-electron,
so benzene (or cyclohexatriene) is aromatic (6 = 4(1) + 2).
Conversely, C1=CC=C1 cyclobutadiene and C1=CC=CC=CC=C1 cyclooctatetraene
are (correctly) not aromatic, with 4 and 8 shared electrons, respectively.
Note that these are anti-aromatic compounds (4N shared electrons), so their
single and double bonds are not interchangeable, i.e.,
FC1=CC=CC=CC=C1O and FC1=C(O)C=CC=CC=C1 are not the same structure.
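The counting test itself is simple arithmetic. A minimal Python sketch, assuming (as in the text) that each sp2 ring carbon contributes exactly one shared electron; the function name satisfies_huckel is made up for this illustration:

```python
def satisfies_huckel(shared_pi_electrons):
    """True if the count has the form 4N + 2 for some integer N >= 0."""
    return shared_pi_electrons >= 2 and (shared_pi_electrons - 2) % 4 == 0


# Each sp2 ring carbon shares one pi-electron, so a carbocycle of
# n sp2 carbons has n shared electrons.
for name, n_carbons in [("benzene", 6), ("cyclobutadiene", 4), ("cyclooctatetraene", 8)]:
    print(name, n_carbons, satisfies_huckel(n_carbons))
# benzene 6 True
# cyclobutadiene 4 False
# cyclooctatetraene 8 False
```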
The rules get a little hairy for heterocycles:
Oxygen and sulfur can share a pair of pi-electrons.
Nitrogen can also share a pair if three-connected,
as in methylpyrrole; otherwise sp2 nitrogen shares
just one electron (as in pyridine).
An exocyclic double bond to an electronegative atom "consumes" one
shared pi-electron, as in 2-pyridone or coumarin.
But that's about it.
Add up the electrons in rings (and ring systems, such as azulene);
if they meet the 4N+2 criterion, it's "aromatic".
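The per-atom rules above can be tabulated and summed. The Python sketch below is one way to encode them; the function shared_pi_electrons, its keyword flags, and the treatment of formal charge as "subtract the charge" (a generalization of the two charged examples that follow) are assumptions for illustration. It does not perceive rings or check that every atom is sp2; each ring or ring system is supplied by hand, with azulene treated as one 10-atom system.

```python
def shared_pi_electrons(element, three_connected=False, exocyclic_double=False, charge=0):
    """Shared pi-electrons contributed by one sp2 ring atom.

    sp2 C shares 1; ring O and S share 2; N shares 2 if three-connected
    (pyrrole-like), else 1 (pyridine-like); an exocyclic double bond to an
    electronegative atom consumes 1. Subtracting the formal charge is an
    assumption of this sketch.
    """
    if element == "C":
        count = 1
    elif element in ("O", "S"):
        count = 2
    elif element == "N":
        count = 2 if three_connected else 1
    else:
        raise ValueError(f"no rule for element {element!r} in this sketch")
    if exocyclic_double:
        count -= 1
    return count - charge


def is_aromatic(contributions):
    """Apply the 4N+2 test to the summed contributions of a ring or ring system."""
    total = sum(contributions)
    return total >= 2 and (total - 2) % 4 == 0


C = shared_pi_electrons("C")
examples = {
    "pyridine": [shared_pi_electrons("N")] + [C] * 5,
    "pyrrole": [shared_pi_electrons("N", three_connected=True)] + [C] * 4,
    "furan": [shared_pi_electrons("O")] + [C] * 4,
    "thiophene": [shared_pi_electrons("S")] + [C] * 4,
    "2-pyridone": [shared_pi_electrons("N", three_connected=True),
                   shared_pi_electrons("C", exocyclic_double=True)] + [C] * 4,
    "pyridine-N-oxide (neutral)": [shared_pi_electrons(
        "N", three_connected=True, exocyclic_double=True)] + [C] * 5,
    "pyridine-N-oxide (charge-separated)": [shared_pi_electrons(
        "N", three_connected=True, charge=+1)] + [C] * 5,
    "cyclopentadienyl anion": [shared_pi_electrons("C", charge=-1)] + [C] * 4,
    "azulene (ring system)": [C] * 10,
}

for name, contributions in examples.items():
    print(f"{name}: {sum(contributions)} shared electrons, aromatic={is_aromatic(contributions)}")
```

Every entry sums to 6, except azulene at 10, so all pass the 4N+2 test.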
Examples of aromatic compounds and their SMILES.
- 6 = 4N + 2 shared pi electrons.
- All the same molecule, however you write it.
- "Normal" aromatic "n" nitrogen is pyridyl-N.
- Pyrrolyl-N is written [nH] and shares two pi electrons.
- pyridine-N-oxide, neutral representation: the exocyclic =O "consumes" one
  pi electron from an N that would otherwise share 2 pi electrons.
- pyridine-N-oxide, charge-separated representation: one electron is missing (+)
  from an N that would otherwise share 2 pi electrons.
- Oxygen shares a pair of pi electrons, so furan is aromatic.
- Sulfur shares a pair of pi electrons, so thiophene is aromatic.
- The - charge is an extra electron, making 6.
- 3 + 2 + 5 = 10 = 4N + 2, so azulene is aromatic.
Tautomeric structures are explicitly
specified in SMILES. There are no "tautomeric bond",
"mobile hydrogen", or "mobile charge" specifications.
Selection of one or all tautomeric structures is left to the user and
strongly depends on the application. Given one tautomeric form, most
chemical information systems will report data for all known tautomers
as needed. The role of SMILES is to specify exactly which tautomeric
form is requested, and for which there are data. A simple example,
with two possible tautomeric forms, is shown below:
Daylight Chemical Information Systems, Inc.