A full SMILES language tutorial.
SMILES is a simple yet comprehensive chemical nomenclature.
The answer to the most commonly asked question about SMILES is: yes, it is an acronym, meaning Simplified Molecular Input Line Entry Specification. (SMILES originated in the depths of the US government, where humorous names for things are frowned upon unless they are acronyms.)
This document is intended to serve as a comprehensive tutorial for the SMILES language itself. This page is not intended to provide other functions, e.g., summaries of SMILES-compliant software, test suites, language reference standards, pointers to SMILES parsers, etc. However, pointers to pages providing such functions will be kept up to date here.
The structural diagrams in this document are hyperlinks to an interactive SMILES depiction facility. To view a larger picture of a structure and view/edit its SMILES, click on its drawing.
The flip side is that SMILES is not useful for describing things that cannot be well-represented by a valence model. SMILES is not suitable for representing many common substances, e.g., turpentine (distilled trees), Skelly-B or gasoline (distilled fossils), beer, or milk. It isn't just that these substances are complex mixtures, but rather that a description of their properties is more useful than a description of their structure.
In practice, one chemist might represent nitromethane as C[N+](=O)[O-] with a nitrogen of valence 3 in a charge-separated structure while another might represent it as CN(=O)=O with a neutral 5-valent nitrogen. Which SMILES is correct? Both are. Is it ever possible to make an incorrect SMILES? Yes, for instance, the SMILES CN([O])[O] does not represent nitromethane (the electrons don't add up; it represents some weird diradical).
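The bookkeeping behind "the electrons don't add up" can be sketched in a few lines of Python. This is an illustration only: the atoms of each SMILES are encoded by hand rather than parsed, and `ALLOWED_BOND_SUMS` is a simplified assumption made for this example, not part of any SMILES specification. The names `CANDIDATES`, `unfilled_valences`, and `net_charge` are hypothetical.

```python
# Each heavy atom is encoded by hand as:
#   (element, formal_charge, sum_of_bond_orders_to_heavy_atoms, hydrogen_count)
# Atoms written without brackets get their implicit hydrogens counted here;
# bracket atoms such as [O] carry no hydrogens unless they are written.

# Assumption for this sketch: bond-order totals (including hydrogens) that a
# given element/charge combination may have with no unpaired electrons.
ALLOWED_BOND_SUMS = {
    ("C", 0): (4,),
    ("N", 0): (3, 5),
    ("N", 1): (4,),
    ("O", 0): (2,),
    ("O", -1): (1,),
}

CANDIDATES = {
    "C[N+](=O)[O-]": [("C", 0, 1, 3), ("N", 1, 4, 0), ("O", 0, 2, 0), ("O", -1, 1, 0)],
    "CN(=O)=O": [("C", 0, 1, 3), ("N", 0, 5, 0), ("O", 0, 2, 0), ("O", 0, 2, 0)],
    "CN([O])[O]": [("C", 0, 1, 3), ("N", 0, 3, 0), ("O", 0, 1, 0), ("O", 0, 1, 0)],
}


def unfilled_valences(atoms):
    """Total number of bonding positions left empty across all atoms."""
    total = 0
    for element, charge, bond_order_sum, hydrogens in atoms:
        used = bond_order_sum + hydrogens
        options = [v for v in ALLOWED_BOND_SUMS[(element, charge)] if v >= used]
        if not options:
            raise ValueError(f"{element} with charge {charge} is over-bonded")
        total += min(options) - used
    return total


def net_charge(atoms):
    """Sum of the formal charges on all atoms."""
    return sum(charge for _, charge, _, _ in atoms)


for smiles, atoms in CANDIDATES.items():
    print(smiles, "net charge:", net_charge(atoms),
          "unfilled valences:", unfilled_valences(atoms))
```

Under these assumptions the first two SMILES are neutral with no unfilled valences, while CN([O])[O] is neutral but leaves one unfilled valence on each oxygen, which is the diradical mentioned above.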
The practical side of this is that the correctness of a given SMILES can't be determined by what happens when it is input to any particular program, even to the extent of asking, "Is this a valid SMILES?" For instance, nowhere in this document will you find specified limits such as the maxima for SMILES length, atoms per molecule, branch nesting depth, etc. -- they don't exist except in specific implementations. So it goes.
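As a rough illustration of why such limits belong to programs rather than to the language, the sketch below measures the branch nesting depth of a SMILES string and then applies limits that are passed in as parameters. The functions `branch_nesting_depth` and `accepted_by` and all limit values are hypothetical; the sketch assumes parentheses appear only as branch delimiters.

```python
def branch_nesting_depth(smiles: str) -> int:
    """Return the deepest level of nested branch parentheses in a SMILES."""
    depth = 0
    deepest = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("')' without a matching '('")
    if depth != 0:
        raise ValueError("'(' without a matching ')'")
    return deepest


def accepted_by(smiles: str, max_length: int, max_branch_depth: int) -> bool:
    """Would a hypothetical program with these limits accept the string?"""
    return (len(smiles) <= max_length
            and branch_nesting_depth(smiles) <= max_branch_depth)


smiles = "CC(C(C)(C)C)C"  # 2,2,3-trimethylbutane, branches nested two deep
print(branch_nesting_depth(smiles))
print(accepted_by(smiles, max_length=100, max_branch_depth=1))
print(accepted_by(smiles, max_length=100, max_branch_depth=8))
```

The same string is rejected by one set of limits and accepted by the other; nothing about the SMILES itself has changed.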
SMILES for some simple molecules and reactions are shown in the following table. The "Section" column lists the tutorial sections that cover each example.
SMILES | Name | Section |
---|---|---|
[H+] | proton | atoms, hydrogens |
C | methane | atoms |
O | water | atoms |
[OH3+] | hydronium cation | atoms |
[2H]O[2H] | deuterium oxide (heavy water) | atoms, isotopes |
[Au] | elemental gold | atoms |
CCO | ethanol | bonds |
O=C=O | carbon dioxide | bonds |
C#N | hydrogen cyanide | bonds |
CC(=O)O | acetic acid | bonds, branching |
C1CCCCC1 | cyclohexane | rings |
C1CC2CCCCC2CC1 | decalin | rings |
c1ccccc1 | benzene | aromaticity, rings |
[Na+].[O-]c1ccccc1 | sodium phenoxide | aromaticity, disconnects, rings |
c1ccccc1[N+](=O)[O-] | nitrobenzene | aromaticity, disconnects, rings, valence model |
CC(=O)O.CCO>>CC(=O)OCC | esterification of acetic acid and ethanol to ethyl acetate | disconnects, components |
CC(=[O:1])[OH:2] . CC[OH:3] > [H+] > CC(=[O:2])[O:3]CC . [OH2:1] | stoichiometric esterification with [H+] agent and atom-mapped O's | disconnects, components, atom-mapping |
C/C=C/C | trans-2-butene | DB/chirality |
N[C@@H](C)C(=O)O | L-alanine | Th/chirality |
O[C@H]1CCCC[C@H]1O | cis-1,2-cyclohexanediol | Th/chirality |
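
To make the pieces of these strings easier to see, here is a minimal tokenizer sketch in Python. It is not a SMILES parser or validator: it only splits a string into the kinds of symbols that appear in the table (bracket atoms, unbracketed atoms, bonds, ring-closure digits, branches, dots, and reaction arrows). The names `TOKEN_PATTERN`, `tokenize`, and `count_atoms` are hypothetical, the set of recognized symbols is deliberately small, and dropping whitespace is an assumption made so that the spaced-out reaction in the table can be read.

```python
import re

# One named alternative per kind of symbol; earlier alternatives win,
# so two-letter symbols (Cl, Br) are listed before single letters.
TOKEN_PATTERN = re.compile(
    r"""
    (?P<bracket_atom>\[[^\]]*\])         # anything between square brackets
  | (?P<organic_atom>Cl|Br|[BCNOPSFI])   # unbracketed aliphatic atoms
  | (?P<aromatic_atom>[bcnops])          # unbracketed aromatic atoms
  | (?P<bond>[-=\#/\\:])                 # explicit bond symbols
  | (?P<ring_closure>%\d{2}|\d)          # ring-closure digits
  | (?P<branch>[()])                     # branch open/close
  | (?P<dot>\.)                          # disconnection
  | (?P<reaction_arrow>>)                # reaction separator
    """,
    re.VERBOSE,
)


def tokenize(smiles: str):
    """Split a SMILES string into (kind, text) pairs."""
    text = "".join(smiles.split())  # assumption: whitespace is ignorable
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_PATTERN.match(text, pos)
        if match is None:
            raise ValueError(f"unrecognized symbol {text[pos]!r} at position {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def count_atoms(smiles: str) -> int:
    """Count atom tokens (hydrogens inside brackets are not counted separately)."""
    return sum(1 for kind, _ in tokenize(smiles) if kind.endswith("_atom"))


for example in ["CCO", "CC(=O)O", "[Na+].[O-]c1ccccc1", "N[C@@H](C)C(=O)O"]:
    print(example, [text for _, text in tokenize(example)])
```

Keeping everything inside square brackets as a single token defers the details of isotopes, chirality, hydrogen counts, charges, and atom maps to a later step, which keeps this sketch short.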