Daylight v4.9
Release Date: 1 February 2008


smi2mol - converts a SMILES-based structure file into a connection-table-based file.

Unix Synopsis

smi2mol [options] [infile [outfile]]


smi2mol(1) converts a Daylight SMILES (SMI) or Thor Data Tree (TDT) file into an MDL formatted file containing molecules (molfile or SDfile).

The input file must be either a SMI or TDT file. If a unique ID is included in the SMILES file, the SMILES must be followed by ' ' (space) and then the ID. If the structure is absent, the ID must still be preceded by a space. SMILES may not contain reactions but may contain stereochemistry and isotopes. If an input TDT does not contain a structure, it must be rooted in a name identifier. If it is multi-branched, it must be rooted in $SMI; it is then split into multiple records based on $NAM unless another tag is specified with the NAME_DATATAG option. The SMI_WITH_TUPLES option sets whether output coordinates use the TDT structural information associated with $SMI or with ISM.
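The SMI line layout described above can be sketched in Python. This is only an illustration of the stated rules (SMILES, a space, then the ID; a leading space when the structure is absent); the function name `format_smi_record` and the sample records are hypothetical and not part of smi2mol.

```python
def format_smi_record(smiles: str, identifier: str = "") -> str:
    """Build one line of a SMI input file.

    Layout taken from the description: SMILES, one space, then the ID.
    With no structure, the line starts with the space that precedes the ID.
    """
    if any(ch.isspace() for ch in smiles):
        raise ValueError("SMILES must not contain whitespace")
    if ">" in smiles:
        # Reaction SMILES use '>' separators; reactions are not accepted.
        raise ValueError("reaction SMILES are not accepted")
    if not identifier:
        return smiles
    return f"{smiles} {identifier}"


records = [
    format_smi_record("CCO", "ethanol"),
    format_smi_record("C[C@H](N)C(=O)O", "L-alanine"),  # stereochemistry is allowed
    format_smi_record("", "no-structure"),              # ID preceded by a space
]
smi_text = "\n".join(records) + "\n"
```

Writing `smi_text` to a file gives input in the SMI layout; whether a given SMILES is otherwise acceptable to smi2mol is not checked here.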

Default output is to stdout in SDfile (v2000) format. If the input TDT contains the LINE1 data tag, the information associated with that datatype is placed on the first line of the header block of each connection table.
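To check what ended up on the header lines, the SDfile output can be read back with a few lines of Python. The sketch below relies only on general MDL conventions (each record ends with a `$$$$` line, and the first three lines of a connection table form the header block). The function name `sd_header_lines` is hypothetical, and the sample record is hand-written placeholder text, not real smi2mol output.

```python
def sd_header_lines(sd_text: str):
    """Yield (line 1, line 3) of the header block for each SDfile record.

    Records are delimited by a '$$$$' line. Text after the last '$$$$'
    (for example a bare molfile) is ignored by this sketch.
    """
    record = []
    for line in sd_text.splitlines():
        if line.strip() == "$$$$":
            if record:
                # Pad so that short or damaged records do not raise.
                padded = (record + ["", "", ""])[:3]
                yield padded[0], padded[2]
            record = []
        else:
            record.append(line)


# Hand-written placeholder record with an empty atom block.
sample = "\n".join([
    "ethanol",                                   # header line 1
    "",                                          # header line 2
    "CCO",                                       # header line 3 (comment)
    "  0  0  0  0  0  0  0  0  0  0999 V2000",   # counts line
    "M  END",
    "$$$$",
]) + "\n"

headers = list(sd_header_lines(sample))
```

Here `headers` holds one `(line 1, line 3)` pair per record, which makes it easy to confirm that the expected ID or comment was written.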

Stereochemistry is inferred from the isomeric SMILES, and the appropriate MDL bond styles are set to reflect it. While these follow the MDL rules, some users may have conventions that allow a less rigorous depiction of chirality. Note: The chirality flag is set in the output only if it is set in the input TDT.

To be visualized, SDfiles require coordinates for the structures. If the input file does not contain this information, 2D coordinates and/or bond styles are generated by the standard Daylight depict algorithm. All atoms are assumed to be visible (present in the output file) except normal hydrogen atoms. This can be overridden by setting the VIS datatype in the input TDT.

The data in an input TDT file are transferred to the SDfile using the same data tags. Note that the special meaning of the '$' in distinguishing identifiers is lost as the SDfile does not have the same underlying tree structure as the TDT.

Since MDL allows non-standard atoms in the CTfile format, wildcards such as '*' are replaced by the corresponding string from the ASYM datatype, if available.

The manual page for "convert" describes features common to this and the other "convert" programs. Please refer to it for more information on general usage and options such as -HELP, -VERSION, -SKIP_RECORDS, -DO_RECORDS, -ERROR_LEVEL, -ERROR_LOG, and -REJECT_LOG.


Options

Controls whether the input file is in SMILES or TDT format. Default is SMI.

-NAME_DATATAG
Designates the data tag to be used as the unique ID. For SMI input, the space-delimited ID after the SMILES is used. For TDT input, the LINE1 value, if available, is used. Otherwise the default tag is $NAM. Specifying another tag in the absence of LINE1 places the data associated with that tag on the first line of the header block of the connection table. In addition, multi-branched TDTs are split using the default or designated data tag. Note: One may need to place the tag name in quotes on the command line and use '\\' before a '$' if it is an identifier in the TDT.

-SMI_COMMENT
Determines whether the SMILES is placed in the comment line. Default is FALSE. Setting -SMI_COMMENT to TRUE writes the SMILES to the comment line (line 3) of the header block in each connection table. Note: The comment line is limited to 80 characters.

-SPLIT_FIELDS
Splits multi-field TDT data into separate output entries. Default is FALSE. Setting -SPLIT_FIELDS to TRUE treats each multi-line field type as a separate entry with the same data tag.

-USE_3D
Designates whether 3D coordinates are included in the output. Default is FALSE. If -USE_3D is set to TRUE and the input TDT file contains 3D coordinates, the 3D coordinates are included in the output file.

-SMI_WITH_TUPLES
Determines whether output tuple information is associated with SMILES or isomeric SMILES. Default is TRUE, so that tuples associated with $SMI (2D or $D3D) are saved in the output file. Setting this option to FALSE outputs the tuple information associated with the ISM (2DI or 3DI).
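Because the comment line is limited to 80 characters, it can be useful to find long SMILES in a SMI file before enabling -SMI_COMMENT. The Python sketch below only reports such lines; this page does not say how smi2mol treats a SMILES that exceeds the limit, so no such behavior is assumed. The function name `smiles_exceeding_comment_line` is hypothetical.

```python
COMMENT_LINE_LIMIT = 80  # limit stated for the comment line


def smiles_exceeding_comment_line(smi_lines, limit=COMMENT_LINE_LIMIT):
    """Return (line_number, smiles_length) for SMILES longer than `limit`.

    The SMILES is taken to be the text before the first whitespace.
    A line that starts with whitespace carries an ID but no structure.
    """
    too_long = []
    for number, line in enumerate(smi_lines, start=1):
        parts = line.split(None, 1)
        smiles = parts[0] if parts and not line[0].isspace() else ""
        if len(smiles) > limit:
            too_long.append((number, len(smiles)))
    return too_long


lines = [
    "CCO ethanol",
    "C" * 81 + " long-chain",   # 81-character SMILES, one over the limit
    " id-without-structure",
]
flagged = smiles_exceeding_comment_line(lines)
```

The same function works on an open file object, since iterating a text file yields its lines.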

Return Value

smi2mol returns 0 to the environment if it succeeds without errors or a non-zero value if there are errors.
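A calling script can use this exit status to decide whether a conversion succeeded. The following Python sketch assembles an argument list in the order given in the synopsis and tests the return code. The helper names `build_command` and `run_ok` are hypothetical, and the demonstration runs the Python interpreter as a stand-in so that it works on a machine without Daylight installed.

```python
import subprocess
import sys


def build_command(program, options=(), infile=None, outfile=None):
    """Assemble argv in the order `program [options] [infile [outfile]]`."""
    if outfile is not None and infile is None:
        raise ValueError("an outfile can only be given together with an infile")
    argv = [program, *options]
    if infile is not None:
        argv.append(infile)
        if outfile is not None:
            argv.append(outfile)
    return argv


def run_ok(argv):
    """Run a command and report True when its exit status is 0."""
    return subprocess.run(argv).returncode == 0


# What a real call might look like (file names are placeholders).
command = build_command("smi2mol", [], "input.smi", "output.sdf")

# Stand-in processes that exit with 0 and with a non-zero status.
succeeded = run_ok([sys.executable, "-c", "raise SystemExit(0)"])
failed = not run_ok([sys.executable, "-c", "raise SystemExit(3)"])
```

With Daylight installed, `run_ok(command)` would apply the same check to smi2mol itself.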



Daylight License

programs: convert

Related Topics

convert(1) mol2smi(1) rd2smi(1) smi2rd(1) sd2smarts(1) rd2smarts(1) rd2smirks(1) licensing(5) options(5)