Daylight v4.9
Release Date: 1 February 2008


mol2smi - converts a connection table-based file into a SMILES-based file.

Unix Synopsis

mol2smi [options] [infile [outfile]]


mol2smi(1) converts an MDL molfile or SDfile into a Daylight SMILES (SMI), isomeric SMILES (ISM), or Thor Data Tree (TDT) file. Alternatively, the output can be directed to two SQL loader (SQLLDR) files.

The input file must be a molfile or SDfile in V2000 format. R-group and S-group features are not recognized.

By default, output is written to stdout. For SQLLDR output, the user must specify the rootname for the two output files.

Double bond stereochemistry and tetrahedral chirality are inferred from the atom coordinates and bond style information in the connection table and encoded as isomeric SMILES. SMILES, isomeric SMILES, and associated structural information are automatically stored in the TDT and SQLLDR outputs.
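
Inferring double bond geometry from 2D coordinates can be illustrated with the standard side-of-line test: a substituent on each end of the double bond is cis to the other when both lie on the same side of the bond axis. The sketch below is a generic illustration under that assumption, not mol2smi's actual algorithm; the function names are hypothetical.

```python
def side(a, b, p):
    """Signed 2D cross product (b - a) x (p - a): >0 left of a->b, <0 right."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def double_bond_geometry(a, b, sub_a, sub_b, eps=1e-9):
    """Classify sub_a (bonded to a) and sub_b (bonded to b) across the bond a=b.

    Returns "cis", "trans", or None when a substituent lies on the bond axis,
    in which case no stereochemistry can be assigned from the coordinates.
    """
    sa = side(a, b, sub_a)
    sb = side(a, b, sub_b)
    if abs(sa) < eps or abs(sb) < eps:
        return None
    # Same sign means both substituents are on the same side of the axis.
    return "cis" if (sa > 0) == (sb > 0) else "trans"
```

For 1,2-difluoroethene, "cis" corresponds to the isomeric SMILES F/C=C\F and "trans" to F/C=C/F.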

Data fields in SDfiles are carried over into TDT and SQLLDR output. The SQLLDR format stores the data in one file (rootname.dat) and the structural information in another (rootname.str). Legal characters for data tags are limited to $, _, /, A-Z, a-z, and 0-9.
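
Reading the data section of an SDfile record, checking tags against the legal character set, and the effect of the SPLIT_FIELDS option described below can be sketched as follows. This is a minimal illustration, not Daylight code: the names are hypothetical, joining multi-line values with newlines is an assumption, and raising ValueError on an illegal tag is a choice of this sketch rather than documented mol2smi behavior.

```python
import re

# Legal data-tag characters: $, _, /, A-Z, a-z, 0-9.
LEGAL_TAG = re.compile(r"[$_/A-Za-z0-9]+")
# An SD data header line starts with '>' and names the tag in angle brackets.
HEADER = re.compile(r"^>.*?<([^>]*)>")


def is_legal_tag(tag):
    return LEGAL_TAG.fullmatch(tag) is not None


def parse_sd_data(lines, split_fields=False):
    """Return (tag, value) pairs from the data section of one SDfile record.

    split_fields mimics SPLIT_FIELDS: False keeps a multi-line value as one
    string (lines joined with newlines, an assumption); True yields one
    (tag, line) pair per line, all sharing the same tag.
    """
    fields = []
    tag, buf = None, []

    def flush():
        if tag is None:
            return
        if not is_legal_tag(tag):
            raise ValueError(f"illegal data tag: {tag!r}")
        if split_fields:
            fields.extend((tag, line) for line in buf)
        else:
            fields.append((tag, "\n".join(buf)))

    for line in lines:
        if line.startswith("$$$$"):  # end of the SDfile record
            break
        m = HEADER.match(line)
        if m:
            flush()
            tag, buf = m.group(1), []
        elif not line.strip():  # a blank line ends the current value
            flush()
            tag, buf = None, []
        elif tag is not None:
            buf.append(line)
    flush()
    return fields
```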

Unless another field is specified with the ID_FIELD option, the text on the first line of the header block of each connection table is taken to be a unique ID. In SMI and ISM output, this ID follows the SMILES or isomeric SMILES, separated by a space. The ID is stored in the $NAM field for TDT output and as the first line of SQLLDR files. If the first line of the header block is blank, the isomeric SMILES is used as the ID for TDT and SQLLDR output.
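
The ID rules above, together with the rejection rule for ID_FIELD, can be sketched as a small decision function plus the SMI/ISM line layout. This is a hypothetical illustration: the function and exception names are invented here, trimming surrounding whitespace from the header line is an assumption, and the isomeric SMILES fallback applies to TDT and SQLLDR output only.

```python
class RejectedRecord(Exception):
    """Raised when the designated ID field is missing or empty."""


def choose_id(header_line1, data_fields, isomeric_smiles, id_field=None):
    """Pick a record ID.

    header_line1    : first line of the connection table header block
    data_fields     : dict mapping SD data tags to values
    isomeric_smiles : isomeric SMILES of the record
    id_field        : hypothetical stand-in for the ID_FIELD option value
    """
    if id_field is not None:
        value = data_fields.get(id_field, "").strip()
        if not value:
            # Records without data in the designated field are rejected.
            raise RejectedRecord(f"no data in field {id_field!r}")
        return value
    if header_line1.strip():
        return header_line1.strip()
    # Blank first header line: fall back to the isomeric SMILES.
    return isomeric_smiles


def smi_line(smiles, record_id):
    """One SMI/ISM output line: the SMILES, a space, then the ID."""
    return f"{smiles} {record_id}"
```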

An SDfile record with no structural information produces a TDT rooted in $NAM, and its data is captured in the SQLLDR rootname.dat file. However, no entry is written to the SMI, ISM, or SQLLDR rootname.str output for such a record.

The manual page for "convert" describes features common to this and the other "convert" programs. Please refer to it for more information on general usage and options such as -HELP, -VERSION, -SKIP_RECORDS, -DO_RECORDS, -ERROR_LEVEL, -ERROR_LOG, and -REJECT_LOG.


Options

Controls whether the output is written in SMILES (SMI), isomeric SMILES (ISM), TDT, or SQLLDR format. The default is SMI. For TDT and SQLLDR output, the text on the first line of each input header block is stored in the LINE1 datatype, and any non-standard atom labels in the input file are stored as an atom tuple in the ASYM datatype; each such atom appears as '*' in the SMILES. For TDT output, a special $SMIG datatype is also written, recording the name and version of the conversion program.


-ADD_2D, -ADD_3D
Adds 2D and/or 3D coordinates to the TDT or SQLLDR output. The coordinates are taken from the input atom block and stored as a comma-separated list of values in unique SMILES atom order. The default is TRUE for both -ADD_2D and -ADD_3D, so whenever non-zero coordinates are found in the atom block, 2D or 3D coordinates are written, depending on which are available. Setting either option to FALSE suppresses that set of coordinates.
-SPLIT_FIELDS
Splits data that spans multiple lines in an input file into separate entries in the TDT or SQLLDR output. The default is FALSE, so a multi-line field is treated as a single value. Setting -SPLIT_FIELDS to TRUE treats each line of a multi-line field as a separate value with the same data field identifier.
-SMI_IS_ISM
Replaces SMILES with isomeric SMILES in the output. Some programs, such as rubicon, require that the SMILES datatype carry isomeric information. The default is FALSE; setting this option to TRUE stores isomeric SMILES in the SMILES datatype.
-ID_FIELD <name>
Sets the data field identifier to be used as the unique ID. As described above, the default ID is the first line of each header block, or the isomeric SMILES if that line is blank. Designating a data field identifier with -ID_FIELD causes the data in that field to be used as the ID instead. Note: the data field identifier may need to be quoted, with '\\' placed before '$', so that the shell does not expand it. Input records that contain no data in the designated field are rejected.
-IMPLICIT_CHIRALITY
Alters the way chirality is determined so that implicit chiral centers are detected; this is useful for some natural products. For a bond A-hash-B, B is interpreted as lying below A from the perspective of A, and A as lying above B from the perspective of B. The default is FALSE. Setting -IMPLICIT_CHIRALITY to TRUE allows both ends of chiral bonds to be used to determine chiral centers when generating isomeric SMILES.
Toggles whether stereochemistry is indicated for ring double bonds. The default is FALSE. Setting this option to TRUE marks the cis/trans stereochemistry of all ring double bonds when generating isomeric SMILES.
-M__ISO_ARE_DEFECTS
Indicates whether the values on the "M  ISO" line of the property block are mass defects or actual masses for the listed isotopes. The default is FALSE (actual masses). When -M__ISO_ARE_DEFECTS is set to TRUE, the values on that line are treated as mass defects when generating isomeric SMILES.
Determines whether double bonds in the input file must have explicit hydrogens. The default is FALSE. Setting this option to TRUE requires that all hydrogens on double-bonded atoms be explicitly present in order to generate isomeric SMILES.
Determines whether chiral atoms in the input file must have explicit hydrogens. The default is FALSE. Setting this option to TRUE requires that all hydrogens on chiral atoms be explicitly present in order to generate isomeric SMILES.
Converts radical rings to aromatic form. The default is TRUE, which allows certain types of five-, six-, and seven-membered radical rings to be converted to aromatic rings. Setting this option to FALSE keeps the rings as specified in the input file. For a ring to be converted, every atom in the ring must be a carbon designated as a doublet radical, and no atom in the ring may carry a charge.
-PTABLE <name>
Specifies the location of a user-defined periodic table (PTABLE). When this option names a user-defined PTABLE, the uncommented lines in that table are used in place of the corresponding information in the default PTABLE. An example table is located in $DY_ROOT/data. To change the set of valence/charge pairs used for an atom, uncomment and edit that atom's line in the file.
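
The two readings of "M  ISO" values selected by -M__ISO_ARE_DEFECTS above can be sketched as follows. This is an illustration, not Daylight's implementation: it assumes a mass defect is an offset from a reference mass taken as the rounded average atomic mass, covers only H, C, N, and O, and ignores hydrogen counts when writing bracket atoms. The function names are hypothetical.

```python
# Reference masses for the "defect" reading (rounded average atomic masses).
# Both the reference choice and the element subset are assumptions.
REFERENCE_MASS = {"H": 1, "C": 12, "N": 14, "O": 16}


def parse_m_iso(line):
    """Parse an 'M  ISO' line into (atom_number, value) pairs (1-based atoms)."""
    tokens = line.split()
    if tokens[:2] != ["M", "ISO"]:
        raise ValueError("not an M  ISO line")
    count = int(tokens[2])
    values = [int(t) for t in tokens[3:3 + 2 * count]]
    return list(zip(values[0::2], values[1::2]))


def isotope_atoms(line, elements, are_defects=False):
    """Map atom numbers to bracket SMILES atoms such as '[13C]'.

    elements lists the element symbol of each atom in atom-block order.
    """
    atoms = {}
    for atom_no, value in parse_m_iso(line):
        element = elements[atom_no - 1]
        mass = REFERENCE_MASS[element] + value if are_defects else value
        atoms[atom_no] = f"[{mass}{element}]"
    return atoms
```

With are_defects=False (the default), a value of 13 on a carbon gives [13C]; with are_defects=True, the same isotope would be written as a value of 1.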

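The conditions stated for the radical-ring aromatization option above can be checked with a small predicate. It encodes only the necessary conditions given in the text (ring size, all carbon, all doublet radicals, no charges); since only "certain types" of rings are converted, mol2smi may apply further rules. The class and function names are hypothetical, and the radical codes follow the MDL "M  RAD" convention.

```python
from dataclasses import dataclass

DOUBLET = 2  # MDL "M  RAD" value for a doublet radical


@dataclass
class RingAtom:
    element: str
    radical: int = 0  # 0 = none, 1 = singlet, 2 = doublet, 3 = triplet
    charge: int = 0


def meets_stated_conditions(ring):
    """True if a ring satisfies the documented preconditions for conversion."""
    return len(ring) in (5, 6, 7) and all(
        a.element == "C" and a.radical == DOUBLET and a.charge == 0
        for a in ring
    )
```
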
Return Value

mol2smi returns 0 to the environment if it succeeds without errors or a non-zero value if there are errors.
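
A calling script can rely on this exit status. The sketch below is a hypothetical wrapper, not part of the Daylight distribution: it runs a converter command, echoes the converter's stderr if the command fails, and reports success based on a zero return value.

```python
import subprocess
import sys


def run_converter(cmd):
    """Run a converter command; return True if it exited with status 0."""
    completed = subprocess.run(cmd, capture_output=True, text=True)
    if completed.returncode != 0:
        # Surface the converter's error messages before reporting failure.
        sys.stderr.write(completed.stderr)
    return completed.returncode == 0
```

For example, run_converter(["mol2smi", "input.sdf", "output.smi"]) follows the synopsis form mol2smi [infile [outfile]]; the file names are placeholders, and mol2smi is assumed to be on the PATH.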



Daylight License

programs: convert

Related Topics

convert(1) smi2mol(1) rd2smi(1) smi2rd(1) sd2smarts(1) rd2smarts(1) rd2smirks(1) rubicon(1) licensing(5) options(5)