Molalign alignment algorithm

[Structure depiction: C1CNc2ccccc2N1]

Alignment starts by defining the substructure that all of the structures have in common. This is done by reading a SMARTS string supplied by the user.

The first structure is read, and the coordinates of the first three atoms matching the SMARTS are retained for later reference. Call these coordinates A1, B1 and C1. The first structure is output unchanged.

Each subsequent structure #n is then processed as it is read:

• The coordinates of the first two atoms matching the SMARTS (An and Bn) are used to compute the angle between the lines (bonds) A1B1 and AnBn.
• All atoms of structure #n are rotated around the z-axis by the negative of that angle, so that AnBn becomes parallel to A1B1.
• All atoms of structure #n are translated in x and y so that Bn superimposes on B1.
• The coordinates C1 are compared to the rotated and translated coordinates Cn. If necessary (that is, if Cn does not land on the same side of the bond as C1), all atoms of structure #n are flipped, i.e. rotated by 180 degrees around AnBn.
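
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not molalign's actual code: coordinates are treated as 2D (x, y) pairs, the SMARTS match is assumed to have been done already and is passed in as three atom indices, and the function name align_to_reference is hypothetical. In 2D, the 180-degree rotation around AnBn reduces to a reflection across the line through the bond.

```python
import math

def align_to_reference(ref, coords, match):
    """Align one structure onto the reference points (hypothetical helper).

    ref    -- ((x, y) of A1, (x, y) of B1, (x, y) of C1) from the first structure
    coords -- list of (x, y) for every atom of structure #n
    match  -- indices (a, b, c) of the atoms in coords that match the SMARTS
    """
    a1, b1, c1 = ref
    ia, ib, ic = match
    an, bn = coords[ia], coords[ib]

    # Angle between the bonds A1B1 and AnBn.
    theta = (math.atan2(bn[1] - an[1], bn[0] - an[0])
             - math.atan2(b1[1] - a1[1], b1[0] - a1[0]))

    # Rotate every atom around the z-axis by the negative of that angle.
    cos_t, sin_t = math.cos(-theta), math.sin(-theta)
    moved = [(x * cos_t - y * sin_t, x * sin_t + y * cos_t) for x, y in coords]

    # Translate in x and y so that Bn lands on B1.
    tx, ty = b1[0] - moved[ib][0], b1[1] - moved[ib][1]
    moved = [(x + tx, y + ty) for x, y in moved]

    # Unit vector along A1B1, used to decide which side of the bond a point is on.
    dx, dy = b1[0] - a1[0], b1[1] - a1[1]
    length = math.hypot(dx, dy)
    ux, uy = dx / length, dy / length

    def side(p):
        # Sign of the 2D cross product: positive on one side, negative on the other.
        return ux * (p[1] - b1[1]) - uy * (p[0] - b1[0])

    # Flip around the bond if Cn ended up on the opposite side from C1.
    if side(c1) * side(moved[ic]) < 0:
        flipped = []
        for x, y in moved:
            vx, vy = x - b1[0], y - b1[1]
            along = vx * ux + vy * uy  # component of the point along the bond
            flipped.append((b1[0] + 2 * along * ux - vx,
                            b1[1] + 2 * along * uy - vy))
        moved = flipped
    return moved
```

For example, with reference points ((0, 0), (1, 0), (1, 1)), a structure whose matched atoms sit at (5, 5), (5, 6) and (6, 6) needs all three operations (rotation, translation and flip) before its matched atoms coincide with the reference.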
Notes and hints:

• If the supplied string is a SMIRKS or reaction SMILES, its first product SMARTS or SMILES is taken as the SMARTS.
• If the structures to align came from a SMILES substructure or SMARTS search, that is the SMILES or SMARTS to use (duh!).
• If the structures are products of a reaction enumeration, the product core will be the common substructure. The SMARTS for this is in the reaction SMIRKS, or is readily obtained from an MDL reaction file using mol2smi.
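
As an illustration of the first note, the first product of a reaction string can be picked out with plain string handling. This is a naive sketch, not molalign's own parsing: the function name first_product is hypothetical, and splitting on "." assumes the products do not use component-level grouping parentheses.

```python
def first_product(reaction):
    """Return the first product of 'reactants>agents>products' (hypothetical helper).

    A string without '>' is assumed to be an ordinary SMARTS or SMILES
    and is returned unchanged.
    """
    if ">" not in reaction:
        return reaction
    products = reaction.split(">")[-1]   # text after the last '>'
    return products.split(".")[0]        # first dot-separated component
```

Atom map labels such as ":1" are left in place by this sketch.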

The following options are also available:

• The SMARTS can be computed by molalign from the first structure. Alternatively, the SMARTS can be computed from the first product in an auxiliary MDL reaction file.
• The match atoms for rotation need not be #1, #2 and #3 in the SMARTS (or rdf or sdf file).
• Each structure can be further rotated around the z-axis so that the match atoms align with the x or y axis.
• The structures can be output as SMILES. This duplicates the function of mol2smi.
• The product read from the auxiliary MDL reaction file can be used as the reference structure and output as structure #1.
• Diagnostic output can be written to stderr.
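
The extra rotation onto the x or y axis mentioned above can be sketched as follows. This is an assumption-laden illustration rather than molalign's implementation: coordinates are 2D (x, y) pairs, the rotation is taken about the origin, "the match atoms" is read as the bond between the first two match atoms, and the name align_bond_to_axis is hypothetical.

```python
import math

def align_bond_to_axis(coords, ia, ib, axis="x"):
    """Rotate all atoms around the z-axis so the bond ia -> ib is parallel
    to the chosen axis (hypothetical helper)."""
    ax, ay = coords[ia]
    bx, by = coords[ib]
    current = math.atan2(by - ay, bx - ax)       # current bond direction
    target = 0.0 if axis == "x" else math.pi / 2  # desired bond direction
    rot = target - current
    c, s = math.cos(rot), math.sin(rot)
    return [(x * c - y * s, x * s + y * c) for x, y in coords]
```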