Molalign alignment algorithm

Alignment starts by defining the substructure that all of the structures have in common. This is done by reading a SMARTS string supplied by the user.
The first structure is read and the coordinates of the first three atoms matching the SMARTS are retained for later reference. Call these coordinates A1, B1 and C1. The structure is output unchanged.
As each subsequent structure is read, the coordinates of the first two atoms matching the SMARTS (An and Bn) are used to compute the angle between the lines (bonds) A1B1 and AnBn.
All atoms of structure #n are rotated around the z-axis by the negative of that angle, then translated in x and y so that Bn is superimposed on B1. The reference coordinates C1 are then compared with the transformed coordinates Cn of the third matching atom. If necessary, all atoms of structure #n are flipped, that is, rotated by 180 degrees around the line AnBn.
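The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not Molalign's actual code: the function names (`rotate_z`, `flip_about_line`, `align`) are hypothetical, coordinates are plain (x, y, z) tuples, and the "if necessary" test is assumed to mean "flip when doing so brings Cn closer to C1".

```python
import math

def rotate_z(p, theta):
    """Rotate point p about the z-axis (through the origin) by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    x, y, z = p
    return (c * x - s * y, s * x + c * y, z)

def flip_about_line(p, a, b):
    """Rotate point p by 180 degrees about the line through a and b."""
    d = tuple(bi - ai for ai, bi in zip(a, b))
    norm = math.sqrt(sum(di * di for di in d))
    u = tuple(di / norm for di in d)            # unit vector along the line
    v = tuple(pi - ai for pi, ai in zip(p, a))  # p relative to a
    k = 2.0 * sum(ui * vi for ui, vi in zip(u, v))
    # A 180-degree rotation about u maps v to 2(u.v)u - v.
    return tuple(ai + k * ui - vi for ai, ui, vi in zip(a, u, v))

def align(coords, match, ref):
    """Align one structure onto the reference atoms.

    coords: list of (x, y, z) for every atom of structure #n
    match:  indices of the first three atoms matching the SMARTS
    ref:    (A1, B1, C1) retained from the first structure
    """
    a1, b1, c1 = ref
    ia, ib, ic = match
    an, bn = coords[ia], coords[ib]
    # Angle between the lines A1B1 and AnBn, measured in the xy plane.
    angle = (math.atan2(bn[1] - an[1], bn[0] - an[0])
             - math.atan2(b1[1] - a1[1], b1[0] - a1[0]))
    out = [rotate_z(p, -angle) for p in coords]
    # Translate in x and y so that Bn lands on B1.
    dx, dy = b1[0] - out[ib][0], b1[1] - out[ib][1]
    out = [(x + dx, y + dy, z) for x, y, z in out]
    # Assumed criterion: flip if that brings Cn closer to C1.
    a, b = out[ia], out[ib]
    if math.dist(flip_about_line(out[ic], a, b), c1) < math.dist(out[ic], c1):
        out = [flip_about_line(p, a, b) for p in out]
    return out

# Reference atoms from the first structure, and a second structure that is
# rotated, shifted and mirrored relative to it.
reference = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0))
structure = [(5.0, 5.0, 0.0), (5.0, 6.0, 0.0), (6.0, 6.0, 0.0)]
aligned = align(structure, (0, 1, 2), reference)
```

In this example the three matched atoms of the second structure end up on A1, B1 and C1 to within rounding error. For structures whose matched atoms have different geometry, only B is guaranteed to coincide; A and C are brought as close as the rotation and flip allow.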
Notes and hints:

  • If the SMARTS is a SMIRKS or reaction SMILES, the first product SMARTS or SMILES is taken as the SMARTS.
  • If the structures to align came from a SMILES substructure or SMARTS search, that is the SMILES or SMARTS to use (duh!).
  • If the structures are products of a reaction enumeration, the product core will be the common substructure. The SMARTS for this is in the reaction SMIRKS, or is readily obtained from an MDL reaction file using mol2smi.
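
To illustrate the first hint, the following sketch shows one way to pull the first product out of a SMIRKS or reaction SMILES by hand. The helper name `first_product` is hypothetical and this is not necessarily how Molalign does it. It assumes the usual `reactants>agents>products` layout with products separated by `.`, and it does not handle component-level grouping such as `(C.C)`.

```python
def first_product(pattern):
    """Return the first product of a reaction pattern, or the pattern itself.

    Assumes the form 'reactants>agents>products'; a pattern without
    exactly two '>' separators is treated as a plain SMARTS or SMILES.
    """
    parts = pattern.split(">")
    if len(parts) != 3:
        return pattern
    # Products are dot-separated; keep only the first one.
    return parts[2].split(".")[0]

core = first_product("[C:1]=O>>[C:1]O.O")
plain = first_product("c1ccccc1")
```

Here `core` is `[C:1]O` and `plain` is returned unchanged. Atom map numbers are kept as they appear in the reaction.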

    The following other options are also available: