Precursor Selection for Combinatorial Libraries

Robert D. Brown, Mark G. Bures, Elizabeth A. Danaher, Jerry M. DeLazzer, Isabella M. Lico, Meixiao Liu, Yvonne C. Martin and Patricia A. Pavlik

Pharmaceutical Products Division, Abbott Laboratories


Diversity

A precursor set is chosen to maximize the diversity of its coverage of 3D pharmacophore space. This space is defined in terms of the 3D geometric distribution of points of potential hydrogen bonding, charge interaction and hydrophobicity. A description of a monomer in these terms represents the contribution that the monomer could make to binding to a receptor. Since the sets are intended to be reusable in many libraries, we cannot measure the diversity of any resulting library. Instead, this measure seeks to maximize the pharmacophore space explored in a fixed position and direction from a core, or from a second monomer in a polymeric-type library.

For these two structures, the potential pharmacophore points are labelled. Note that the substructure shown in blue has been added to the two primary alcohols to provide a fixed "core".

In 3D, considering only the pharmacophore points and their geometric relationships, the two structures appear as shown below.

The two structures present different patterns of pharmacophore points relative to the "core" (represented by the two aligned green point-of-attachment markers) and so are considered to span different areas of space.


Precursor Selection Procedure


Scope

The scope of a precursor set is defined by the type(s) of chemistry to be done on the precursor and by what is commercially available. Compounds with functional groups that are more reactive than the target group must be excluded. Groups that would make any product compounds uninteresting as leads for further medicinal chemistry are also excluded. Two of the more straightforward scope definitions are as follows:

1) All commercially available primary alcohols with the exception of those structures with:

2) All commercially available carboxylic acids with the exception of those structures with:

These scope definitions are translated directly into SMARTS definitions for searching the Available Chemicals Directory using MERLIN.


Clipping Precursors and Attaching Core

Whether the precursor set is chosen for a specific library or is to be reused many times, it must be modified so that each precursor has the structure it will have in any products. In particular it must have the correct

If the set is reusable, a "pseudocore" is added. This must be

For example:

1) Primary alcohols

2) Hydrazines

Once attached, the pseudocore provides a fixed reference point from which to compare the pharmacophore space covered by each precursor in the set. The transformation of each precursor is carried out using a SMARTS toolkit program, EditSmiles.


EditSmiles

EditSmiles is a Smiles and Smarts toolkit program which makes structural changes to a set of molecules. In this context it is used to clip precursors and to add the pseudocore. It is also useful for producing clipped smiles for addition to monomer databases. By iterative addition of various R-groups it may also be used to enumerate combinatorial libraries. EditSmiles makes changes either to a group of atoms or to a bond, and defines a rule language for specifying these changes.

An input definition file of rules is provided to the program. The rules are of the following form:

  1. Change_Group has the following arguments

  2. Change_Bond has the following arguments

The transformations above would be encoded as

Chg_group [O;H][C;H2]	[H]	c1ccc([SiH3])cc1 -multi first 

Chg_group    [N;H](C)NC [H] C[SiH3] -multi first

Chg_bond +1 C([SiH3])N [H] NNC[SiH3] [H]

In the second case two rules are needed: the first adds the pseudocore and the second closes the ring to the second nitrogen.


EditSmiles Program Details

Rules are interpreted into an array of structures containing mol and vector binding objects, together with flag settings.

The following pseudocode is applied to each input smiles:

  create mol object (dt_smilin)
  for each rule
      put into modification state (dt_mod_on)
      if change_group rule
          check target present (dt_vmatch -> pathset1)
          if target present
              get leaving atoms (dt_umatch -> pathset2)
              delete atoms in pathset2
              copy atoms and bonds from added_smiles *
              create bond between head of added_smiles and atom in pathset1 *
              check and fix chirality *
      if change_bond rule
          check targets present (dt_vmatch on both ends)
          if targets both present
              if adding bond
                  remove leaving groups (as above)
                  increment bond order/create bond
              if deleting bond
                  decrement bond order/delete bond
                  add replacement groups (as above)
      put molecule into read only state (dt_mod_off)
  write smiles (dt_cansmi)

* note: this functionality comes from combine & du_eliminate in the contrib directory


Potential Pharmacophore Points

Potential pharmacophore points are identified and their coordinates calculated using 3D-Features, an expert-system program making use of the Smiles and Smarts toolkits as well as CONCORD.

Three types of pharmacophore point are currently identified by the program.

In precursor selection a special "point-of-attachment" type is defined so that all pharmacophore patterns can be oriented relative to that point or set of points. Site points are not currently used in precursor selection. Pharmacophore point assignments for two structures are shown on the first page.

3D-Features - Rule Base

The program is an expert system containing a rule base written in terms of around 300 Smarts targets. These define potential behaviours for atoms (donor, acceptor, etc.) based on their 2D environments. Any atom can hit more than one target; for instance, an atom can potentially be both a donor and an acceptor. As an example, the rules covering the various forms of enolate are:

Group         SMARTS
AZIDE         [N]=[N]=N
EWG           [Br,Cl,F,I,$AZIDE]
ONEENOLAT     O=C([C,c])[C;!H0]([$EWG])C([C,c])=O
TWOENOLAT     [O;D1]C([C,c])=C([$EWG])C([C,c])=O
THREEENOLAT   O=C([C,c])C([$EWG])=C([C,c])[O;D1]
ENOLAT        [$ONEENOLAT,$TWOENOLAT,$THREEENOLAT]
ONEG          [$ENOLAT,$ACXM,$ACIDO,$DBOSO,$OM,$OXAM;!$ODNEG]
NEG           [$ONEG,$SNEG,$TRZLN,$Arsfnmd]
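The definitions in the rule base build on one another through $NAME references. As an illustration only, the following Python sketch expands such references textually into nested recursive SMARTS of the form $(...). It is not the 3D-Features implementation, which stores the targets as vector bindings; the function name expand, the RULES dictionary and the choice to leave undefined names (for example $ACXM, whose definition is not listed here) untouched are all assumptions made for this example.

```python
import re

# Definitions copied from the enolate example; names that are referenced
# elsewhere in the rule base but not listed here stay undefined on purpose.
RULES = {
    "AZIDE": "[N]=[N]=N",
    "EWG": "[Br,Cl,F,I,$AZIDE]",
    "ONEENOLAT": "O=C([C,c])[C;!H0]([$EWG])C([C,c])=O",
    "TWOENOLAT": "[O;D1]C([C,c])=C([$EWG])C([C,c])=O",
    "THREEENOLAT": "O=C([C,c])C([$EWG])=C([C,c])[O;D1]",
    "ENOLAT": "[$ONEENOLAT,$TWOENOLAT,$THREEENOLAT]",
}

# Matches $NAME but not the recursive-SMARTS opener "$(".
NAME_REF = re.compile(r"\$([A-Za-z_]\w*)")


def expand(name, rules=RULES, _active=()):
    """Return the SMARTS for `name` with every known $NAME expanded."""
    if name in _active:
        raise ValueError("circular definition: " + name)

    def substitute(match):
        ref = match.group(1)
        if ref not in rules:
            # Hypothetical policy: leave unknown references as written.
            return match.group(0)
        return "$(" + expand(ref, rules, _active + (name,)) + ")"

    return NAME_REF.sub(substitute, rules[name])


print(expand("EWG"))
```

Expanding in this way makes the dependency between rules explicit: a rule can only be evaluated once every rule it references has been defined, which is why definitions such as AZIDE and EWG precede the enolate rules.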

Tautomers and Ionizable Groups

3D-Features encodes behaviour on the basis of all major tautomers and all likely ionization states, irrespective of the tautomer or state of the input smiles.

For example, the rule base has rules defining the nitrogen in these two environments to be equivalent for the purposes of charge interaction.

One set of rules will specify that the following three oxygen environments are equivalent and are acceptors only. Other rules will specify that the three environments of each nitrogen are equivalent and are potentially both donor and acceptor.

3D-Features - Program

The program flow is shown in the diagram on the following page. The Smarts targets are loaded and stored as vector bindings. Only the final target bindings ($HBD, $HBA, $NEG, etc.) are carried forward to the remainder of the program. Each input smiles is then searched against each target in turn; the results are recorded in a separate pathset for each. The searches are

Coordinates are obtained by a call to CONCORD using the link library (or directly from a mol2 file). Geometric centroids can then be calculated for each of the group targets, and site points located. Coordinates of the points and groups are output for the next stage of the precursor selection.
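The geometric centroid of a group is simply the mean of the coordinates of its atoms. A minimal Python sketch follows; the function name centroid and the example coordinates are made up for illustration and are not taken from 3D-Features or CONCORD output.

```python
def centroid(points):
    """Arithmetic mean of a non-empty sequence of (x, y, z) coordinates."""
    if not points:
        raise ValueError("a group needs at least one atom")
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


# Hypothetical coordinates (in Angstroms) for the two oxygens of one group.
group_atoms = [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)]
print(centroid(group_atoms))
```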


Grouping by 3D Search

Structures are first grouped so that all group members have the same set of features, e.g. 2 donors, 1 acceptor, 1 negative + point of attachment. These groups are then subdivided by using each structure as the query to a whole-structure 3D search, allowing a tolerance on the match of corresponding distances. A typical tolerance is ±0.5 Å.
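The two-stage grouping can be sketched in Python as follows. This is a simplified stand-in for the whole-structure 3D search, not the search actually used: it compares the sorted inter-point distances for each pair of feature types rather than establishing a true point-to-point correspondence. All names (feature_signature, distance_profile, matches, families), the feature labels and the coordinates are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations
from math import dist


def feature_signature(points):
    """Sorted tuple of feature types, e.g. ('acceptor', 'attach')."""
    return tuple(sorted(ftype for ftype, _ in points))


def distance_profile(points):
    """Sorted list of (type pair, distance) over every pair of points."""
    profile = []
    for (type_a, xyz_a), (type_b, xyz_b) in combinations(points, 2):
        profile.append((tuple(sorted((type_a, type_b))), dist(xyz_a, xyz_b)))
    return sorted(profile)


def matches(query, candidate, tol=0.5):
    """True if all corresponding distances agree to within tol Angstroms."""
    qp, cp = distance_profile(query), distance_profile(candidate)
    if len(qp) != len(cp):
        return False
    return all(qk == ck and abs(qd - cd) <= tol
               for (qk, qd), (ck, cd) in zip(qp, cp))


def families(structures, tol=0.5):
    """Map each structure name to the family found by using it as the query."""
    by_signature = defaultdict(list)
    for name, points in structures.items():
        by_signature[feature_signature(points)].append(name)
    result = {}
    for members in by_signature.values():
        for query in members:
            result[query] = [m for m in members
                             if matches(structures[query], structures[m], tol)]
    return result


# Hypothetical structures: a point of attachment plus one other feature.
structures = {
    "a": [("attach", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0))],
    "b": [("attach", (0.0, 0.0, 0.0)), ("acceptor", (3.4, 0.0, 0.0))],
    "c": [("attach", (0.0, 0.0, 0.0)), ("acceptor", (6.0, 0.0, 0.0))],
    "d": [("attach", (0.0, 0.0, 0.0)), ("donor", (3.0, 0.0, 0.0))],
}
print(families(structures))
```

Because every structure is used as a query in turn, each one heads its own family and the families can overlap.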

For example, the structures shown in the diagram on the following page, taken from the primary alcohol set, all have only one hydrogen bond acceptor in addition to the point of attachment (the pseudocore is shown in green).

Superimposed, the structures are as follows:


Post-processing Families

Non-overlapping Families

Each structure heads its own family, and some structures appear in many families. This provides too much information to allow the chemists to make sensible selections from the families. The following simple, and non-optimal, procedure reduces the families so that no structure appears more than once.

  1. Sort families by size, largest first

  2. For each structure in the top family, remove all occurrences in later families

  3. Write out top family and remove it from the list

  4. If other families remain go to 1.
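The procedure above can be sketched in a few lines of Python. This is an illustrative version only: the function name non_overlapping is hypothetical, families are represented as plain lists of structure names, and discarding families that become empty is an assumption, since the procedure does not say what happens to them.

```python
def non_overlapping(families):
    """Greedily reduce overlapping families so each structure appears once."""
    remaining = [list(family) for family in families if family]
    result = []
    while remaining:
        # Largest family first; the sort is stable, so ties keep input order.
        remaining.sort(key=len, reverse=True)
        top = remaining.pop(0)
        result.append(top)
        taken = set(top)
        # Remove every structure of the top family from the later families.
        remaining = [[s for s in family if s not in taken]
                     for family in remaining]
        # Assumption: families emptied by the removal are dropped.
        remaining = [family for family in remaining if family]
    return result


overlapping = [["a", "b"], ["b", "a", "c"], ["c", "d"], ["d"]]
print(non_overlapping(overlapping))
```

The greedy choice of the largest family at each pass is what makes the procedure simple but non-optimal: an early large family can strip members from later families that would otherwise have formed a more useful grouping.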

Supplier and Price Information

Supplier, quantity and cost information are obtained for each precursor from ACD ORACLE tables. The best price per gram/ml for the minimum required quantity is recorded, together with the relevant supplier, for display when the structures are being selected.
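One possible reading of this selection, sketched in Python: among the offers that supply at least the minimum required quantity, keep the one with the lowest price per unit. The function name best_offer, the tuple layout and the suppliers and prices below are invented for illustration; they do not reflect the ACD table schema or real catalogue data.

```python
def best_offer(offers, min_quantity):
    """Return (supplier, price per unit) for the cheapest eligible offer.

    offers: iterable of (supplier, quantity, price), with quantity > 0
    expressed in g or ml. Returns None if no offer is large enough.
    """
    eligible = [offer for offer in offers if offer[1] >= min_quantity]
    if not eligible:
        return None
    supplier, quantity, price = min(eligible, key=lambda o: o[2] / o[1])
    return supplier, price / quantity


# Hypothetical offers: (supplier, quantity in g, price).
offers = [("A", 1, 10.0), ("B", 5, 30.0), ("C", 25, 200.0)]
print(best_offer(offers, min_quantity=5))
```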

Remove Core/Pseudocore

EditSmiles is run in reverse to remove the (pseudo)core. The original smiles cannot be used because they do not have the feature points marked.

Browsing and Selection

A slightly modified version of the cluster viewer from the Daylight contrib directory is used to browse and select the structures. Price and supplier information is displayed and can be taken into consideration when making selections.


Inventory and Ordering

Inventory and ordering of precursors are handled using an ISIS application. Structures are stored in MACCS; lot, container and dispersal information are stored in ORACLE. Ordering will be automated since the existing stockroom system is based on the ORACLE ACD tables.

The screenshots show the summary/searching screen and the structure/lot/container registration screens respectively.

Robert Brown (brownr@abbott.com), Abbott Laboratories, Feb 1996.