The aim of WATER is to allow users to compute simple models which
may give hints about the activity or inactivity of compounds based on prior knowledge
(the training set). A model is a piece of logic which looks at a compound and
gives its judgement about the compound's activity. As with most prophets you are supposed
to believe, but that is not always the right thing to do. On the other hand, WATER
tries to explain how it arrived at its judgement.
These models may then be applied to compounds with unknown activity to predict
their activity.
WATER tries to find commonalities between the active compounds in the training
set which are not present in the inactive compounds, and vice versa.
This is very similar to what is usually done in project work. Let's look
at an example. Suppose you look at the last compounds you have
synthesized and sent to screening, and you recognize that all those which
have a trifluoromethyl group beta to a carbonyl group are active, while
those which do not are inactive. You will then suppose that this is an essential
feature for activity in your compound class. This is exactly what WATER tries to
do in a systematic fashion.
There are three parts to a model: the descriptors, the weighting scheme, and the model itself.
Within WATER, anything which can be said to be true or false about a molecule
can potentially be used as a descriptor, for example the presence of a given
substructure.
Given a list of descriptors, some kind of statistic must be used to decide
which ones are relevant to the problem under consideration. This is done by
counting how often each descriptor is present in the active and in the
inactive compounds of the training set.
Assume we have a training set with 10 active and 100 inactive molecules and
that all actives have an amino group in beta position to a carbonyl group.
Now assume that none of the inactive molecules has an amino group beta
to a carbonyl group. You might then come to the conclusion that this substructure
is very important for activity in your test system. Of course, this could also be
due to a bad choice of molecules for your training set (more about this in the
caveats section).
As always, life is not that easy, and our training set will mostly not show a
single feature which is able to distinguish between actives and inactives.
In most cases we will have a number of features which are present more often
in actives than in inactives, and a number of features where the relationship
is reversed. Most features, on the other hand, will not be related to activity
and will therefore be more or less equally distributed between the actives and
inactives.
The weighting scheme should associate a larger weight with fragments
which are able to distinguish between actives and inactives. There are two
weighting schemes used within WATER: an information
theoretical weighting function and
an additive weighting function.
Given the percentage of active compounds having a feature, pa, and the percentage of inactive compounds having the feature, pi, the information theoretical weight is computed as:

wi = pa/100 * log(pa/pi) + (100-pa)/100 * log((100-pa)/(100-pi))
According to information theory (cf. Numerical Recipes in C, chapter 14), the higher this value, the more information the measurement of this feature will yield.

Given the percentage of active compounds having a feature, pa, and the percentage of inactive compounds having the feature, pi, the additive weight is computed as:
wa = log(pa/pi)
This is simply the logarithm of the ratio of the fraction of active versus inactive compounds having this fragment. According to information theory, the logarithm is required to ensure the additivity of the weights; it does, however, not alter the ordering of the features.

Again, given our training set with 10 actives and 100 inactives, look at the following table, which illustrates the differences between the weighting schemes:
| Feature No. | Actives having this feature (absolute) | Actives (%, pa) | Inactives having this feature (absolute) | Inactives (%, pi) | pa/pi | wa = log(pa/pi) | wi |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 10 | 1 | 1 | 10 | 3.32 | 0.29 |
| 2 | 5 | 50 | 1 | 1 | 50 | 5.64 | 2.67 |
| 3 | 9 | 90 | 1 | 1 | 90 | 6.49 | 5.74 |
| 4 | 1 | 10 | 9 | 9 | 1.11 | 0.15 | 0.01 |
| 5 | 5 | 50 | 9 | 9 | 5.56 | 2.47 | 1.11 |
| 6 | 9 | 90 | 9 | 9 | 10 | 3.32 | 2.89 |
| 7 | 1 | 10 | 90 | 90 | 0.11 | -3.17 | 0.54 |
| 8 | 5 | 50 | 90 | 90 | 0.56 | -0.84 | -0.07 |
| 9 | 9 | 90 | 90 | 90 | 1 | 0 | 0 |
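The two weighting functions can be sketched in a few lines of Python. The logarithm base is not stated in the text; base 2 is an assumption here, chosen because it reproduces the tabulated wa values:

```python
from math import log2

def additive_weight(pa, pi):
    """Additive weight wa = log(pa/pi).

    pa, pi: percentage of actives / inactives having the feature.
    With a base-2 logarithm this reproduces the wa column of the table.
    """
    return log2(pa / pi)

def info_weight(pa, pi):
    """Information theoretical weight, as given by the formula
    wi = pa/100*log(pa/pi) + (100-pa)/100*log((100-pa)/(100-pi)).
    Assumes 0 < pa < 100 and 0 < pi < 100.
    """
    return (pa / 100) * log2(pa / pi) \
        + ((100 - pa) / 100) * log2((100 - pa) / (100 - pi))

# Features 1 and 6 of the table both have pa/pi = 10:
print(round(additive_weight(10, 1), 2))  # 3.32
print(round(additive_weight(90, 9), 2))  # 3.32
```

Note that a feature present in every active (pa = 100) or absent from every inactive (pi = 0) would make the formulas degenerate, so such cases need special handling in practice.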
The difference between the two weighting schemes is visible in the wa and wi columns. The relative numbers of compounds having features 1 and 6 are equal: 10% of the active and 1% of the inactive compounds have feature number 1, while feature number 6 is present in 90% of the actives and in 9% of the inactives. This means that both features are present 10 times more frequently in the actives than in the inactives (pa/pi), which yields an equally high value of 3.32 for both wa. The information theoretical wi, on the other hand, gives quite different values: 0.3 for feature 1 and 2.9 for feature 6. The reason for this is that feature 1 is present in far fewer compounds; therefore you expect to get less information from the measurement of feature 1 than from the measurement of feature 6, simply because you will encounter feature 6 more often than feature 1.
The additive model is very simple but nevertheless effective. The additive weights of all fragments present in a molecule are added together to give a score for this compound. Suppose a compound has only features 1 and 6 from the table given above; then the scoring value would be 3.32 + 3.32 = 6.64. This scoring value may then be compared to the scores of the molecules in the training set or in a validation set to estimate the probability that the compound is active.
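The additive scoring can be sketched as follows. The weights are the wa values for features 1 and 6 from the table; the dictionary representation is an illustrative assumption, not WATER's actual data structure:

```python
# wa values for features 1 and 6 from the table above.
feature_weights = {1: 3.32, 6: 3.32}

def additive_score(features_present, weights):
    """Sum the additive weights of all features found in the compound.

    Features without a weight (i.e. not in the model) contribute nothing.
    """
    return sum(weights[f] for f in features_present if f in weights)

# A compound having only features 1 and 6 scores 3.32 + 3.32 = 6.64.
print(additive_score({1, 6}, feature_weights))  # 6.64
```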
The Information Theoretical Model was developed based on work presented by a group from CombiChem Inc. at the 1999 ACS Spring Meeting. The Information Theoretical model is a little more complicated, since it has two parameters: a compound is rated active if it has Y bits out of the most important X bits. The features are ordered by their wi values in decreasing order. All possible combinations of X and Y are evaluated, and for each combination the number of actives rated active by the model as well as the number of inactives rated active by the model is computed. Clearly the best case would be to have all actives (100%) and none of the inactives (0%) rated active; unfortunately the model usually is not perfect. As a measure of the quality of the Information Theoretical model, the difference between the % actives and % inactives rated active is taken:
q = % actives rated active - % inactives rated active
Clearly, the larger q, the better your model will distinguish actives from inactives. There are also two other quality parameters which can be used to judge an Information Theoretical model. The selectivity is defined as:

s = % actives rated active / % inactives rated active

The selectivity however may not be used alone, because it is maximal whenever no inactives are rated active, independently of the number of actives. In any case, let me know about your thoughts, since this is still experimental.
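The X/Y search described above can be sketched as follows. Each compound is represented as a set of feature indices, with features assumed to be pre-sorted by decreasing wi (so index 0 is the most important); the training data is made-up illustrative data, not from WATER:

```python
def rated_active(compound, x, y):
    """True if the compound has at least y of the x most important features."""
    return len(compound & set(range(x))) >= y

def best_xy(actives, inactives, n_features):
    """Try all X/Y combinations and return (q, x, y) maximizing
    q = % actives rated active - % inactives rated active."""
    best = (0.0, 0, 0)
    for x in range(1, n_features + 1):
        for y in range(1, x + 1):
            pct_act = 100 * sum(rated_active(c, x, y) for c in actives) / len(actives)
            pct_inact = 100 * sum(rated_active(c, x, y) for c in inactives) / len(inactives)
            q = pct_act - pct_inact
            if q > best[0]:
                best = (q, x, y)
    return best

# Hypothetical training data: actives tend to carry the top-ranked features.
actives = [{0, 1}, {0, 2}, {1, 2}, {0, 1, 2}]
inactives = [{3}, {2, 3}, set(), {3, 4}]
q, x, y = best_xy(actives, inactives, n_features=5)
print(q, x, y)  # 100.0 2 1: "at least 1 of the top 2 bits" separates perfectly
```

With real data q rarely reaches 100; the search simply keeps the first X/Y pair achieving the largest separation.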
Page Owner: Alberto Gobbi - Last updated: Apr 17, 1999