Daylight v4.90
Release Date: 13 Oct 2004


dt_mer_similarselect - initiate a similarity search

Generic Prototype

dt_mer_similarselect(dt_Handle, dt_Handle, dt_Integer, dt_Integer, dt_Integer,
dt_Integer, dt_String, dt_Real, dt_Real, dt_Real) => dt_Integer

C Prototype

#include "dt_merlin.h"

dt_Integer dt_mer_similarselect(dt_Handle hitlist, dt_Handle column, dt_Integer searchtype, dt_Integer action, dt_Integer find_next, dt_Integer * status, dt_Integer smilen, dt_String smiles, dt_Real limit, dt_Real alpha, dt_Real beta)

FORTRAN Prototype

include ''

integer*4 dt_f_mer_similarselect(hitlist, column, searchtype, action, find_next, status, smiles, limit, alpha, beta)

integer*4 hitlist
integer*4 column
integer*4 searchtype
integer*4 action
integer*4 find_next
integer*4 status
character*(*) smiles
real*4 limit
real*4 alpha
real*4 beta


Begins a similarity search task on the server.

The parameter 'hitlist' specifies where the resulting hits will be placed and, depending on the value of 'action', may specify the subset of the database to be searched. The parameter 'searchtype' specifies the type of similarity search to be performed. Valid values are:


Computes the Tanimoto similarity. In this case, 'limit' is a lower limit above which a row is considered a hit. 'alpha' and 'beta' are ignored.
Computes the Tversky similarity. In this case, 'limit' is a lower limit above which a row is considered a hit, and 'alpha' and 'beta' are used as the parameters of the Tversky similarity.
Computes the Euclidean similarity. In this case, 'limit' is an upper limit below which a row is considered a hit. 'alpha' and 'beta' are ignored.

The parameter 'action' specifies how the results of the search are to be combined with the original hitlist, as follows:


The original hitlist is discarded. The entire pool is searched. All rows which meet the criteria are included in the resulting hitlist.
All rows not on the original list are searched and hits are added to the current list.
All rows not on the original list are searched and non-hits are added to the current list.
The original hitlist is searched and hits are removed from the hitlist.
The original hitlist is searched and non-hits are removed from the hitlist.
DX_ACTION_NEXT_HIT: The original hitlist is searched and, as soon as a hit is found, its hitlist index is returned. The hitlist is unchanged, but data in derived columns is still modified. The parameter 'find_next' indicates the position in the hitlist at which the search is to begin. The first row examined is 'find_next' + 1.
DX_ACTION_NEXT_NONHIT: Like DX_ACTION_NEXT_HIT, except that it finds the next row which does not match the search criteria.

The parameter 'find_next' specifies where in the hitlist the search is to begin when the action is either DX_ACTION_NEXT_HIT or DX_ACTION_NEXT_NONHIT; it is not used for the other actions. A value of -1 indicates that the search should begin at the beginning of the hitlist. To continue a search from a previously found hit, specify that hit's index (the value returned by the previous call to the search).
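
The 'find_next' convention can be illustrated without a server. The following is only a sketch: next_hit() is a hypothetical local stand-in for a DX_ACTION_NEXT_HIT search over an array of precomputed similarities, not a toolkit function. Its -1 "no further hit" return value and the inclusive comparison against 'limit' are assumptions made for the illustration.

```c
/* Hypothetical stand-in for a DX_ACTION_NEXT_HIT search (not a toolkit call).
 * Examines rows find_next + 1 .. n - 1 and returns the index of the first
 * row whose similarity is at least 'limit'.
 * Assumptions of this sketch: -1 means "no further hit", and a similarity
 * exactly equal to 'limit' counts as a hit. */
int next_hit(const double *sim, int n, double limit, int find_next)
{
    int i;

    for (i = find_next + 1; i < n; i++) {
        if (sim[i] >= limit)
            return i;
    }
    return -1;
}

/* Collects every hit index by restarting each search from the previous hit.
 * 'out' must have room for n entries. Returns the number of hits found. */
int collect_hits(const double *sim, int n, double limit, int *out)
{
    int count = 0;
    int pos = -1;   /* -1: begin at the start of the hitlist */

    while ((pos = next_hit(sim, n, limit, pos)) >= 0)
        out[count++] = pos;
    return count;
}
```

Passing the previously returned index back as 'find_next' resumes at the following row, so no row is reported twice.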

The similarity search implicitly uses the fingerprint data in the database to perform the search. The column specified in the search is a derived-data column. The similarity of each row is computed and stored in the column and then compared with 'limit' to determine if the structure meets the search criteria.
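
To make the comparison with 'limit' concrete, here is a minimal sketch of the conventional Tanimoto and Tversky definitions on bit fingerprints. It is not Merlin's implementation: the function names are hypothetical, treating the first fingerprint as the one weighted by 'alpha' is an assumption, and so is counting a value exactly equal to 'limit' as a hit. The Euclidean measure is not covered.

```c
#include <stddef.h>

/* Counts the bits set in one byte. */
static int popcount8(unsigned char x)
{
    int n = 0;

    while (x) {
        n += x & 1;
        x >>= 1;
    }
    return n;
}

/* Tversky index of two equal-length bit fingerprints:
 *     c / (alpha * (a - c) + beta * (b - c) + c)
 * where a = bits set in q, b = bits set in t, c = bits set in both.
 * Returns 0.0 when the denominator is zero (a convention of this sketch). */
double tversky(const unsigned char *q, const unsigned char *t,
               size_t nbytes, double alpha, double beta)
{
    int a = 0, b = 0, c = 0;
    size_t i;
    double denom;

    for (i = 0; i < nbytes; i++) {
        a += popcount8(q[i]);
        b += popcount8(t[i]);
        c += popcount8((unsigned char)(q[i] & t[i]));
    }
    denom = alpha * (a - c) + beta * (b - c) + c;
    return denom > 0.0 ? c / denom : 0.0;
}

/* Tanimoto is the special case alpha = beta = 1: c / (a + b - c). */
double tanimoto(const unsigned char *q, const unsigned char *t, size_t nbytes)
{
    return tversky(q, t, nbytes, 1.0, 1.0);
}

/* Hit test for the lower-limit search types. */
int is_hit(double similarity, double limit)
{
    return similarity >= limit;
}
```

For example, fingerprints 0x0F and 0x3C each have four bits set and share two, giving a Tanimoto value of 2/6; with 'alpha' = 1 and 'beta' = 0 the Tversky value is 2/4.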

The similarity is computed between the structure in each row and the 'smiles' string given in the search. Beginning with version 4.9, the input 'smiles' argument may instead be an ASCII fingerprint string. Merlinserver checks the input string and, if it detects a fingerprint, uses that fingerprint directly as the query. Note that the fingerprint must be at least as large as the largest fingerprint in the pool being searched; otherwise an error is returned. Merlinserver can fold the query down as needed, but it cannot "unfold" a query to match a larger database fingerprint.
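
The asymmetry between folding and unfolding can be seen in a small sketch. It assumes the usual folding scheme, in which the upper half of the fingerprint is OR-ed onto the lower half until the target size is reached; fold_fingerprint() is a hypothetical helper written for this illustration, not a toolkit function, and it works on raw bytes rather than on the ASCII encoding.

```c
#include <stddef.h>

/* Folds the fingerprint 'fp' of 'nbytes' bytes in place down to 'target'
 * bytes by repeatedly OR-ing the upper half onto the lower half.
 * 'nbytes' must equal 'target' times a power of two.
 * Returns 0 on success, or -1 if the sizes are incompatible; that includes
 * target > nbytes, because a folded fingerprint cannot be unfolded. */
int fold_fingerprint(unsigned char *fp, size_t nbytes, size_t target)
{
    size_t n, i;

    if (target == 0 || target > nbytes)
        return -1;

    /* Check that doubling 'target' some number of times reaches 'nbytes'. */
    for (n = target; n < nbytes; n *= 2)
        ;
    if (n != nbytes)
        return -1;

    while (nbytes > target) {
        nbytes /= 2;
        for (i = 0; i < nbytes; i++)
            fp[i] |= fp[i + nbytes];
    }
    return 0;
}
```

Each fold merges two bits into one, so the information about which half a bit came from is lost. This is why a short query fingerprint cannot be expanded to match a larger one in the pool.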

At the start of each similarity search, the values of the given column are cleared. This means that even if the similarity search is aborted before completion, the data in the column are modified. The column is also cleared for each invocation of DX_ACTION_NEXT_HIT and DX_ACTION_NEXT_NONHIT. As a result, only the cells of the column from 'find_next' to the returned hit will have valid data; all others will be "not available".

Return Value

The status of the search task is returned in 'status' (see dt_continue(3) for descriptions). If the hitlist is short enough that time-slicing is not required, the value of 'status' will be DX_STATUS_DONE. Otherwise, the status will be DX_STATUS_IN_PROGRESS and dt_continue(3) will be required to finish the task.

The function's return value is either the progress on the task (see dt_done_when(3)) or -2 if an error is encountered.

Related Topics

dt_abort(3) dt_continue(3) dt_done_when(3) dt_mer_nsimilars(3)
Daylight Chemical Information Systems, Inc.