Daylight Version 4.9
Release Date 08/01/11
This document, the programs rubicon and autorules are copyrighted 1992-2011 by Daylight Chemical Information Systems, Inc. of Laguna Niguel, CA. Daylight explicitly grants permission to reproduce this document under the condition that it is reproduced in its entirety, including this notice. All other rights are reserved.
Rubicon Reference Manual
1. Introduction
"Rubicon" stands for "Rule-Based Invention of Conformations". Rubicon is a distance-geometry method which produces 3-D conformations given chemical structures with only connectivity specified. Distance-geometry methods randomly sample conformations and are particularly powerful for problems dealing with molecular matching and flexibility. The method has three basic parts: establishing geometric constraints in distance-space, sampling a conformation in that space, and embedding it in 3-dimensions while minimizing bounds violations.
One of the fundamental difficulties with most distance geometry programs is that they have a very naive view of chemical geometry. Also, the chemical intelligence in typical distance-geometry programs is hard coded into the program and is very difficult to improve upon. Rubicon addresses this problem by employing a "soft", or rule-based method for establishing geometric constraints which is based on a powerful language for describing chemical patterns (SMARTS).
Rubicon's chemical intelligence is embodied by its rule set, which is specified at run time. Various rule sets can be devised for solving a wide variety of 3-dimensional chemical problems. The simplest approach is probably to employ a naive rule set. With appropriate rule sets, Rubicon will exactly mimic most existing distance geometry algorithms. At the other extreme, very sophisticated rule sets can be devised to predict molecular geometries. In this case, Rubicon behaves like a non-distance-geometry rule-based model builder, with the advantage that unspecified geometries are sampled (and also that additional knowledge may be added as needed without rebuilding the program).
The autorules program (supplied with Rubicon) automatically derives a set of constraint rules from a given set of training conformations. Using autorules-generated rule sets, Rubicon will generate structures that have geometries similar to those found in the training set, e.g. crystal structures, docked structures, computed low energy structures, your favorite structures, etc.
Using distance geometry to sample from all energetically reasonable conformations is a powerful idea, but it's not the only useful one. One of the most powerful uses of distance geometry is to sample in a very biased way, e.g. to only sample conformations of interest which match a desired pharmacophore or fit an enzyme's binding site.
To allow maximum flexibility, Rubicon is supplied in two forms: a ready-to-run program (rubicon) and a programming library (libdc_rube.a). The rubicon program is very robust and easy to use, but is limited to sampling conformations based on information from a rule set (e.g. only constraint errors are minimized). The libdc_rube.a programming library provides the Rubicon algorithm as a tool to be used within other programs. This allows sampling conformations based on other criteria, such as docking.
rubicon [options] < input.tdt > output.tdt
By default, Rubicon produces one conformation ($D3D data item) per SMILES-rooted datatree on the input. If you have SMILES in .smi format, you will need to edit it a bit, e.g. the line "CCO ethanol" needs to be changed to: "$SMI<CCO>PCN<ethanol>|" or just "$SMI<CCO>|" if the name isn't important. The shell script $DY_ROOT/bin/smi2tdt is provided to make this transformation.
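For readers who want to see the transformation spelled out, the following Python sketch converts one .smi line into the TDT form described above. It is an illustration only, not the smi2tdt script itself: the function name smi_line_to_tdt is hypothetical, and it assumes that names contain no characters which would need quoting inside a TDT data item.

```python
def smi_line_to_tdt(line):
    """Convert one .smi line ("SMILES [name]") to a one-line TDT datatree.

    Hypothetical helper; returns None for a blank line.
    """
    parts = line.strip().split(None, 1)  # SMILES, then optional name
    if not parts:
        return None
    tdt = "$SMI<%s>" % parts[0]
    if len(parts) == 2:
        # The name goes into a PCN data item, as in the example above.
        tdt += "PCN<%s>" % parts[1].strip()
    return tdt + "|"  # "|" terminates the datatree
```

Applied to each line of a .smi file, the results can be written to a file and used as Rubicon input.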
A summary of options can be printed with the "-options" option:
$ rubicon -options
Commonly used options can be set in your environment (prefix the option name with "DY_") or in your profile (typically $HOME/dy_profile.opt). For a complete discussion of options and environment variables, see the Daylight Systems Administration Manual.
To see the current option settings (including the effect of any command line arguments) enter
$ rubicon -settings
which lists each option with its current value.
Specify the desired Rubicon rule file with -RUBE_RULES.
RUBE_TRIALS (default 1)
Rubicon attempts to produce RUBE_NCONFS acceptable conformations by trying RUBE_TRIALS random samplings. Those which meet the acceptance criteria (see RUBE_ACCEPT_* options) are passed through the conformation filter (see RUBE_FILTER_* options) for output. The default values cause Rubicon to try just once.
RUBE_ACCEPT_GRMS (default 0.01)
Consider conformations to be acceptable if they converge to gradient root-mean-square RUBE_ACCEPT_GRMS, maximum distance violation RUBE_ACCEPT_MXDV, and maximum volume violation RUBE_ACCEPT_MXVV. The default values are somewhat generous for very simple structures and somewhat strict for peptides.
RUBE_FILTER_SLOP (default 0 Ångstroms)
Conformations are passed through a filter which suppresses output of identical conformations by comparing their distance matrices. Conformations which have an isomorph with all interatomic distances within RUBE_FILTER_SLOP Ångstroms are considered to be identical. Isomorphs (atom-atom matchings) are determined by RUBE_FILTER_SMARTS - the flag USMILES (the default value) uses the unique SMILES (i.e., all heavy atoms). This can be set to any valid SMARTS, e.g. "a" will compare only distances between aromatic atoms and "!#6!#1" will compare only heteroatoms. If RUBE_FILTER_SMARTS is TRUE, signed chiral volumes are also compared so enantiomers will be considered non-identical. This filter is disabled by default (RUBE_FILTER_SLOP is 0), so all acceptable conformations are output.
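The distance-matrix comparison can be pictured with a short Python sketch. This is not Rubicon's implementation: the function names are hypothetical, the atoms of the two conformations are assumed to be listed in the same order (so only the identity atom-atom matching is tried, rather than every isomorph), and chiral volumes are not compared.

```python
import math

def distance_matrix(coords):
    """All interatomic distances for a list of (x, y, z) tuples."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]

def same_conformation(coords_a, coords_b, slop):
    """True if every interatomic distance agrees to within slop."""
    da = distance_matrix(coords_a)
    db = distance_matrix(coords_b)
    n = len(da)
    if n != len(db):
        return False
    # Only the upper triangle is needed; the matrix is symmetric.
    return all(abs(da[i][j] - db[i][j]) <= slop
               for i in range(n) for j in range(i + 1, n))
```

Because only distances are compared, a conformation and its mirror image are indistinguishable in this sketch, which is why a separate comparison of signed chiral volumes is needed to tell enantiomers apart.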
RUBE_HYDROGENS (default ALL)
NONE is much faster
RUBE_OUTPUT_FORMAT (default TDT)
The only alternative to TDT is the venerable PDB format, which lots of programs read and write in different flavors. Rubicon's PDB output is pretty simple, and intended for output of small molecules:
REMARK Several lines of them (contain SMILES, name, source, errors)
ATOM Atom names are upper case atomic symbols (e.g. CA is calcium, you have a problem with that?) followed by ordinal per-element count up to "99" then "**". Residue names are all "RES"; residue numbers all 1.
TER One TER record output between ATOM and CONECT records
CONECT One per atom, bonds listed both ways, double bonds are double on the line, triple bonds three times, "to" atoms are not sorted on the line.
END Separates conformations
If this flavor of PDB is not suitable for your purposes, consider writing TDT output and converting it to PDB format with the program tdt2pdb (a program with contributed source code which can be modified as desired).
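The atom naming convention described for ATOM records can be sketched in a few lines of Python. The function name pdb_atom_names is hypothetical, and the sketch assumes that the per-element count starts at 1; column alignment within the PDB record is not addressed.

```python
def pdb_atom_names(symbols):
    """Name atoms as upper-case symbol plus per-element ordinal.

    Ordinals above 99 are written as "**", as described above.
    Assumption: counting starts at 1 for each element.
    """
    counts = {}
    names = []
    for symbol in symbols:
        key = symbol.upper()
        counts[key] = counts.get(key, 0) + 1
        ordinal = counts[key]
        suffix = str(ordinal) if ordinal <= 99 else "**"
        names.append(key + suffix)
    return names
```

For ethanol with one calcium counter-ion, for example, the heavy atoms C, C, O, Ca would be named C1, C2, O1 and CA1.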
RUBE_SEED (default 281191802)
Rubicon uses a pseudo-random number generator which provides identical behavior on all platforms (RANMAR, Marsaglia and Zaman, 1987). RUBE_SEED sets the seed for this number generator, which must be in the range 0 to 900000000. Given otherwise identical input, Rubicon should produce identical results for a given seed on all supported platforms.
NOTE: To induce Rubicon to produce different pseudorandom trials at each invocation, specify a different value for RUBE_SEED each time it is called, e.g. a value derived from the time of day (hours, minutes, and seconds):
rubicon -RUBE_SEED `date +%H%M%S`
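When Rubicon is driven from a script, the seed can be computed before the command line is built. The Python sketch below shows two ways to do so; all names are hypothetical. The first reproduces the digits of `date +%H%M%S`, and the second folds a Unix timestamp into the allowed range of 0 to 900000000, which gives far more distinct seeds than a time-of-day value.

```python
import time

MAX_SEED = 900000000  # upper limit stated for RUBE_SEED

def seed_from_hms(hour, minute, second):
    """Same digits as `date +%H%M%S`, read as an integer."""
    return hour * 10000 + minute * 100 + second

def seed_from_epoch(seconds):
    """Fold a Unix timestamp into the range 0..MAX_SEED."""
    return int(seconds) % (MAX_SEED + 1)

def seed_now():
    """Time-of-day seed for the current local time."""
    t = time.localtime()
    return seed_from_hms(t.tm_hour, t.tm_min, t.tm_sec)
```

Note that two runs started within the same second receive the same seed with either approach, and therefore produce identical results.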
RUBE_RUNID (default NONE)
A "runid" is a string which is appended to the output source field. This is particularly useful if you are making multiple Rubicon runs on a given structure with varying parameters and loading them in a single database.
RUBE_ACCURACY (default 1e-20)
This value works well on all machines for which v4.3x software is distributed. This option is insurance against the possibility that exotic, yet compatible, computer hardware might appear.
RUBE_BUMP14 (default is TRUE)
If set TRUE, van der Waals interactions are applied to acyclic 1-4 distances, restricting minimum torsions, which is usually a good thing to do.
RUBE_LIMITEVAL (default is 1000)
Rubicon's minimizer typically converges with 50-300 function evaluations when things are going well. If you are working with really tough structures which fail to converge a lot (either the structures or the rule file would need to be pretty weird), you might try to gain speed by running a larger number of trials with a lower evaluation limit and just save the ones that converge.
Mainly for debugging, this option controls output of non-essential text to standard error. The amount of output generated at the "VERBOSE" level is truly staggering (all N-square bounds matrices are dumped multiple times).
RUBE_WRITE_BOUNDS (default FALSE)
When set TRUE, the complete smoothed distance bounds matrix is added to the output as a DBM data item and the list of chiral and unsigned volumes as a VCR data item. This option is available only with TDT output format. The bounds matrix is N-squared, so use it sparingly on large molecules, particularly if hydrogens are included. If you would like to use Rubicon's front-end with your own distance geometry package (i.e. output the DBM only), try to organize it as a filter.
The DBM ("Distance Bounds Matrix") datatype consists of two fields, the bounds matrix and the name of the data source (e.g. "Rubicon 4.34"). The bounds matrix field contains the full bounds matrix after smoothing, as comma-delimited numbers, with atoms indexed in SMILES order, and with lower bounds in the lower triangle.
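A program reading a DBM data item has to turn the flat list of numbers back into a matrix. The Python sketch below shows one way to do this. It rests on two assumptions which the description above does not state: that the N-by-N matrix is written row by row, and that upper bounds occupy the upper triangle. The function names are hypothetical.

```python
import math

def parse_bounds_matrix(field):
    """Split a comma-delimited bounds field into N rows of N floats.

    Assumption: the matrix is flattened row by row.
    """
    values = [float(tok) for tok in field.split(",") if tok.strip()]
    n = math.isqrt(len(values))
    if n * n != len(values):
        raise ValueError("field does not hold a square matrix")
    return [values[i * n:(i + 1) * n] for i in range(n)]

def bounds(rows, i, j):
    """Return (lower, upper) for atoms i and j (0-based indexes).

    Lower bounds are read from the lower triangle; upper bounds are
    assumed to be in the upper triangle.
    """
    lo_row, lo_col = max(i, j), min(i, j)
    return rows[lo_row][lo_col], rows[lo_col][lo_row]
```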
The VCR (Volume Constraint Rules) datatype also consists of two fields, the volume constraints which apply to the molecule and the name of the data source (e.g. "Rubicon 4.34"). The volume constraint field contains six comma-delimited numbers for each constraint: the index of four atoms defining the volume followed by the upper and lower volume bounds (in cubic angstroms). Chiral (signed) volume constraints are indicated by signed bounds (e.g., "+0.0,+5.0"); unsigned volume constraints are indicated by unsigned bounds (e.g. "0.0,5.0"). No VCR data item is output if no volume constraints apply.
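The six-numbers-per-constraint layout can be read with a sketch such as the following. It assumes that all constraints are concatenated into a single comma-delimited stream, and it keeps the two bounds in the order in which they appear in the field rather than deciding which is the upper one. The function name is hypothetical.

```python
def parse_volume_constraints(field):
    """Parse a volume constraint field into a list of dicts.

    Each constraint is four atom indexes followed by two bounds.
    A constraint is treated as chiral (signed) when both bounds
    carry an explicit "+" or "-" sign.
    """
    tokens = [tok.strip() for tok in field.split(",") if tok.strip()]
    if len(tokens) % 6 != 0:
        raise ValueError("expected six values per constraint")
    constraints = []
    for k in range(0, len(tokens), 6):
        atoms = tuple(int(tok) for tok in tokens[k:k + 4])
        first, second = tokens[k + 4], tokens[k + 5]
        chiral = first[0] in "+-" and second[0] in "+-"
        constraints.append({
            "atoms": atoms,
            "bounds": (float(first), float(second)),  # order as written
            "chiral": chiral,
        })
    return constraints
```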
RULENAME smarts (atomlist) bounds
For instance, the following rule sets the bond length bounds for bonds between all aromatic carbon and aromatic oxygen atoms to the range (1.324 - 1.402) angstroms, inclusive:
DISTANCE c:o (1,2) 1.324 1.402
Rubicon rule sets do not use conditional logic in any way - a rule is asserted to be true in every environment which it matches. This allows knowledge (rule sets) to be combined from various sources without needing to express them within a potentially restrictive logical framework; it also implies that results are not affected by the order in which they are applied. (For example, the rules illustrated here were taken from publications based on CSDB.) Rules may be expressed at various degrees of generality, e.g. the above rule is more general (applies to all aromatic carbon-oxygen bonds) than the first of the following rules, which apply to furan substructures:
DISTANCE o1cccc1 (1,2) 1.338 1.398
The most restrictive bound that matches applies. Upper and lower bounds are applied independently. Note that the atom list (3rd field) indicates to which SMARTS atoms the bounds apply, e.g., in the last rule above, the C3-C4 furan carbon bond length bounds are (1.392-1.456), which differs entirely from the range of the furan C2-C3 bond length.
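One reading of "most restrictive" with independently applied limits is that the largest matching lower bound and the smallest matching upper bound win. The Python sketch below illustrates that reading; it is an interpretation, not Rubicon's code, and the function name is hypothetical.

```python
def combine_bounds(matches):
    """Combine (lower, upper) pairs from every rule matching one distance.

    Lower and upper limits are handled independently: the largest
    lower bound and the smallest upper bound are kept.
    """
    pairs = list(matches)
    if not pairs:
        raise ValueError("no rule matched")
    lower = max(lo for lo, _ in pairs)
    upper = min(hi for _, hi in pairs)
    return lower, upper
```

With the two rules shown above, a furan C-O bond matches both the general aromatic rule (1.324, 1.402) and the furan rule (1.338, 1.398), so the combined bounds are (1.338, 1.398). The result does not depend on the order in which the rules are visited, which is consistent with rule order not affecting results.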
COMMENTS. Exclamation point (!) is used for comments: `!' and all characters following on the line are ignored.
WHITESPACE. Tokens ("words") in a Rubicon rule file are delimited by one or more whitespace characters (space, tab, newline).
COMMANDS. Command names are case-insensitive reserved words which must begin at the start of a line. They are:
PRAGMA. The PRAGMA command is used to provide information to the Rubicon processor which does not affect the interpretation of the constraint rules. (Rubicon processors are free to ignore PRAGMAs; they're like hints.) PRAGMA commands consist of the command PRAGMA followed by a name and a value. Two PRAGMAs are used by the 4.3 Rubicon processor: HYDROGENS (values are HCOMPLETE and HSUPPRESSED) and AUTORULES (values indicate which rule classes are automatically generated). For instance, consider the line:
PRAGMA HYDROGENS HCOMPLETE
This tells Rubicon that this rule set is intended to be used with hydrogen-complete molecules (H-complete and H-suppressed van der Waals radii differ). Rubicon v4.3 will generate a warning if this rule file is used on a hydrogen-suppressed molecule.
DEFINE. The DEFINE command creates a SMARTS vector definition which can be used as a primitive in SMARTS later in the file. The format is:
DEFINE name smarts
Use of SMARTS vector definitions can greatly improve the readability of a Rubicon rule file and make it easier to maintain. This is illustrated by this excerpt:
DEFINE $Namide [NX3]C=* ! amide and other semi-conjugated
That's it! Everything else in a Rubicon rule file is a constraint rule.
The bounds specification in a single constraint rule is a single pair of numbers representing the lower and upper bounds:
RULENAME smarts (atomlist) min max
Some examples follow:
RADIUS [#1] (1) 0.90 0.90 ! proton VDW
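The field layout of such a rule is simple enough to read with a few lines of Python. The sketch below is hypothetical and handles only the single-pair form shown above. It assumes that the atom list contains no embedded whitespace, and it reads the five fields by position, so that a "!" inside the SMARTS is never mistaken for the start of a comment; anything after the fifth field is ignored.

```python
def parse_constraint_rule(line):
    """Parse "RULENAME smarts (atomlist) min max [! comment]".

    Returns None for blank lines and for lines that begin with "!".
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("!"):
        return None
    tokens = stripped.split()
    if len(tokens) < 5:
        raise ValueError("expected: RULENAME smarts (atomlist) min max")
    name, smarts, atomlist, lower, upper = tokens[:5]
    if not (atomlist.startswith("(") and atomlist.endswith(")")):
        raise ValueError("atom list must be parenthesized")
    atoms = tuple(int(a) for a in atomlist[1:-1].split(","))
    return {
        "name": name.upper(),  # command names are case-insensitive
        "smarts": smarts,
        "atoms": atoms,
        "min": float(lower),
        "max": float(upper),
    }
```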
BOUNDS smarts (atomlist)
This example specifies an aromatic iodo-substituent environment:
BOUNDS [ID1]-!@[cD3]:[cD3] (1,2,3)
BOUNDS rules are especially good for specifying long-range constraints in large patterns, such as steroids and other ring systems (such rules are not typically generated by hand, however).
The key to building hashable rules is to specify environments which are both unambiguous and geometrically meaningful. For this purpose, Rubicon uses the following attributes to characterize an atom: atomic number, aromaticity, total connectivity (X), valence (v), and the size of the smallest ring that it is a member of. Subgraphs that are exhaustively generated include: bonded pairs, triples, and quads, 3- and 4-way branches, and all SSSR rings.
Special rule types are used to store such data: PAIRBOUNDS, TRIPLEBOUNDS, QUADBOUNDS, BRANCHBOUNDS, and RINGBOUNDS rules differ from normal BOUNDS rules only in how they are accessed. Instead of looping through all rules looking for one that matches the molecule of interest (as must be done with general BOUNDS rules), rule names are generated from the molecule and looked up directly. For instance, molecules which have the O=n(c)c pyridine-N-oxide moiety would generate and match the following automatically generated rule:
BRANCHBOUNDS [nX3v5r6](=[OX1v2r0])(:[cX3v4r6]):[cX3v4r6] (1,2,3,4)
The net effect of this is pleasantly surprising: operating with a set of specific rules which is large enough to be reasonably comprehensive (50,000+), the time to find all rules which apply to drug-size molecules is about 1 sec.
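The idea of generating a rule name from the molecule and looking it up directly can be sketched as follows. The Python functions below build the atom descriptors and the branch pattern seen in the BRANCHBOUNDS example from the five atom attributes listed above. The function names, the small symbol table, and the order in which neighbors are written are assumptions made for illustration; a real implementation needs a canonical neighbor order so that equivalent environments always produce the same key.

```python
SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O"}  # small table for illustration

def atom_key(atomic_number, aromatic, connectivity, valence, ring_size):
    """Describe one atom, e.g. "[nX3v5r6]"; ring_size 0 means acyclic."""
    symbol = SYMBOLS[atomic_number]
    if aromatic:
        symbol = symbol.lower()
    return "[%sX%dv%dr%d]" % (symbol, connectivity, valence, ring_size)

def branch_key(center, neighbors):
    """Build a branch pattern from a center key and its neighbors.

    neighbors is a list of (bond_symbol, atom_key) pairs in the
    order chosen by the caller; the last one is written unbracketed.
    """
    *inner, (last_bond, last_key) = neighbors
    text = center + "".join("(%s%s)" % (b, k) for b, k in inner)
    return text + last_bond + last_key
```

A key built this way can be used directly as a dictionary (hash table) index, so the cost of finding a rule does not grow with the number of rules in the file.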
autorules [options] < input.tdt > output.rules
There are only two options:
-u produce (unsigned) volume rules. Unsigned volumes of branches and quads provide additional constraints, but add considerably to the size of the rule file (and much of the information is redundant). The default is not to produce unsigned volume rules.
-v (verbose) add comments to the output file showing SMILES of structures resulting in extremal limits. This option increases the size of the rule file and is useful if you are going to examine it manually. Default is not to produce extremal SMILES as comments.
To run autorules, you will need a TDT file containing desired training conformations as $D3D data items. Autorules uses every conformation in the input file, so limit input to the conformations that you want to train on. Remember: garbage in, garbage out! Trees with isomeric SMILES produce isomeric rules. Autorules has a built-in "slop" parameter of 0.0005 Å, i.e., bounds within 0.0005 Å are considered identical.
Autorules writes progress to standard error, indicating how many rules of each type have been created. An example run follows:
autorules -v < /home2/tdt/demo.tdt > demo.rules
Unsigned volume rules .... OFF
Rules produced by autorules are in the form described in the previous section, "Automatic bounds rules". A rule produced by the above run:
The comment trailing the rule is typical of those produced by the -v option.
Other rule files can be appended to those produced by autorules to produce a combined rule file for Rubicon. Autorules-generated files should not be concatenated to each other.
The following code fragments suffice to run the default Rubicon method with the default rule file:
Rubicon is very usable at this simple level. Most other functions in the Rubicon library are provided for the purpose of customizing the method. If you just want to sample conformations of a molecule, you probably don't need to digest all the gruesome detail that follows. For such purposes, skip down to the sections marked "Rubicon method control attributes" and "Rubicon access".
So, for pragmatic reasons only, the 4.3 Rubicon library is supplied with a C-language interface. All function names start with "dc_" in this interface (rather than the usual "dt_"). One header file is required, dc_rubicon.h. Macro names start "DC_". This interface is called the "dc_" interface.
The dc_ interface resembles an object-oriented interface more than a typical C-language interface. Programming with the dc_ interface is almost identical to programming with the dt_ interface, except that handy polymorphic functions aren't available for Rubicon constructs, such as methods. All structs and functions are passed as (void *) arguments or results. Like objects ("handles"), you can't dereference them (they're opaque). The Rubicon library relies on the Daylight Toolkit for almost everything else, e.g., it uses the normal molecule and conformation objects.
Floating-point representation is another special issue for Rubicon. The normal kind of floating point number used by the Daylight Toolkit is defined as dt_Real (float for most compilers). The equivalent definition in the dc_ interface is DC_REAL. (In release 4.3x, it's the same as dt_Real.) In general, it seems satisfactory to pass real numbers back and forth as 32-bit things. Given the popularity of 64-bit computers (and the prospect of 128-bit hardware), it seems prudent to leave the door open for interfaces based on larger representations of floating point numbers. Hence, DC_REAL.
Although the dc_ interface is C-specific, it also may be accessed from Fortran programs by use of wrapper functions (typically written in C).
class        name of method class, e.g. "minimizer"
name         name of method, e.g. "Rubicon 4.31"
version      integral version number, e.g. 431
clientdata   pointer to user data (void *)
next         pointer to next method (for linked lists of methods)
These attributes are stored in a "method header". No public functions are provided to create or destroy method headers - these are reserved for functions creating methods, such as dc_cg_alloc_minimizer(). Method attributes should be accessed using the following public functions:
int dc_method_set_name ( void *method, int len, char *name );
Each of these functions sets one attribute of the given method and returns TRUE on success or FALSE on error (e.g. invalid method). The clientdata attribute is intended for use by any function which needs to associate data with a method. The next attribute can be used to form linked lists of methods.
No public function is provided to set the class attribute, since this is only done by higher level functions which create methods. For instance, methods returned by dc_cg_alloc_minimizer() will have the method class attribute permanently set to "minimizer".
char *dc_method_class (void *method, int *len);
Each of these functions returns one attribute of the given method. On error, they return 0 or NULL as appropriate. These functions are intended for use at any level, although some are intended for special purposes. For instance, a function expecting a minimizer method can check a method's class attribute to verify that the method is in fact a "minimizer".
void dc_method_print_header(void *method, int len, char *pre, FILE *fp)
Print header attributes for given method on stream fp. Each line is preceded by len chars of prefix pre. This function works for any method (used mainly for debugging). Output includes the "id" attribute (an invisible magic number).
The following high-level functions operate on CG minimizers per se:
Allocate a conjugate-gradient minimizer with default parameters.
int dc_cg_dealloc_minimizer(void *cgmeth)
Deallocate minimizer obtained from dc_cg_alloc_minimizer(). Returns TRUE on success or FALSE on error (i.e., if argument cgmeth is not a conjugate-gradient minimizer method).
void *dc_cg_next_minimizer(void *cgmeth)
Return next minimizer in linked list of minimizers (or NULL).
int dc_cg_print_minimizer(void *cgmeth, int len, char *pre, FILE *fp);
Print all attributes of given conjugate-gradient minimizer method cgmeth on stream fp. Each line is preceded by len chars of prefix pre. Output includes method header attributes. Returns TRUE on success or FALSE on error (i.e., if argument cgmeth is not a conjugate-gradient minimizer method).
typedef DC_REAL (*DY_CG_OBJFUNC)(void *cgm, int n, DC_REAL *x, DC_REAL *g);
i.e., a function returning a DC_REAL value with 4 arguments: cgm, pointer to the method; n, the number of parameters (input); x, an n-by-4 array of DC_REAL variables (input); and g, an n-by-4 array of gradients (output). DC_REAL is defined in dc_rubicon.h and is implementation-dependent. (For all architectures supported in v4.3x, it's defined as float.)
CG minimizers have two objective function attributes: objfunc, the "real" objective function; and usrfunc, an additional function. In the default Rubicon method, objfunc is the standard 4-D error function used for minimizing distance-geometry bounds and usrfunc is NULL. A user-supplied function can be specified to minimize an additional function, e.g. for matching a pharmacophore or docking to a binding site. If a usrfunc is specified, the results (error and gradients) are added to those of the standard function to produce the combined function to be minimized.
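The way a usrfunc is combined with the standard function can be illustrated outside the library. The Python sketch below is not the dc_ interface: here an objective function takes a list of variables and returns an (error, gradients) pair instead of filling an output array, and the name combined_objective is hypothetical. The scale factor plays the role of the scale_usrfunc attribute.

```python
def combined_objective(objfunc, usrfunc, scale_usrfunc=1.0):
    """Return a function whose error and gradients are the sum of
    objfunc's results and the scaled results of usrfunc."""
    def total(x):
        error, grad = objfunc(x)
        if usrfunc is None:  # default Rubicon method: no usrfunc
            return error, list(grad)
        user_error, user_grad = usrfunc(x)
        error += scale_usrfunc * user_error
        grad = [g + scale_usrfunc * ug for g, ug in zip(grad, user_grad)]
        return error, grad
    return total
```

Because the gradients are summed as well as the errors, the minimizer follows the combined surface and needs no knowledge of which term contributed what.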
Four functions are provided to set and get objective functions:
int dc_cg_set_objfunc(void *cgmeth, void *objfunc);
Set the primary and additional objective functions of cgmeth. cgmeth must be obtained from dc_cg_alloc_minimizer(). objfunc and usrfunc must be objective functions in the correct form. Returns TRUE on success, FALSE on error.
CAUTION: For use with Rubicon, it is strongly recommended that modifications to the objective function be done via a usrfunc. The default objfunc has been refined by many people over many years -- it is unlikely to be improved by casual changes.
NOTE: If you want to turn off the standard objective function but still use a usrfunc, do so by setting scale_dist and scale_vol attributes to 0.0, rather than setting objfunc to NULL. (Results will be identical, but Rubicon recognizes these values as a special case which allows slightly faster operation.)
void *dc_cg_objfunc(void *cgmeth);
Return the objective functions of cgmeth (or NULL on error).
An additional function provides information for the convenience of usrfunc implementations:
dt_Handle dc_cg_conformation(void *cgmeth, int nv, DC_REAL *xyzw);
Return the xyzw conformation as an object.
NOTE: The molecule that Rubicon is working with is the dt_base() of the conformation object returned by this function. This is not the same molecule as submitted to dc_rube_conformations(). (Rubicon works on a copy of the molecule.)
typedef int (*DY_CG_REPORTFUNC)(int iter, int neval,
where the function returns TRUE on success (if FALSE, the minimization is aborted) and is supplied with the number of the iteration, the number of function evaluations so far, the minimum function value so far, and the current total gradient-squared.
Two functions are provided to set and get report functions:
int dc_cg_set_stepreport(void *cgmeth, void *reportfunc);
Set the step report function for cgmeth to be reportfunc. reportfunc will be called after each minimizer iteration. cgmeth must be obtained from dc_cg_alloc_minimizer(). reportfunc must be a report function in the correct form.
Returns TRUE on success, FALSE on error.
void *dc_cg_stepreport(void *cgmeth);
Return the step report function of cgmeth (or NULL on error).
int dc_cg_set_accuracy(void *cgmeth, float accuracy);
Set and get the effective machine accuracy. The default value of 1.0e-20 seems to work well.
int dc_cg_set_convergence(void *cgmeth, float convergence);
Set and get the conjugate-gradient convergence parameter (the minimization is assumed to have converged when the root mean square of the gradient vector falls below this value). Values of 0.3 and 0.2 (defaults for 1st/2nd stage) seem to work well.
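The convergence test itself is a one-line calculation. The Python sketch below shows it for a flat list of gradient components; the function names are hypothetical, and it assumes that the mean is taken over all gradient components, which the description above does not spell out.

```python
import math

def gradient_rms(grad):
    """Root mean square of the gradient components."""
    return math.sqrt(sum(g * g for g in grad) / len(grad))

def has_converged(grad, convergence):
    """True when the gradient RMS has fallen below the threshold."""
    return gradient_rms(grad) < convergence
```

With the first-stage default of 0.3, for instance, a gradient whose components are all 0.25 in magnitude counts as converged, while one whose components are all 0.35 does not.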
int dc_cg_set_eval_limit(void *cgmeth, int eval_limit);
Set and get the limit of function evaluations. If exceeded, the minimizer gives up. Default value is 1000.
int dc_cg_set_scale_dist (void *cgmeth, float scale_dist );
Set and get error function scaling factors. These should be set to make gradient magnitudes approximately equal between functions returning results with different units.
scale_dist is applied to distance bounds violations and gradients in the standard objective function, which operates in units of normalized distance squared in the gradients and to the fourth power in the error. scale_vol is applied to volume errors (Å³) and gradients (Å²). The default values for scale_dist (1.0) and scale_vol (0.1) work well for distance geometry purposes.
scale_usrfunc is used to scale the error and gradients returned by usrfunc to match those of the standard error function. If usrfunc is specified, scale_usrfunc should be set to a value appropriate to the units of the result. The default value is 1.0.
int dc_cg_set_scale_4d(void *cgmeth, float factor4d);
Set and get the scaling factor for the 4th-dimension in the standard error function. This value is defined in terms of distance along the normal 3-D axes, i.e., a value of 0.5 means that 4th-dimensional errors are scaled to one half of the others.
If set to 0.0, a 4-D minimization is done using 3-D coordinates only; atoms are free to move through each other in the 4th dimension (used for the 1st minimizer in the default Rubicon method). If set to a positive value, the 4th dimension is also minimized and coordinates are forced back into 3 dimensions (0.5 used for Rubicon's default 2nd stage).
NOTE: To obtain good conformations, it is important that the final minimization is done with a positive 4D scaling factor, especially if the previous minimization allowed the conformation to move in 4 dimensions (i.e., had a 4D scaling factor of 0.0).
int dc_cg_status(void *cgmeth);
Returns termination status of last-completed minimization. The result is an integer flag defined in dc_rubicon.h. If minimization was successful, the return value will be CG_CONVERGED (0). CG_INVALID is returned on error (e.g., cgmeth is not a valid method). Otherwise the return value will indicate the reason for termination, e.g., CG_MAXEVAL_STOP, CG_UPHILL_STOP, etc.
int dc_cg_eval_count(void *cgmeth);
Returns number of times the objective function was evaluated, or -1 on error.
int dc_cg_iter_count(void *cgmeth);
Returns the number of iterations completed by the minimizer.
float dc_cg_best_value(void *cgmeth);
Returns the best (lowest) function value found.
float dc_cg_grad_norm(void *cgmeth);
Returns the normalized gradient root mean square.
Like any method, a Rubicon method is completely defined by attributes which are accessed via functions. In turn, the operation of Rubicon is completely defined by the method. For instance, the "trials" attribute controls how many random distance-geometry trials will be performed per invocation.
Rubicon methods have two attributes which are somewhat unusual: a rule set and a linked list of minimizers. In version 4.3, Rubicon rules that provide distance-bounds constraints are defined by the name of a file containing the rules. (Although simple and effective, this solution is inadequate when operating over networks, and will probably be changed in future releases.)
The other attribute which deserves special mention is a linked list of minimizers. Rubicon does a number of minimization steps (by default, 2) to optimize a 3 dimensional conformation before returning it. Various aspects of minimization can be controlled, e.g. the objective function to be minimized, various control parameters, etc. Each minimization is treated as a separate "minimizer method" (referred to as a "minimizer"). The desired minimizers are specified to Rubicon in the order that they are to be executed.
Allocate a default Rubicon method. See below for a list of default attributes. Method attributes can be modified via functions below. No public deallocation routine is provided in v4.3.
void dc_rube_print_method(void *rubemeth, FILE *fp);
Print all attributes of given Rubicon method rubemeth to given stream fp. Each line is preceded by len chars of prefix pre. Output includes all method header and minimizer attributes. Returns TRUE on success or FALSE on error (i.e., if argument rubemeth is not a Rubicon method).
By modifying Rubicon's minimizer method(s), one can minimize other functions or concurrently minimize additional constraint violations. For example, one might add constraints to force conformations to fit a pharmacophore model or dock a conformation into a 3-D receptor.
Customized minimizer methods are produced by modifying the default minimizer as discussed above in section 3.2.2. Three functions are provided to control which minimizers are used by a Rubicon method.
void *dc_rube_1st_minimizer(void *rubemeth);
Return the first minimizer for Rubicon method rubemeth, or NULL if none are defined. In v4.3, this should always be a CG minimizer method. Subsequent minimizers may be obtained via the function dc_cg_next_minimizer().
int dc_rube_set_minimizer(void *rubemeth, void *cgmeth);
Replace the list of minimizers for Rubicon method rubemeth with the single minimizer cgmeth, which should be a CG minimizer method. Returns TRUE on success, FALSE on error.
int dc_rube_add_minimizer(void *rubemeth, void *cgmeth);
Append CG minimizer method cgmeth to the end of the list of minimizers used by Rubicon method rubemeth. Returns TRUE on success, FALSE on error.
int dc_rube_set_rulefile(void *rubemeth, int lens, char *rulefile);
Set the rule file for Rubicon method rubemeth to rulefile (string of length lens). Returns TRUE iff successful.
char *dc_rube_rulefile(void *rubemeth, int *lens);
Return the current rule file for Rubicon method rubemeth, or NULL on error or if a rule file is not defined. The default value is $DY_ROOT/data/rubicon.rules.
int dc_rube_set_trials(void *rubemeth, int trials);
Rubicon samples trials conformations randomly. Output consists of up to nconfs conformations which meet the maximum distance (mxdv) and volume (mxvv) bounds violation criteria. These functions set the number of trials, the maximum number of conformations to output, and the acceptance criteria, respectively. It is an error to set nconfs greater than trials. These functions return TRUE on success, FALSE on error.
Default values are 1 (trials), 1 (nconfs), 0.5 (mxdv) and 0.5 (mxvv).
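The interplay of trials, nconfs, mxdv and mxvv can be pictured as a simple loop. The Python sketch below is an illustration under stated assumptions, not the library's logic: run_trial is a hypothetical callback standing in for one distance-geometry sampling, Python's own generator is used in place of RANMAR, and sampling is assumed to stop as soon as nconfs conformations have been accepted.

```python
import random

def sample_conformations(run_trial, trials=1, nconfs=1,
                         mxdv=0.5, mxvv=0.5, seed=281191802):
    """Run up to `trials` samplings and keep at most `nconfs` of them.

    run_trial(rng) must return (conformation, max_distance_violation,
    max_volume_violation); it stands in for one random sampling.
    """
    if nconfs > trials:
        raise ValueError("nconfs may not exceed trials")
    rng = random.Random(seed)  # fixed seed gives repeatable runs
    accepted = []
    for _ in range(trials):
        conf, dist_viol, vol_viol = run_trial(rng)
        if dist_viol <= mxdv and vol_viol <= mxvv:
            accepted.append(conf)
            if len(accepted) == nconfs:
                break  # assumption: stop once enough are accepted
    return accepted
```

With the defaults (one trial, one conformation) the result is either a single conformation or nothing, which is why difficult structures usually call for trials to be set well above nconfs.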
int dc_rube_trials(void *rubemeth);
Return the number of trials, number of conformations to output, and maximum acceptable distance and volume violations, respectively, for Rubicon method rubemeth.
int dc_rube_set_h_flag(void *rubemeth, int h_flag);
By design, Rubicon provides four ways of dealing with attached hydrogen atoms, which are represented by integer flags defined in the header file dc_rubicon.h:
DC_RUBE_H_ALL add all hydrogens before processing
In general, geometries generated for hydrogen-complete molecules are superior to those generated for hydrogen-suppressed molecules, but consume more processing time. DC_RUBE_H_ALL is the option of choice when good geometries are required and/or processing time is not an issue. DC_RUBE_H_NONE is the option of choice when fast, approximate geometries are desired. DC_RUBE_H_SOME is useful when only a few hydrogens are important, e.g. hydroxy rotors for methods which account for hydrogen bonding.
The default setting is DC_RUBE_H_ALL.
NOTE: Rubicon will only work with DC_RUBE_H_ALL if the rule set contains distance bounds constraints for hydrogens. The converse is not strictly true, e.g., using DC_RUBE_H_NONE with a rule set that contains hydrogens works correctly (but works best with a rule set tuned for hydrogen suppressed molecules).
int dc_rube_set_bump14(void *rubemeth, int bump14);
These functions set and get the bump14 attribute. When RADIUS rules are applied to the distance bounds matrix, non-bonded 1-4 distances are subjected to van der Waals bumping constraints only if this attribute is TRUE.
int dc_rube_set_zguess(void *rubemeth, int zguess);
These functions set and get the zguess attribute. If TRUE, orientations about double bonds which are not otherwise specified are set to the presumed lower energy orientation (rel-cis in rings else biggest substituents rel-trans). If FALSE, unspecified double bond orientations are sampled.
NOTE: The zguess attribute is not implemented in version 4.3.
int dc_rube_set_savebounds(void *rubemeth, int savebounds);
These functions set and get the savebounds attribute. If FALSE, all temporary storage used by dc_rube_conformations() is deallocated before returning (the default setting). If TRUE, distance and volume bounds are saved in a static area and may be retrieved via dc_rube_upperbound(), dc_rube_lowerbound(), dc_rube_vbounds_reset(), and dc_rube_vbounds_next().
NOTE: To generate a bounds matrix but no conformations, use a Rubicon method with savebounds set to TRUE and trials set to 0.
DC_REAL dc_rube_lowerbound(dt_Handle a1, dt_Handle a2, int
These functions return smoothed distance bounds for the last molecule processed by dc_rube_conformations(), if the Rubicon method attribute savebounds was TRUE and it returned normally (i.e. not NULL_OB). On error, they return -1.0.
The argument *rule is set to the rule number (line number in the rule file) of the most restrictive rule that was applied to that bound. The value is set to zero if the requested bound was not set by any rule (i.e. set only by bounds smoothing; this shouldn't happen if VDW radii are properly defined).
These functions provide access to the volume constraints (as per the rule file) for the last molecule processed by dc_rube_conformations(), if the Rubicon method attribute savebounds was TRUE and it returned normally (i.e. not NULL_OB).
dc_rube_vbounds_reset() resets the list and returns the total number of volume constraints.
dc_rube_vbounds_next() returns the next volume constraint in the list by returning a sequence of four atoms, setting *vmin and *vmax to the constraint limits, and setting *signed to FALSE (0) if the rule is for an unsigned volume or TRUE (1) if for a signed (chiral) volume. NULL_OB is returned at the end of the list or on error.
dt_Handle dc_rube_conformations(dt_Handle mol, void *rubemeth);
Produce a sequence of conformations for the molecule mol using Rubicon method rubemeth. Returns NULL_OB on error.
Marsaglia, G. and Zaman, A., Toward a Universal Random Number Generator, Florida State University Report FSU-SCRI-87-50 (1987).
The current version of the Rubicon library has no provisions for setting values in the distance bounds matrix other than via Rubicon rules.
The current version of Rubicon samples distance space directly. The "partial metrization" method (providing improved sampling) is not implemented in this version.