PyDaylight v 1.0
Python wrapper to the Daylight toolkit

PyDaylight is a rapid application development library based on the
Daylight chemical informatics toolkits (http://www.daylight.com/).
Researchers enjoy it because it lets them implement and test new
chemical algorithms in less than a third of the time of traditional
methods. Software developers appreciate the use of high-level
constructs such as object-oriented design, iterators and exceptions,
and the easy integration with other Python libraries. Support staff
enjoy it for its readability and its robust implementation that
precludes most categories of silent errors, including type mismatches
and memory leaks.

The PyDaylight home page is http://www.dalkescientific.com/PyDaylight/ .
Development takes place on Sourceforge and the main project page is
http://sourceforge.net/projects/pydaylight/ .

This distribution also includes dayswig, which creates low-level
bindings between the C toolkit functions and other languages, including
Python, Perl and Tcl.

"Daylight", "Daylight toolkit", "Thor" and "Merlin" are registered
trademarks of Daylight Chemical Information Systems.

ADVERTISEMENT:

This package is developed and maintained by Dalke Scientific Software,
LLC. In partnership with Mesa Analytics and Computing, LLC
(http://www.mesaac.com/) we provide consulting, customization, support
and training for PyDaylight and chemical informatics. For more
information, please visit http://www.dalkescientific.com/ or send email
to info@dalkescientific.com.

WHAT'S NEW:

Read the "NEWS" file in this directory.

FEATURES:

- Toolkit objects - including molecules, fingerprints, reactions and
  SMARTS and SMIRKS queries - are first class Python objects with
  methods and attributes.
- Full support for Thor and Merlin databases, including an MCL to
  PyDaylight compiler.
- Depictions rendered in Qt, PDF, Tk and PIL, the Python Imaging
  Library.
- Client and server libraries for program objects.
- Toolkit streams and sequences are converted to native Python lists
  and iterators.
- Function calls are checked and any error return values are converted
  to Python exceptions.
- Automatic garbage collection.

The result is an interface layer that looks and acts like a native
Python library. This reduces the learning curve and the time needed to
write productive software.

REQUIREMENTS:

This package requires a licensed version of the Daylight toolkit
(tested with 4.8x and 4.91) and Python (tested with 2.4). I've built it
under Red Hat Enterprise Linux 2.1, Solaris 7, and IRIX 6.5. It has
also been tested on later Solaris and Red Hat Linux versions as well as
Mac OS X 10.4.

PyDaylight supports several optional components. If VMD is installed it
may be used to view coordinate information stored in TDTs. There is
some support for PDF, Tk, and PIL if those libraries are also
installed.

  For more information about:   See:
    Python                      http://www.python.org/
    Daylight toolkits           Daylight CIS, http://www.daylight.com/
    VMD                         http://www.ks.uiuc.edu/Research/vmd/
    ReportLab PDF library       http://www.reportlab.com/
    PIL                         Secret Labs AB, http://www.pythonware.com/
    Tk                          the Python installation instructions

INSTALLATION:

Although DaySwig is included in this distribution, you should not need
to use it unless you want to build everything from scratch. The final
.c file that is created using DaySwig is included in the distribution.

Make sure the DY_ROOT environment variable is set, then

Either run the following to build and install as root in one step:

    su - root
    python setup.py install

Or first compile under your account, then install as root:

    python setup.py build
    su - root
    python setup.py install

The "install" command takes a --prefix parameter if you want to install
the code elsewhere.
For example:

    sudo python setup.py install --prefix $HOME/local

For more information about using Distutils, see
http://python.org/sigs/distutils-sig/doc/

PROBLEMS:

Here are some of the problems you may run into, and how to fix them.

- "WARNING: Environment variable DY_ROOT not set"
  You should set the DY_ROOT environment variable to point to the local
  Daylight installation. This is needed to find the header files and
  libraries. If it is not set, setup.py will try to find the
  installation.

- "WARNING: Found Daylight installation at /usr/local/daylight/v491"
  The DY_ROOT environment variable wasn't set but setup.py found what
  appears to be a Daylight installation at the given directory. You
  really should set the DY_ROOT variable correctly.

- "WARNING: No installation found, using /usr/local/daylight/v491"
  The DY_ROOT environment variable wasn't set and setup.py couldn't
  find a Daylight installation. The default value is used because
  setup.py can also be used to generate source distributions and be
  queried for configuration information, neither of which requires a
  Daylight installation.

- "dayswig/dayswig_python.c:492: dt_mdep.h: No such file or directory"
  The value of DY_ROOT (or the directory that was guessed for the
  Daylight distribution) was incorrect, so the C compiler cannot find
  the Daylight header files. Set DY_ROOT to the correct location.

Once installed, try the following to see if the installation works.
Start Python, then run the following commands.

    >>> from daylight import Smiles
    >>> mol = Smiles.smilin("c1ccccc1O")
    >>> print len(mol.atoms)
    7
    >>>

Here are the possible failures and ways to fix them.

- "ImportError: No module named daylight"
  This means the directory named 'daylight' is not on the PYTHONPATH.
  Possible solutions:
    o need to install PyDaylight, as in "python setup.py install"
    o if you installed to a non-standard directory (like using a
      prefix of "$HOME/local") then you'll need to set the PYTHONPATH
      environment variable to include the directory containing
      'daylight/'. Under Python 2.1 with a csh-derived shell (like
      tcsh) do:

          setenv PYTHONPATH $HOME/local/lib/python2.1/site-packages

- "ImportError: ld.so.1: python: fatal: libdt_thor.so: open failed: No such file or directory"
  The Daylight shared library files are not on the LD_LIBRARY_PATH
  environment variable. You will need to add "$DY_ROOT/lib" to that
  variable. (IRIX users may need to be aware of ABI issues. Consult
  'man rld' and
  http://www.daylight.com/dayhtml/doc/release_notes/readme_v471_sgi.html
  for details.)

- "* FATAL ERROR -- UNKNOWN OPTION: "LICENSEDATA" "
  The DY_LICENSEDATA environment variable isn't set. You will need to
  point it to the license file sent to you by Daylight.

- "Invalid license ..."
  The license file referenced in DY_LICENSEDATA is not valid. You will
  need to get a valid file from Daylight.

BUILDING dayswig_python (manual build - not needed for standard
PyDaylight installation):

This distribution includes a pregenerated dayswig/dayswig_python.c
file, built using the 4.72 toolkit headers. dayswig/README has some
information about what's needed to create that file. In brief, you will
need a copy of SWIG installed (which requires C++) and you will need a
copy of Tcl. Then do "make python" in the dayswig directory.
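
Several of the failures listed under PROBLEMS come down to three
environment variables: DY_ROOT, LD_LIBRARY_PATH and DY_LICENSEDATA. As
a rough sketch only, the check below reports which of them look wrong
before you try "from daylight import Smiles". The function name
check_daylight_env is hypothetical and is not part of PyDaylight; it
uses only the standard library, is written to run on both Python 2 and
Python 3, and does not verify that the license file itself is valid.

```python
import os


def check_daylight_env(environ=None):
    """Return a list of problems found in the Daylight-related variables.

    Hypothetical helper, not part of PyDaylight. An empty list means the
    three variables named in the PROBLEMS section look plausible.
    """
    if environ is None:
        environ = os.environ
    problems = []

    dy_root = environ.get("DY_ROOT")
    if not dy_root:
        problems.append("DY_ROOT is not set")
    else:
        # The shared libraries are expected under $DY_ROOT/lib.
        lib_dir = os.path.normpath(os.path.join(dy_root, "lib"))
        entries = environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
        normalized = [os.path.normpath(entry) for entry in entries]
        if lib_dir not in normalized:
            problems.append("%s is not on LD_LIBRARY_PATH" % lib_dir)

    if not environ.get("DY_LICENSEDATA"):
        problems.append("DY_LICENSEDATA is not set")

    return problems


if __name__ == "__main__":
    for problem in check_daylight_env():
        print(problem)
```

Passing a plain dictionary instead of os.environ makes it easy to try
the check against a proposed configuration before changing your shell
startup files.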
PyDaylight COMPONENTS:

  Smiles -- create a Molecule or Reaction
  Molecule -- access molecular information, including
    Atom -- access atom information
    Bond -- access bond information
    Cycle -- access cycle information
  Element -- a helper module (not part of the Daylight distribution)
             containing information about the elements
  Fingerprint -- access fingerprint information
  Depict
    Callback -- base class defining the depiction callback interface
    Colors -- color definitions
    PDFCallback -- callback for PDF depictions via ReportLab
    PILCallback -- callback for image depictions via PIL
    TkCallback -- callback for Tk depictions
    VectorFont -- (under development) convert font callbacks to use a
                  vector font
  Reaction -- adds the reaction specific methods to a molecule
  Rubicon -- interface to the "rubicon" binary, including the ability
             to run rubicon on a remote machine (for distributed
             computing)
  Smarts -- run a SMARTS query on a molecule or reaction
  Thor -- talk to the Thor server and get/modify TDTs
  Merlin -- talk to the Merlin server and run various queries
  ThorGraph -- graph and tautomer searches using Thor
  MCL -- an MCL to PyDaylight compiler
  Grid -- makes a 2D grid interface to a hitlist/columns
  TDTTraverse -- generic driver for converting TDTs to text, HTML, etc.
  Program -- client-side interface to program object
  Pipetalk -- client and server-side interfaces to program objects.
              This code implements the pipetalk protocol and does not
              use the toolkit

Some of the Thor and Merlin support libraries are not needed by user
code.
These are:

  Capability -- map between server and client functionality
  Column -- class interface for a column of the Merlin database
  Database -- base class for Thor database and Merlin pool interfaces
  Datafield -- class interface for the 'datafield' type
  Dataitem -- class interface for the 'dataitem' type
  Datatree -- class interface for the 'datatree' type
  Datatype -- class interface for the 'datatype' type
  FieldType -- class interface for the 'fieldtype' type
  Hitlist -- class interface for the 'hitlist' type
  Message -- helps send messages and eviction notices to the server
  Password -- manipulates server and database passwords
  Server -- base class for Thor and Merlin server interfaces
  Task -- standard framework for Merlin search and sort tasks

In addition, the following packages are available, but do not directly
use the toolkit:

  dayencodings -- convert to/from Daylight's "fingerprint" encoding
       (NOTE: given a string 's' you can do s.encode('daylight-fp')
        and s.decode('daylight-fp') after importing 'daylight')
  TDT support -- read and write TDTFiles and parse the information
       based on the TDT database definition (lightly tested). This is
       a pure reimplementation of a TDT parser. It needs to be
       rewritten.
  VMD -- use VMD to view TDTs

The 'Hydrogens' and 'MDLMol' modules are deprecated and will be turned
into examples.

BACKGROUND:

Programming the Daylight toolkit in C can be frustrating: When do you
dealloc a sequence? Which functions are applied to which objects? When
should you check for errors? In most cases you are trying to get
something working and don't want to go through the hassle of
remembering all of the details of how the API works -- you want to do
science. It would be nice if the system took care of all that for you,
especially in an easy scripting language that will let you test things
out and correct mistakes quickly without needing to recompile.

This package does just that.

EXAMPLE:

Here's a version of cansmiles.
It reads SMILES strings from standard input, checks whether each string
can be converted into a molecule, and prints the canonical form. If
'-i' is given on the command line, it prints the canonical SMILES in
isomeric form.

= = = = =
import fileinput, sys, string
import daylight
from daylight.Smiles import smilin

# Optional "-i" flag selects isomeric output; remove it so that
# fileinput does not treat it as a filename.
isomeric = 0
if sys.argv[1:2] == ["-i"]:
    isomeric = 1
    del sys.argv[1]

bad = 0
count = 0
for line in fileinput.input():
    info = string.split(line)
    if len(info) < 1:
        continue    # skip blank lines
    count = count + 1
    smi = info[0]
    try:
        mol = smilin(smi)
    except daylight.BadFormat, msg:
        if msg is None:
            msg = "Daylight says so"
        print "Cannot parse", `smi`, "because", msg
        bad = bad + 1
        continue
    print mol.cansmiles(isomeric)

if count != 0:
    print count, "SMILES read with", bad, "bad terms:", \
          (bad*100)/count, "% failure"
= = = = =

And the following counts the unique matches of a given SMARTS pattern
in the input.

= = = = =
import fileinput, sys
from daylight import Smiles, Smarts

# Get and remove the SMARTS pattern from the first command line
# parameter (sys.argv[0] is the script name, so use sys.argv[1])
pattern = sys.argv[1]
smarts = Smarts.compile(pattern)
del sys.argv[1]

count, num = 0, 0
for line in fileinput.input():
    line = line[:-1]  # remove the newline
    mol = Smiles.smilin(line)
    match = smarts.umatch(mol)    # unique matches only
    count = count + len(match)
    num = num + 1

print "The pattern", pattern, "matched", count, "times out of", num
= = = = =

PERFORMANCE:

This release has a few speed optimizations. Because it is an
interpreted wrapper around the SWIG wrapper of the C API, there is some
overhead.
The tests I did show that, as a very rough gauge:

- dayswig_python adds little overhead to the toolkit functions (e.g.,
  cansmi at the dayswig_python level is slightly slower than the
  contrib version)
- PyDaylight adds a factor of two overhead to the toolkit (the
  standard PyDaylight cansmi is 60% slower than the contrib version,
  while an optimized, less readable version is about 40% slower)
- Implementing logic in Python (i.e., more than just toolkit
  functions) has about a factor of 6 overhead.

(Note: the above two examples are optimized for clarity, not speed.)

DOCUMENTATION:

*Sigh*. This is almost invariably the part of a distribution that needs
the most work. Many of the modules and functions/methods have doc
strings embedded in them. I have yet to get the program that converts
that documentation into standalone form. I've included a very rough
draft (about 20 pages) of the architecture, in doc/.

Links to all the PyDaylight documentation may be found at
http://www.dalkescientific.com/PyDaylight/ .

LICENSE:

This package is distributed under the GNU Library General Public
License (LGPL), http://www.gnu.org/copyleft/lgpl.html , specifically:

    Copyright (C) 1998-1999, Bioreason, Inc.
    Copyright (C) 1999-2000, Andrew Dalke
    Copyright (C) 2000-2005, Dalke Scientific Software, LLC
    Copyright (C) 2000, Vertex Pharmaceuticals, Inc.

This library is free software; you can redistribute it and/or modify it
under the terms of the GNU Library General Public License as published
by the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Library General Public License for more details.
You should have received a copy of the GNU Library General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307,
USA.

= = =

BUGS:

I know the TDT file reader needs work, as what is there does not
properly support the Daylight quoting rules for data elements.

The Molecule, Atom and Bond parts have been heavily used internally and
have been extensively reviewed. I expect there to be few problems with
them. In fact, they even uncovered bugs in the Daylight toolkit itself.

The database components have barely been tested. When writing up my
presentation on the system, I found a couple of bugs, which indicates
that several more are likely still hiding out.

PyDaylight development, testing, documentation and training are
currently done on a consulting basis. If you are interested in hiring
us, please contact info@dalkescientific.com.

DEMOS:

I put a few demos in the "examples" directory, including mcl.py, an MCL
to PyDaylight converter. More examples are available at
http://www.daylight.com/meetings/mug2000/Dalke/overview/

Someday these will be added to the regular distribution, along with
regression tests.

                    Andrew Dalke
                    dalke@dalkescientific.com