
Daylight Summer School 1998, July 28-30, St. John's College, Santa Fe, NM

Daylight Worksheet - Cluster Package

The Cluster Package generates clusters of compounds from the Daylight fingerprint descriptor using the Jarvis-Patrick clustering algorithm. It can also select subsets of large datasets and add clustering data to TDT files for insertion into Daylight databases. Keep track of the files from this exercise; they are used in the Day 2 labs.
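
The exercise below compares fingerprints by Tanimoto similarity. As a rough illustration only, the coefficient is the number of bits set in both fingerprints divided by the number of bits set in either. The Python sketch below is not part of the Daylight toolkit: the function name `tanimoto` and the representation of a fingerprint as a set of on-bit positions are assumptions made for this example.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    # Two empty fingerprints have no bits to compare; returning 0.0 here is
    # a convention chosen for this sketch, not something the worksheet states.
    if union == 0:
        return 0.0
    return len(a & b) / union


# Toy fingerprints (hypothetical bit positions, not real Daylight output).
fp_a = {1, 2, 3}
fp_b = {2, 3, 4}
print(tanimoto(fp_a, fp_b))
```

Identical fingerprints score 1.0 and fingerprints with no bits in common score 0.0, so a threshold such as 0.7 keeps only fairly close pairs.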

Generate a TDT file containing a clustered dataset from the ~mug/data/day1.cluster.smi dataset, using fixed-length fingerprints, 5 nearest neighbors, a Tanimoto threshold of 0.7, and a "reasonable" JP clustering level chosen from the jpscan output.
- Generate Nearneighbors Table
  - $DY_ROOT/bin/smi2tdt -t '$SMI' day1.cluster.smi day1.cluster.tdt
  - fingerprint -b 1024 -c 1024 -id day1 day1.cluster.tdt >day1.cl.fp.tdt
  - nearneighbors -fid day1 -NEIGHBORS 5 day1.cl.fp.tdt day1.cl_nn.tdt
- Choose JP level from jpscan output
  - jpscan -NN_BEST_THRESHOLD 0.7 day1.cl_nn.tdt jpscan.out
- Generate clustered output in table form with showclusters
  - jarpat -JP_NEED 3 -JP_NEAR 5 day1.cl_nn.tdt >day1.35.cl.tdt
  - showclusters -h -q -v day1.35.cl.tdt >day1.35.cl.out
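
The jarpat step above can be pictured with the textbook Jarvis-Patrick rule: two compounds join the same cluster when each appears in the other's list of nearest neighbors and the two lists share at least a required number of entries. The sketch below is a Python illustration of that rule, not Daylight's implementation. The function name `jarvis_patrick`, its `near` and `need` parameters (named after -JP_NEAR and -JP_NEED), and the toy neighbor lists are assumptions made for this example; jarpat may treat details such as self-neighbors differently.

```python
from itertools import combinations


def jarvis_patrick(neighbors: dict, near: int, need: int) -> list:
    """Cluster items from their nearest-neighbor lists (textbook Jarvis-Patrick).

    neighbors maps each item to its neighbors, ordered best first.
    near is how many neighbors of each item to examine.
    need is how many of those neighbors two items must share.
    """
    top = {name: set(lst[:near]) for name, lst in neighbors.items()}
    parent = {name: name for name in neighbors}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(neighbors, 2):
        mutual = a in top[b] and b in top[a]
        shared = len(top[a] & top[b])
        if mutual and shared >= need:
            parent[find(a)] = find(b)

    clusters = {}
    for name in neighbors:
        clusters.setdefault(find(name), []).append(name)
    # Largest clusters first, ties broken alphabetically.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Hypothetical neighbor table: A-D are mutually close, E is an outlier
# that no other item lists as a neighbor.
toy = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["A", "B", "C"],
    "E": ["A", "B", "C"],
}
print(jarvis_patrick(toy, near=3, need=2))
print(jarvis_patrick(toy, near=3, need=3))
```

Raising `need` relative to `near` makes the rule stricter and breaks the data into more, smaller clusters. That trade-off is what a scan over clustering levels helps one judge.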

Pick a representative subset of the clustered dataset from step one by selecting only the cluster centroids and the singletons.

listclusters -a day1.35.cl.tdt >day1.cl.tdt
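
The idea of keeping one representative per cluster plus every singleton can be sketched in Python as follows. The worksheet does not say how Daylight defines a cluster centroid, so this example substitutes a hypothetical definition: the member with the highest mean similarity to the other members of its cluster. The function name `pick_representatives`, the `similarity` callback, and the toy similarity values are all assumptions made for illustration.

```python
def pick_representatives(clusters: list, similarity) -> list:
    """Return one representative per cluster; singletons represent themselves.

    The centroid rule here (highest mean similarity to the other members)
    is an assumed stand-in, not Daylight's documented definition.
    """
    reps = []
    for members in clusters:
        if len(members) == 1:
            reps.append(members[0])
            continue

        def mean_similarity(m):
            others = [o for o in members if o != m]
            return sum(similarity(m, o) for o in others) / len(others)

        # Sorting first makes ties resolve to the alphabetically first member.
        reps.append(max(sorted(members), key=mean_similarity))
    return reps


# Hypothetical pairwise similarities for a three-member cluster.
sims = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.8,
    frozenset(("B", "C")): 0.5,
}
lookup = lambda x, y: sims[frozenset((x, y))]
print(pick_representatives([["A", "B", "C"], ["E"]], lookup))
```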

Update the nearneighbors table generated from the day1.cluster.tdt dataset with the ~mug/data/day1.smi dataset, fingerprinted with the same parameters used in step one.

Daylight Chemical Information Systems Inc.
support@daylight.com