Daylight Summer School 1998, July 28-30, St. John's College, Santa Fe, NM
Daylight Worksheet - Cluster Package
The Cluster Package generates clusters of compounds from Daylight
fingerprints using the Jarvis-Patrick clustering algorithm. It can be
used to select subsets of large datasets and to add clustering data to
TDT files for insertion into Daylight databases.
Keep track of files from this exercise for use in Day 2 labs.
- Generate a TDT file containing a clustered dataset from
the ~mug/data/day1.cluster.smi dataset, using fixed-length fingerprints,
5 nearest neighbors, a Tanimoto threshold of 0.7,
and a "reasonable" JP clustering level chosen from
jpscan output.
- Generate Nearneighbors Table
- $DY_ROOT/bin/smi2tdt -t '$SMI' day1.cluster.smi day1.cluster.tdt
- fingerprint -b 1024 -c 1024 -id day1 day1.cluster.tdt >day1.cl.fp.tdt
- nearneighbors -fid day1 -NEIGHBORS 5 day1.cl.fp.tdt day1.cl_nn.tdt
- Choose JP level from jpscan output
- jpscan -NN_BEST_THRESHOLD 0.7 day1.cl_nn.tdt jpscan.out
- Generate clustered output in table form with showclusters
- jarpat -JP_NEED 3 -JP_NEAR 5 day1.cl_nn.tdt >day1.35.cl.tdt
- showclusters -h -q -v day1.35.cl.tdt >day1.35.cl.out
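To make the idea behind the nearneighbors and jarpat steps concrete, here is a
minimal Python sketch of Tanimoto similarity, a nearest-neighbor table, and
Jarvis-Patrick clustering. It is an illustration under stated assumptions, not
the Daylight implementation: fingerprints are modeled as sets of on-bit
positions, the function names are hypothetical, and k and need only play the
role of -JP_NEAR and -JP_NEED. How the Daylight tools treat ties, thresholds,
and self-membership in neighbor lists is not reproduced here.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_neighbors(fps, k, threshold=0.0):
    """For each fingerprint, list the indices of its k most similar others.

    Assumption: neighbors scoring below `threshold` are dropped from the list.
    """
    table = []
    for i, fp in enumerate(fps):
        scored = [(tanimoto(fp, other), j) for j, other in enumerate(fps) if j != i]
        scored = [(s, j) for s, j in scored if s >= threshold]
        scored.sort(key=lambda t: (-t[0], t[1]))  # best first, ties by index
        table.append([j for _, j in scored[:k]])
    return table


def jarvis_patrick(nn, need):
    """Join i and j when each lists the other and they share >= need neighbors.

    Assumption: an item is not counted as a member of its own neighbor list.
    """
    parent = list(range(len(nn)))

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(nn)), 2):
        if j in nn[i] and i in nn[j]:
            if len(set(nn[i]) & set(nn[j])) >= need:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(nn)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy data: two tight groups of three and one unrelated fingerprint.
fps = [
    {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6},
    {10, 11, 12, 13}, {10, 11, 12, 14}, {10, 11, 12, 15},
    {30, 31},
]
nn = nearest_neighbors(fps, k=2, threshold=0.5)
clusters = jarvis_patrick(nn, need=1)
print(clusters)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Raising need or lowering k makes the joining criterion stricter, which tends to
produce more and smaller clusters. That trade-off is what the jpscan output is
meant to help you judge.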
- Pick a representative subset of the clustered dataset from
step one by selecting only the cluster centroids and the singletons.
- listclusters -a day1.35.cl.tdt >day1.cl.tdt
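The selection of centroids and singletons can be sketched in Python as follows.
This is only an illustration: the function names are hypothetical,
fingerprints are modeled as sets of on-bit positions, and the centroid is
assumed to be the cluster member with the highest total Tanimoto similarity to
the other members. The Daylight tools may define the centroid differently.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pick_representatives(fps, clusters):
    """Return one index per cluster: the singleton itself, or the assumed centroid."""
    reps = []
    for members in clusters:
        if len(members) == 1:
            reps.append(members[0])  # singletons are always kept
            continue
        # Assumed centroid: highest summed similarity to the other members,
        # with ties broken in favor of the lowest index.
        best = max(
            members,
            key=lambda i: (
                sum(tanimoto(fps[i], fps[j]) for j in members if j != i),
                -i,
            ),
        )
        reps.append(best)
    return sorted(reps)


# Toy data: one cluster of three fingerprints and one singleton.
fps = [{1, 2, 3, 4}, {1, 2, 3}, {1, 2, 3, 5}, {9}]
clusters = [[0, 1, 2], [3]]
subset = pick_representatives(fps, clusters)
print(subset)  # [1, 3]
```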
- Update the nearneighbors table generated from the day1.cluster.tdt
dataset with the ~mug/data/day1.smi dataset, fingerprinted with the
same parameters used in step one.
Daylight Chemical Information Systems Inc.
support@daylight.com