Daylight v4.9
Release Date: 1 February 2008

Name

mergeneighbors - merge nearneighbors output files

Unix Synopsis

mergeneighbors [options] in.tdt [in.tdt ...]

Description

mergeneighbors(1) is a utility designed for the post-processing of the output of nearneighbors. Mergeneighbors takes as input a variable number of nearneighbors lists (which must have been generated from the same input fingerprint file) and attempts to merge them into a single, complete output file which contains the neighbors-lists found in any of the input files.

The first input file given as a parameter is special: it is the reference file which is used to create the output. The tdts in the reference file are read in order. If the tree contains a fingerprint (with the correct id), the program attempts to build a nearest neighbors list by scanning the input files. The completed tree, with nearest neighbors data, is then written to standard output.

If nearest neighbors data is not found for a tree (or range of trees), mergeneighbors issues a warning. The warning can be a text message or an entry in a script file (see -NN_SCRIPT_FILE). Mergeneighbors can only sense the presence or absence of nearest neighbors data, not whether the data is complete. Take care when merging output that nearneighbors produced with the -DO_INPUT and -SKIP_INPUT options, and make sure complete neighbors lists are written. No warning is given if incomplete data is provided to mergeneighbors.

Mergeneighbors can handle different sized neighbors lists in each file. Mergeneighbors only reads the first `nnear' (see -NEIGHBORS and -MAX_NEIGHBORS) neighbors from each list.

The reference file should generally be the input fingerprint file (the one used as input for nearneighbors).
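
The reference-driven flow described above can be sketched in Python. This is an illustrative model only, not the program's implementation: trees are represented as plain dictionaries rather than parsed TDT records, the names `merge_by_reference`, `reference_ids` and `inputs` are hypothetical, and keeping a tree with no list as `None` is an assumption made for the example.

```python
import sys


def merge_by_reference(reference_ids, inputs, warn=None):
    """Walk the reference ids in order and take the first neighbors list found.

    reference_ids: tree identifiers in reference-file order (hypothetical model).
    inputs: one dict per input file, mapping tree id -> neighbors list.
    Returns (merged, missing); merged preserves the reference order.
    """
    if warn is None:
        warn = lambda msg: print(msg, file=sys.stderr)
    merged = []
    missing = []
    for tree_id in reference_ids:
        found = None
        for lists in inputs:  # scan the input files in command-line order
            if tree_id in lists:
                found = lists[tree_id]
                break
        if found is None:
            # Only presence or absence is detected, never completeness.
            missing.append(tree_id)
            warn(f"warning: no neighbors list for {tree_id}")
        merged.append((tree_id, found))
    return merged, missing
```

In this model, a call such as `merge_by_reference(["a", "b", "c"], [part1, part2])` returns the trees in reference order together with the ids for which no list was found, which is the information a warning or script file would report.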

Options

-NNID runid
Identify this run by `runid' in $NNG and NN output data. (-id)
-FID fpid
Use only fingerprints identified by `fpid' rather than the first one encountered in each tree. This is chiefly useful for testing: in normal use, there is usually only one fingerprint per tree. (-in)
-NEIGHBORS nnear
Specify the length of nearest neighbor lists to be generated. If an input file was generated with more neighbors in the list, the additional neighbors are ignored. If an input file was generated with fewer neighbors in the list, the list is padded with entries with a similarity of 0.0.
-MAX_NEIGHBORS nnear
Specify the maximum length of nearest neighbor lists to be generated, including any tied neighbors at the final position in the list. If an input file was generated with more neighbors in the list, the additional neighbors are ignored. This must be larger than -NEIGHBORS.
-NN_MERGE_LISTS [TRUE|FALSE]
Controls how mergeneighbors handles fingerprints which have multiple nearest neighbors lists in the input files. If TRUE, one neighbors list is created by merging each of the individual lists (excluding duplicates). This is extremely useful when the -DO_INPUT or -SKIP_INPUT options were used on nearneighbors. If FALSE, the first nearneighbors list found (searching the input files in order, including the reference input file) is used and all others are discarded.
-NN_SCRIPT_FILE file
This option specifies a file in which to write warnings about missing nearest neighbors lists. The format of the file is a shell script. Execution of the script will calculate the missing neighbors lists, and re-execute the merge step to create a final output file.

Note that the script assumes that the first input file given to mergeneighbors is the input fingerprint file used for the calculation of all of the neighbors lists. Although the script attempts to preserve the options used by mergeneighbors, -NUM_PROCESSES is not set.

-COMPARISON [DISTANCE|SIMILARITY]
Controls relative goodness of similarity comparisons for list merging/sorting and tie-handling. SIMILARITY means that higher values are better; DISTANCE means that lower values are better. (Default: SIMILARITY)
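
How -NEIGHBORS, -MAX_NEIGHBORS, -NN_MERGE_LISTS and -COMPARISON interact can be illustrated with a small Python sketch. It is a model under stated assumptions, not the program's code: neighbors are `(id, value)` pairs, the function name `merge_lists` and its parameters are hypothetical, the tie handling is one plausible reading of -MAX_NEIGHBORS, and the `None` id used for padding entries is invented for the example.

```python
def merge_lists(lists, nnear, max_neighbors=None,
                comparison="SIMILARITY", merge=True):
    """Combine several (neighbor_id, value) lists into a single list."""
    if comparison not in ("SIMILARITY", "DISTANCE"):
        raise ValueError("comparison must be SIMILARITY or DISTANCE")
    limit = nnear if max_neighbors is None else max_neighbors

    if not merge:
        # -NN_MERGE_LISTS FALSE: keep the first list found, discard the rest.
        combined = list(lists[0][:limit]) if lists else []
    else:
        # -NN_MERGE_LISTS TRUE: union of all lists, duplicates excluded.
        best = {}
        for neighbors in lists:
            for nid, value in neighbors[:limit]:  # extra neighbors are ignored
                best.setdefault(nid, value)
        combined = list(best.items())
        # SIMILARITY: higher values are better; DISTANCE: lower are better.
        combined.sort(key=lambda item: item[1],
                      reverse=(comparison == "SIMILARITY"))

    result = combined[:nnear]
    # Assumed tie handling: keep entries tied with the last kept value,
    # up to max_neighbors entries in total.
    if result:
        last = result[-1][1]
        for item in combined[nnear:limit]:
            if item[1] != last:
                break
            result.append(item)
    # Short lists are padded with entries whose similarity is 0.0.
    while len(result) < nnear:
        result.append((None, 0.0))
    return result
```

For instance, merging `[("x", 0.9), ("y", 0.5)]` with `[("y", 0.5), ("z", 0.7)]` and `nnear=2` keeps `x` and `z` under SIMILARITY, and `y` and `z` under DISTANCE.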

Return Value

Returns 0 to its environment on success, or 1 on error, in which case a diagnostic message is printed:

mergeneighbors: input file not specified
An input file was not specified on the command line.
mergeneighbors: can't open input file
The input file specified on the command line does not exist or is not readable.
mergeneighbors: can't open script file
The output script file could not be created.
mergeneighbors: problem with option manager
The option manager could not be initialized. Verify that DY_ROOT is set properly.
mergeneighbors: out of memory
The program was not able to allocate enough virtual memory to run the specified problem.
mergeneighbors: (ouch!) old file missing NN dataitem
mergeneighbors: bad format in NN<list>
mergeneighbors: bad value (unexpected '.') in NN<list>
mergeneighbors: bad value (can't sscanf) in NN<list>
The program could not successfully read the neighbors list. Either a list was missing, the format was bad, or the list has a different number of neighbors than expected.
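
A calling script can act on the exit status described above. The following Python sketch shows one way to do so; the helper name `run_and_check` is hypothetical, and it assumes the diagnostic message is written to standard error, which this page does not state.

```python
import subprocess


def run_and_check(argv):
    """Run a command; return its standard output if the exit status is 0."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        # Assumption: the diagnostic is printed on standard error.
        raise RuntimeError(
            f"exit status {proc.returncode}: {proc.stderr.strip()}"
        )
    return proc.stdout
```

A call might look like `run_and_check(["mergeneighbors", "fps.tdt", "part1.tdt"])`, where both file names are placeholders; the returned text is the merged output that would otherwise go to standard output.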

Files

$DY_ROOT/bin/mergeneighbors

Daylight License

programs: cluster

Related Topics

fingerprint(1) jarpat(1) jpscan(1) listclusters(1) nearneighbors(1) showclusters(1) licensing(5)
Daylight Theory Manual

Bugs

None known.