6. Basics: Streams and Sequences


It is often useful to perform some operation iteratively over the constituent parts of an object; for example, one might want to examine the properties of each atom or bond of a molecule. Two special object types, the stream and the sequence, provide a mechanism for doing this conveniently. Conceptually, a stream or a sequence is an ordered group of objects with a current position in the order.

(Note: A stream or sequence is not the same as a set, since one object can appear several times. Nor is it like a LISP list: sequences cannot be appended to one another, and the "tail" of a sequence is not itself a valid sequence.)

6.1 Properties

6.1.1 Stream Properties

Streams are used to enumerate the constituent parts of complex objects such as molecules, and are often the only way these constituent parts can be accessed. For example, if m is a molecule object, invoking

	atoms = dt_stream(m, TYP_ATOM)
returns a stream containing all of the atoms in the molecule.

Streams are deliberately limited in their capabilities in order to make them "cheap" (creating a stream of atoms as illustrated above takes very little computing time). In addition to the polymorphic functions that apply to all objects (described in the chapter entitled POLYMORPHIC FUNCTIONS), there are only four operations on streams:

  • create a stream
  • reset the stream's position to the beginning
  • get the next item in a stream
  • ask if the stream's position is at its end

Streams usually have a base object -- the object from which they are derived (see dt_base()). Most streams have one base object, and are deallocated if the base object is changed in a way that makes the stream invalid. For example, a stream of atoms from a molecule is deallocated if a new atom or bond is added to the molecule.
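
The relationship between a stream and its base object can be pictured with a small, self-contained model. This is only a sketch under stated assumptions: the names toy_mol, toy_stream, toy_add_atom, toy_stream_valid and TOY_NULL are hypothetical and are not part of the toolkit, and the version counter is just one plausible way to detect a stale stream, not a description of how the toolkit does it. The real toolkit deallocates the stream and revokes its handle; the toy stream merely stops yielding items.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NULL      (-1)  /* hypothetical stand-in for NULL_OB */
#define TOY_MAX_ATOMS 8

/* Hypothetical base object: a "molecule" holding atomic numbers. */
typedef struct {
    int      atoms[TOY_MAX_ATOMS];
    size_t   count;
    unsigned version;        /* bumped on every structural change */
} toy_mol;

/* Hypothetical stream: remembers the base version it was created from. */
typedef struct {
    const toy_mol *base;
    unsigned       version;  /* base->version at creation time */
    size_t         pos;      /* index of the item the next call returns */
} toy_stream;

static toy_stream toy_stream_create(const toy_mol *m)
{
    toy_stream s;
    assert(m != NULL);
    s.base = m;
    s.version = m->version;
    s.pos = 0;
    return s;
}

/* A stream is valid only while its base object is structurally unchanged. */
static int toy_stream_valid(const toy_stream *s)
{
    assert(s != NULL);
    return s->base->version == s->version;
}

/* Structural modification: adding an atom invalidates existing streams. */
static int toy_add_atom(toy_mol *m, int number)
{
    assert(m != NULL);
    if (m->count >= TOY_MAX_ATOMS)
        return 0;
    m->atoms[m->count++] = number;
    m->version++;
    return 1;
}

/* Return the next atom, or TOY_NULL if the stream is stale or exhausted. */
static int toy_next(toy_stream *s)
{
    if (!toy_stream_valid(s) || s->pos >= s->base->count)
        return TOY_NULL;
    return s->base->atoms[s->pos++];
}
```

The practical consequence is the same in the model and in the text above: after modifying the base object, create a fresh stream rather than continuing with the old one.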

Streams have several important properties:

  • dt_next(s) always returns objects in the same order. That is, you can step through the stream, reset it, and step through again with the same results.

  • Two streams of the same type with the same base object will both return their objects in the same order.

  • A copy of a stream behaves identically to the original. This is true even when a copy is made in the middle of an enumeration; in this case dt_next(copy) will continue the enumeration in the middle of the stream exactly as dt_next(orig) will.

  • A stream is deallocated (the stream-object is thrown away and its handle revoked) if the base object (the object from which it is derived) is modified. For example, an atom-object stream is deallocated when the molecule containing the atoms is deallocated or structurally modified.
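
The ordering and copy properties listed above can be illustrated with a toy model. This is a sketch only: toy_stream, toy_stream_create, toy_next, toy_reset and TOY_NULL are hypothetical names chosen for illustration, and a plain struct copy stands in for copying a stream object. It is not the toolkit's implementation.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NULL (-1)  /* hypothetical stand-in for NULL_OB */

/* Hypothetical stream: a read-only view of an array plus a cursor. */
typedef struct {
    const int *items;  /* items belonging to the base object */
    size_t     count;  /* number of items */
    size_t     pos;    /* index of the item the next call returns */
} toy_stream;

static toy_stream toy_stream_create(const int *items, size_t count)
{
    toy_stream s;
    s.items = items;
    s.count = count;
    s.pos = 0;
    return s;
}

/* Modelled on dt_next(): next item, or TOY_NULL once exhausted. */
static int toy_next(toy_stream *s)
{
    assert(s != NULL);
    return (s->pos < s->count) ? s->items[s->pos++] : TOY_NULL;
}

/* Modelled on dt_reset(): begin again with the first item. */
static void toy_reset(toy_stream *s)
{
    assert(s != NULL);
    s->pos = 0;
}
```

In this model, copying the struct in the middle of an enumeration (toy_stream copy = s;) yields a second cursor that continues from the same position, and resetting either one replays the items in the original order.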

6.1.2 Sequence Properties

The properties of a sequence are a superset of stream properties; in addition to those listed above, sequences can perform the following operations:

  • ask if the sequence is at its beginning
  • insert an object at the current location
  • delete the object from the current location
  • add an item to the end of the sequence
  • go directly to the end of the sequence

6.1.3 Example

The following short code fragment illustrates how one might create both a stream and a sequence. Both the stream and the sequence will contain all the atom-objects from the molecule, but the sequence can later be modified if desired (the function dt_smilin() is documented in a later chapter).

     #include <string.h>
     #include <dt_smiles.h>
     ...
     char smiles[20];
     dt_Handle strm, seq, mol, atom;

     /**** create a stream of the atoms in benzene ****/
     strcpy(smiles, "c1ccccc1");
     mol = dt_smilin(strlen(smiles), smiles);
     strm = dt_stream(mol,TYP_ATOM);

     /**** copy the atoms from the stream into a sequence ****/
     seq = dt_alloc_seq();
     while (NULL_OB != (atom = dt_next(strm)))
       dt_append(seq, atom);
     ...
Several other example programs that make use of sequences are supplied in the Daylight "contrib" directory ($DY_ROOT/contrib).

6.2 Functions on Streams and Sequences

dt_alloc_seq() => sequence
Return a new, empty sequence.

dt_stream(Handle ob, integer typeval) => stream
Returns a stream (an enumeration) of all parts of type typeval within the object ob.

dt_next(Handle s) => object
Return the next object in the sequence or stream. Return NULL_OB if all items of the stream or sequence have already been returned.

dt_atend(Handle s) => boolean
Returns TRUE if the most recent call to dt_next(s) returned NULL_OB because the end of the stream or sequence was reached. This is useful in cases where a sequence might contain NULL_OB as a valid item.

If dt_next() has not yet been called for the given sequence or stream, or has not been called since the last call to dt_reset() (or any other function that resets the sequence), dt_atend() will return FALSE, even if the sequence or stream is empty. Note that if the most recent call to dt_next() returned something other than NULL_OB, then dt_atend() will necessarily return FALSE.
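
The interplay between dt_next() and dt_atend() described above can be sketched with a toy model. The names toy_seq, toy_next, toy_atend, toy_reset, toy_count_items and TOY_NULL are hypothetical, and the model covers only the rules stated in this section. It shows why a loop over a sequence that may contain NULL_OB has to test both the returned value and the end-of-sequence flag.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NULL (-1)  /* hypothetical stand-in for NULL_OB */

typedef struct {
    const int *items;
    size_t     count;
    size_t     pos;      /* index of the item the next call returns */
    int        hit_end;  /* set only when toy_next() ran off the end */
} toy_seq;

/* Modelled on dt_next(): records whether the end was reached. */
static int toy_next(toy_seq *s)
{
    assert(s != NULL);
    if (s->pos < s->count) {
        s->hit_end = 0;
        return s->items[s->pos++];
    }
    s->hit_end = 1;
    return TOY_NULL;
}

/* Modelled on dt_atend(): true only after a toy_next() that hit the end. */
static int toy_atend(const toy_seq *s)
{
    assert(s != NULL);
    return s->hit_end;
}

/* Modelled on dt_reset(): also clears the end flag. */
static void toy_reset(toy_seq *s)
{
    assert(s != NULL);
    s->pos = 0;
    s->hit_end = 0;
}

/* Count every item, including TOY_NULL entries stored in the sequence. */
static size_t toy_count_items(toy_seq *s)
{
    size_t n = 0;
    toy_reset(s);
    for (;;) {
        int item = toy_next(s);
        if (item == TOY_NULL && toy_atend(s))
            break;       /* a genuine end of sequence, not a stored TOY_NULL */
        n++;
    }
    return n;
}
```

A loop that tested only for TOY_NULL would stop at the first stored null item. Note also that, as in the text, the end flag is false on a freshly reset sequence even when that sequence is empty.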

dt_reset(Handle s) => boolean
Resets the sequence or stream so that it begins again with the first item.

6.3 Functions on Sequences Only

The following functions only apply to sequences; they modify a sequence or otherwise perform "direct access" to the objects it contains.

These functions use the concept of a current object. The current object is the one that was most recently returned by dt_next(). When dt_reset() is called, the current object is thought to be an imaginary object before the first actual object; if dt_next() reaches the end of the sequence, the current object is thought to be an imaginary object after the last object of the sequence.

dt_append(Handle seq, object ob) => sequence
Adds ob to the end of seq; ob may be any valid Handle to an object including the value NULL_OB. Return the (modified) sequence. This function also resets the sequence; that is, it has the effect of dt_reset().

As with dt_insert(), it is permissible to add NULL_OB to a sequence; the NULL_OB handle takes a spot like any other handle. When NULL_OB is an item in a sequence, dt_atend() is required to distinguish between the NULL_OB returned when the end-of-sequence is reached and the NULL_OB handles that are part of the sequence.

dt_atstart(Handle seq) => boolean
Returns TRUE if the sequence is reset (if the next call to dt_next(seq) would return the first object in seq).

dt_delete(Handle seq)
Deletes the current object of the sequence (i.e. the last one returned by dt_next()) and makes the object that preceded the deleted object the new current object. dt_next() will return the same value it would have before the deletion occurred.

dt_insert(Handle seq, object ob) => sequence
Inserts ob in the sequence before the current object; the newly-inserted object becomes the current object.

The new object is inserted before the object most recently returned by dt_next(), or at the start of the sequence if dt_reset() was the last operation, or at the end of the sequence if dt_atend() would return TRUE.

Note that dt_next() will return the same value it would have before the insertion occurred. A sequence that is reset when an insertion is made is not reset after the insertion, since the newly-inserted object becomes the current object.

It is permissible for ob to be NULL_OB; in this case, the NULL_OB handle takes a spot in the sequence. When NULL_OB is an item in a sequence, dt_atend() is required to distinguish the NULL_OB returned when the end-of-sequence is reached from NULL_OB handles that are part of the sequence.

dt_toend(Handle seq)
Go to the end of the sequence; the current object becomes NULL_OB, and is imagined to be after the last object in the sequence. The next invocation of dt_next() will return NULL_OB, and dt_atend() will return TRUE.

6.4 An Example

The following C function returns the number of protons in an object. Some of the power of streams and the object-like approach is illustrated by the fact that ob can be a molecule, atom, bond, or cycle. In each case, the number of protons associated with the object is returned: for an atom, the number of protons in the atom; for a bond, the sum of the protons in the atoms it joins; for a cycle, the sum of the protons in the atoms of the cycle; for a molecule, the number of protons in the whole molecule. Note also that if ob is not of an appropriate type (e.g. NULL_OB, or perhaps a string object), the function will correctly and quickly return zero.

     int proton_count(dt_Handle ob)
     {
         dt_Handle atoms, atom;
         int pcount;

         /* enumerate every atom that is part of ob */
         atoms = dt_stream(ob, TYP_ATOM);
         pcount = 0;
         while (NULL_OB != (atom = dt_next(atoms)))
             pcount += dt_number(atom) + dt_imp_hcount(atom);

         dt_dealloc(atoms);   /* the stream is no longer needed */
         return pcount;       /* return result */
     }