ABOUT THE HTML VERSION

The Indian Ocean Catalogue has been published by the University of California Publications in Botany.

This Web version is generated by filtering some 75 files marked up for the troff typesetting program. A raw file looks like this. The filter, which translates troff codes to HTML markup and produces a variety of indexes and other pages, consists of several perl scripts.
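As a rough illustration of what one step of such a filter might do, here is a minimal Python sketch (not the perl scripts themselves) that turns troff font escapes into HTML tags. It assumes the raw files use the standard `\fI`, `\fB`, `\fR` and `\fP` escapes and treats `\fP` simply as "back to roman"; the function name `troff_fonts_to_html` is made up for this example, and the real filter handles much more than fonts.

```python
import html
import re

# Assumption: only italic and bold need HTML tags; roman text is left bare.
FONT_TAGS = {"I": "i", "B": "b"}

def troff_fonts_to_html(line):
    """Translate troff font escapes in one line into <i>/<b> markup."""
    out = []
    open_tag = None
    # The capturing group makes re.split keep each escape as its own token.
    for token in re.split(r"(\\f[IBRP])", line):
        match = re.fullmatch(r"\\f([IBRP])", token)
        if not match:
            # Ordinary text: escape &, < and > so it is safe inside HTML.
            out.append(html.escape(token, quote=False))
            continue
        if open_tag:
            # Any font change closes whatever tag is currently open.
            out.append(f"</{open_tag}>")
            open_tag = None
        tag = FONT_TAGS.get(match.group(1))
        if tag:
            out.append(f"<{tag}>")
            open_tag = tag
    if open_tag:
        out.append(f"</{open_tag}>")
    return "".join(out)
```

For example, `troff_fonts_to_html(r"\fIVentricaria ventricosa\fP & allies")` yields the name wrapped in `<i>` tags with the ampersand escaped.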

At the center of the electronic catalogue is a 3.4 MB file consisting of all the catalogue entries, in the order in which they appear in the printed version, with one entry per line. Each entry is marked up with HTML and preceded by a four-digit number (quad), with the first entry getting 0001 and the last 3354; that is, the lines are in sorted numerical order. An entry marked up for HTML looks like this.

Since the main file is sorted, entries can be retrieved by binary search using perl's look function, so retrieval time is nearly the same for every entry. A similar file holds the bibliographic references. Retrieval and delivery of the entries are controlled by several perl CGI scripts. The main script (getent) takes a quad or string of quads as input, looks up the associated lines, and surrounds the returned string with the HTML tags necessary to output a Web page.
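The lookup can be sketched in Python as a binary search over byte offsets in the sorted file. This is a stand-in for the idea, not perl's look function itself; the names `look` and `first_line_at_or_after` and the sample entries in the usage note are assumptions made for this example. It relies on the quads being zero-padded to four digits, so that byte-wise order matches numerical order.

```python
import io

def first_line_at_or_after(fh, pos):
    """Return the first complete line that starts at byte offset >= pos."""
    if pos == 0:
        fh.seek(0)
    else:
        fh.seek(pos - 1)
        fh.readline()  # skip the rest of the line containing byte pos - 1
    return fh.readline()  # b"" at end of file

def look(fh, quad):
    """Find the line beginning with `quad` in a sorted binary file object."""
    key = quad.encode("ascii")
    fh.seek(0, io.SEEK_END)
    lo, hi = 0, fh.tell()
    # Search for the smallest offset whose next full line sorts >= key.
    while lo < hi:
        mid = (lo + hi) // 2
        line = first_line_at_or_after(fh, mid)
        if not line or line[:len(key)] >= key:
            hi = mid
        else:
            lo = mid + 1
    line = first_line_at_or_after(fh, lo)
    if line.startswith(key):
        return line.decode("utf-8").rstrip("\n")
    return None  # no entry carries this quad
```

With a file opened in binary mode, `look(fh, "2961")` would return the single line for that entry, or `None` if the quad is absent. Only a handful of seeks are needed, since the search halves the remaining byte range at each step.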

The various indexes (geographic, phylogenetic, alphabetic) consist of names together with the associated quad packaged in the appropriate GET format. That is, index lines have the form:
<a href = "cgi-bin/getent?2961">Ventricaria ventricosa</a>
Selecting this link initiates the following sequence:

  1. Line 2961 is retrieved from the main file.
  2. The preceding number and the following three are used to retrieve the names of the neighboring entries from a DBM file bound to an associative array.
  3. All the citation records are excised. A user who is interested in them can restore them later.
  4. Addenda (material added after the catalogue went to press) are added if necessary.
  5. Everything is wrapped in HTML code that permits the entry to be displayed as a single WWW page with links to preceding and following pages (not yet generated), to searching scripts, and to the table of contents.
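Steps 2 and 5 of this sequence can be sketched as follows, in Python rather than perl. A plain dict stands in for the DBM-backed associative array, step 2 is read here as "quad - 1 and quad + 1 through quad + 3", and the function names and the bare-bones page template are assumptions made for this example, not the layout getent actually emits.

```python
import html

def index_line(quad, name):
    """Build an index link in the GET format shown above."""
    return f'<a href = "cgi-bin/getent?{quad:04d}">{html.escape(name)}</a>'

def neighbor_links(quad, names, last=3354):
    """Links to the preceding entry and the following three (step 2).

    `names` maps integer quads to entry names; a dict stands in for the
    DBM file here.
    """
    wanted = [quad - 1, quad + 1, quad + 2, quad + 3]
    return [index_line(q, names[q])
            for q in wanted
            if 1 <= q <= last and q in names]

def wrap_page(title, body, links):
    """Wrap an entry and its navigation links in a minimal page (step 5)."""
    nav = " | ".join(links)
    return (f"<html><head><title>{html.escape(title)}</title></head>\n"
            f"<body>\n{body}\n<hr>\n{nav}\n</body></html>")
```

Entries at either end of the catalogue simply get fewer links, because quads outside 0001 to 3354 are skipped.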

How are citation records removed?
The citation records, which form the greater part of the printed catalogue, are flanked by special markers so that getent can delete them or include them as appropriate. By default the records are not displayed, which gives a smaller and less distracting page. Each recordless page has a link that reloads the page without excising the records.
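A minimal Python sketch of marker-based excision follows. The marker strings `<!--REC-->` and `<!--/REC-->` are invented for this example, since the actual markers are not given here, and `render_entry` is a hypothetical name.

```python
import re

# Hypothetical markers; the real catalogue uses its own special markers.
REC_START = "<!--REC-->"
REC_END = "<!--/REC-->"

# Non-greedy match so that each flanked block is removed separately.
_RECORDS = re.compile(
    re.escape(REC_START) + r".*?" + re.escape(REC_END), re.DOTALL
)

def render_entry(entry_html, show_records=False):
    """Return the entry with its citation records excised or kept."""
    if show_records:
        # Keep the records and drop only the markers themselves.
        return entry_html.replace(REC_START, "").replace(REC_END, "")
    return _RECORDS.sub("", entry_html)
```

The "reload with records" link would then correspond to calling the same function with `show_records=True`.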
How will the catalogue be updated?
We depend on users to notify us of necessary addenda.
How are addenda processed?
The catalogue form is frozen as of the version that was sent to press. Changes and updates are collected in a file that consists of taxonomic names and update information. As getent loads, it reads this file into an associative array, with the quad representing a name as the hash key and the HTML-coded update information as the value. When getent assembles the page, an addendum comment string is inserted if one is available (see Coelarthrum boergesenii for an example).
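The addenda lookup might be sketched like this in Python, with a dict playing the role of the associative array. The file layout (one `name<TAB>update` pair per line), the `name_to_quad` mapping, the function names and the "Addendum:" wrapper are all assumptions made for this example; the real file format is not described here.

```python
def load_addenda(lines, name_to_quad):
    """Read 'name<TAB>update' lines into a dict keyed by quad."""
    addenda = {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        name, update = line.split("\t", 1)
        quad = name_to_quad.get(name.strip())
        if quad is not None:
            addenda[quad] = update
    return addenda

def add_addendum(entry_html, quad, addenda):
    """Append the HTML-coded update to an entry if one is available."""
    update = addenda.get(quad)
    if update is None:
        return entry_html
    return f"{entry_html}\n<p><b>Addendum:</b> {update}</p>"
```

Because the table is built once as the script loads, checking for an addendum while a page is assembled costs a single dictionary lookup.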
Why was it done this way?
First, the form of the input, troff markup, was a constraint. Second, we wanted the WWW version to mirror the printed version so that we could get feedback to incorporate in print. [Thanks a lot to those who used the electronic version to improve the printed product.] Third, we wanted to reduce loading time and other transit delays. Fourth, we did not want to restrict access to certain browsers. There are no images (well, one map) and no browser-specific constructs (except mailto), and getent, the central CGI script, hobbles itself by avoiding frames, which are not (were not, anyhow) handled by all browsers.