California oak woodland - Photo credit: John Game


Features of Consortium of California Herbaria Data View

See also help page and "about" page

Input

  • Participant data: accepted in any field format and converted to the common Consortium format; XML or tab-delimited files preferred
  • Minimum standards: Unique herbarium specimen number, taxon name
  • Transmitted by e-mail, CD, FTP, memory stick
  • Updatable at any interval

Common Consortium format

  • Produced by scripts constructed for each participant
  • Fields accommodated:
    Field                               Searchable? Sortable?
    Unique herbarium specimen number    **
    Taxon name                          **
    Locality                            **
    Habitat
    Macromorphology
    Notes
    Latitude                            *
    Longitude                           *
    Township/range/section
    Elevation (f or m)                  *
    Date                                **
    Collector                           **
    Collector number                    **
    County                              **
    Annotations
    Type information
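
A per-participant conversion script of the kind described above might look roughly like the following. This is a minimal sketch in Python for illustration only (the Consortium's own scripts are written in Perl); the field identifiers, the column mapping, and the sample rows are all hypothetical. It maps one participant's tab-delimited columns onto the common fields and sets aside records that lack the minimum standards (specimen number and taxon name).

```python
import csv
import io

# Hypothetical identifiers for the common-format fields listed above.
COMMON_FIELDS = [
    "specimen_number", "taxon_name", "locality", "habitat", "macromorphology",
    "notes", "latitude", "longitude", "trs", "elevation", "date",
    "collector", "collector_number", "county", "annotations", "type_information",
]

# Hypothetical mapping for one participant: their column header -> common field.
EXAMPLE_MAPPING = {
    "Accession": "specimen_number",
    "ScientificName": "taxon_name",
    "Locality": "locality",
    "County": "county",
}


def to_common_format(tab_text, mapping):
    """Convert a participant's tab-delimited export into common-format records.

    Records missing a specimen number or taxon name are returned separately
    so that they can be reported back to the participant.
    """
    accepted, rejected = [], []
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    for row in reader:
        record = dict.fromkeys(COMMON_FIELDS, "")
        for source_column, common_field in mapping.items():
            record[common_field] = (row.get(source_column) or "").strip()
        if record["specimen_number"] and record["taxon_name"]:
            accepted.append(record)
        else:
            rejected.append(record)
    return accepted, rejected


# Made-up sample data: the second row has no specimen number.
sample = (
    "Accession\tScientificName\tLocality\tCounty\n"
    "UC12345\tQuercus lobata\tMount Diablo\tContra Costa\n"
    "\tQuercus agrifolia\tBerkeley hills\tAlameda\n"
)
accepted, rejected = to_common_format(sample, EXAMPLE_MAPPING)
print(len(accepted), "accepted;", len(rejected), "rejected")
```

Keeping the mapping as data rather than code means each participant's script differs only in its mapping table, which is one plausible way to keep many per-participant converters manageable.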

Restrictions/Quality Control

  • Names not in master taxon authority file excluded; authority table updated as necessary, following consultation with IPNI, Tropicos, ICPN
  • Common spelling of taxon names enforced
  • Common spelling of county names enforced
  • Coordinates outside California box nulled
  • Dates before 1750 and after today's date nulled
  • Field contents sanity-checked; record skipped on mismatch
  • Elevations below Death Valley or above county max elevation nulled
  • Any spelling of localities, collectors allowed
  • Any synonymy allowed
  • Errors, modifications reported to participant
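
The nulling rules above can be sketched as follows. This is a Python illustration under stated assumptions, not the Consortium's Perl code: the bounding-box limits are a deliberately loose guess, the county maximum is a made-up number, elevations are assumed to have been converted to meters already, and the record keys are hypothetical. The function returns a cleaned copy plus a list of changes, mirroring the idea that modifications are reported to the participant.

```python
from datetime import date

# Assumed limits for illustration; the actual "California box" is not given in the source.
CA_BOX = {"lat_min": 32.0, "lat_max": 42.5, "lon_min": -125.0, "lon_max": -114.0}
DEATH_VALLEY_M = -86.0  # approximate elevation of Badwater Basin, in meters


def null_out_of_range(record, county_max_m, today=None):
    """Return (cleaned copy of record, list of changes made).

    Out-of-range coordinates, dates, and elevations are set to None
    rather than causing the whole record to be dropped.
    """
    today = today or date.today()
    cleaned = dict(record)
    report = []

    lat, lon = cleaned.get("latitude"), cleaned.get("longitude")
    if lat is not None and lon is not None:
        inside = (CA_BOX["lat_min"] <= lat <= CA_BOX["lat_max"]
                  and CA_BOX["lon_min"] <= lon <= CA_BOX["lon_max"])
        if not inside:
            cleaned["latitude"] = cleaned["longitude"] = None
            report.append("coordinates outside California box")

    collected = cleaned.get("date")
    if collected is not None and (collected.year < 1750 or collected > today):
        cleaned["date"] = None
        report.append("date out of range")

    elevation = cleaned.get("elevation_m")
    county_limit = county_max_m.get(cleaned.get("county"))
    if elevation is not None:
        too_low = elevation < DEATH_VALLEY_M
        too_high = county_limit is not None and elevation > county_limit
        if too_low or too_high:
            cleaned["elevation_m"] = None
            report.append("elevation out of range")

    return cleaned, report


# Made-up record: coordinates and date are implausible, elevation is fine.
record = {
    "latitude": 45.0, "longitude": -120.0,
    "date": date(1700, 5, 1),
    "elevation_m": 200.0, "county": "Example County",
}
cleaned, report = null_out_of_range(
    record, {"Example County": 1000.0}, today=date(2020, 1, 1)
)
print(report)
```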

Data updates

  • Consortium contents completely replaced at refresh (ca. every one to two weeks)
  • Participant data replaced en masse (i.e., not corrected or incremented) when new data received
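
The replace-en-masse policy can be illustrated with a small Python sketch. The function names, the in-memory dictionaries, and the sample records are assumptions made for the example; the real system works from common-format files. The point shown is that a new submission displaces the participant's previous data entirely, so a record absent from the new submission disappears at the next refresh.

```python
def replace_participant(holdings, participant, new_records):
    """Swap in a participant's new submission wholesale; nothing is merged."""
    updated = dict(holdings)
    updated[participant] = list(new_records)
    return updated


def refresh(holdings):
    """Rebuild the main table from scratch from every participant's current records."""
    main = {}
    for records in holdings.values():
        for record in records:
            main[record["specimen_number"]] = record
    return main


# Made-up holdings for two participants.
holdings = {
    "UC": [{"specimen_number": "UC1"}, {"specimen_number": "UC2"}],
    "RSA": [{"specimen_number": "RSA1"}],
}

# The new UC submission omits UC2 and adds UC3.
holdings = replace_participant(
    holdings, "UC", [{"specimen_number": "UC1"}, {"specimen_number": "UC3"}]
)
main = refresh(holdings)
print(sorted(main))
```

A full rebuild trades processing time for simplicity: there is no merge logic to get wrong, and the displayed data always matches the most recent submissions.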

Change tracking

Curatorial features

  • "Heteroduplicate" specimen tracking
  • Canned common searches ("related searches")
  • Feedback (cumulative comments page for each participant), displayed per record or over current intervals
  • Comparative graphs (county vs specimen numbers per participant)
  • Comparison of specimen coordinates with range specified for taxon in Jepson Manual (yellow flagging)

Mapping

  • Connected to BerkeleyMapper for point display

Online constituents

Software, hardware

All construction and retrieval done with Perl scripts; Perl available free for all operating systems. Data stored with BerkeleyDB database library. System has run unmodified on Windows NT, Mac OS X, and Linux, all using Apache web server. The data, indexes, scripts, etc., for the ~2,000,000 records currently being displayed occupy ~2 GB. Refreshing the entire system from common format files requires ~15 minutes of processing time on a 2.6 GHz Mac mini.

Databases             Main file: Herbarium number -> corresponding data
Indexes               Various: Index item -> list of Herbarium numbers
Translation tables    taxon name <-> taxon ID; nomenclatural synonyms
Link tables           specimen number -> external links (e.g., specimen scan, fieldbook entry, collector biography, user comments)
CGI scripts           searching; commenting
Static web pages      Search form; administrative pages; graphs
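
The main-file and index layout described here (Herbarium number -> data, index item -> list of Herbarium numbers) can be sketched in Python as follows. This is an illustration only: in-memory dictionaries stand in for the BerkeleyDB files, and the function names, field names, and sample records are hypothetical.

```python
from collections import defaultdict


def build_indexes(main, fields):
    """For each field, map every value to the list of Herbarium numbers having it."""
    indexes = {field: defaultdict(list) for field in fields}
    for number, record in main.items():
        for field in fields:
            value = record.get(field)
            if value:
                indexes[field][value].append(number)
    return indexes


def search(main, indexes, **criteria):
    """Return the records matching all criteria by intersecting index lists."""
    matches = None
    for field, value in criteria.items():
        numbers = set(indexes[field].get(value, []))
        matches = numbers if matches is None else matches & numbers
    return [main[number] for number in sorted(matches or [])]


# Made-up main file keyed by Herbarium number.
main = {
    "UC1": {"taxon_name": "Quercus agrifolia", "county": "Alameda"},
    "UC2": {"taxon_name": "Quercus lobata", "county": "Alameda"},
    "RSA1": {"taxon_name": "Quercus agrifolia", "county": "Marin"},
}
indexes = build_indexes(main, ["taxon_name", "county"])
hits = search(main, indexes, taxon_name="Quercus agrifolia", county="Alameda")
print(hits)
```

Because each index stores only Herbarium numbers, a multi-field query reduces to set intersection followed by lookups in the main file, which suits a simple key-value store.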