Q: What was your actual throughput per hour?
A: Approximately 165 pages per hour. Initial scanning of the 2200+ pages took two long days. After sanity checking, approximately 120 of those page images had to be rescanned, mostly because of inadvertently cropped margins and other operator error.
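As a rough check on those figures, the arithmetic can be sketched as follows. This assumes exactly 2200 pages (the answer says "2200+", so it is a lower bound) and that the 165 pages-per-hour rate covers only the initial pass; the function names are illustrative.

```python
def scanning_hours(pages, pages_per_hour):
    """Hours needed to scan `pages` at a steady rate."""
    return pages / pages_per_hour


def rescan_fraction(rescanned, pages):
    """Share of page images that needed a second pass."""
    return rescanned / pages


# Figures quoted in the answer; 2200 stands in for "2200+".
hours = scanning_hours(2200, 165)
fraction = rescan_fraction(120, 2200)
print(f"{hours:.1f} scanning hours, {fraction:.1%} rescanned")
```

Under these assumptions the initial pass works out to a little over 13 hours of scanning, spread across the two days, with roughly one page in eighteen rescanned.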
Q: What resolution did you scan at and what format were the files stored in?
A: Images were acquired at 300 dpi as grayscale TIFF (the scanner's native format).
They were converted to JPEG for web viewing.
Q: Did you find that the scanner gave you a quality image without damaging the book?
A: Yes. The Jepson Flora volumes were in generally good shape, and common enough that we were not worried if minor damage occurred. Throughput might be significantly reduced for a rarer, more fragile, or brittle text that requires delicate handling. The cradle with drop-down scanner seemed to be a good design for valuable texts.
We did find there to be some difference in exposure on left vs. right side pages which was never fully explained, and which led to some problems later when binarizing the images for OCR. One side would come out more exposed, which caused problems when trying to batch process all files at once.
Adjusting settings during scanning might correct this. Quality of the originals was generally good for our purposes, which were to show the full detail of the page (including scientific illustrations).
Ultimately, the left/right exposure problem described above prevented us from uniformly correcting the exposure for all pages. Combined with a highly specialized botanical vocabulary, this led to OCR error rates that were unacceptable for text recognition and indexing. Because we lack additional staff time to devote to this project, the electronic Jepson Flora is currently available as an image-only product.
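One way around the left/right exposure mismatch is to choose a binarization threshold separately for each side rather than one setting for the whole batch. The sketch below is not what the project did; it assumes each page is an 8-bit grayscale image held as a list of pixel rows and tagged "left" or "right", and it uses Otsu's method as a stand-in for whatever thresholding the OCR tool applies. All function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold (0-255) for a sequence of 8-bit gray values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of their gray values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Between-class variance (up to a constant factor).
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image, threshold):
    """Map a grayscale image to 1 (ink) where pixel <= threshold, else 0."""
    return [[1 if p <= threshold else 0 for p in row] for row in image]


def binarize_by_side(pages):
    """Binarize (side, image) pairs with one threshold per side.

    Pixels from all pages of a side are pooled, so a consistently
    over-exposed side gets its own threshold.
    """
    pooled = {}
    for side, image in pages:
        pooled.setdefault(side, []).extend(p for row in image for p in row)
    thresholds = {side: otsu_threshold(px) for side, px in pooled.items()}
    return [binarize(image, thresholds[side]) for side, image in pages]
```

Pooling by side keeps the job a batch operation while letting the more exposed side use a higher threshold. It would not help with the second factor mentioned above, the specialized botanical vocabulary.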