So now I've OCRed the pages, and now I have to fix them. Some of the pages are pretty good, and only take 5 minutes or so to spell check. The most common issues are lower-case Ls and capital Is being recognized as the number "1," or as exclamation points. Then there are the pages that are poor copies and use small, funky fonts, where I get crap like this (note, this is actually copied directly from a paragraph that I have yet to spell check):
"H rearm anlMaty apprcprLBle Irtal Williams lurrnd on io anirfa-lion al age Swb ntar lacing Snow rtfitfa irtd tna Senm t>wtlfl F4&H J4 r* li Hii dra »*4 By mc« raponiHl at "boyish.' dw in pan |D rul -nletliOLn enimiuasfn lor ta wort Born "1 TotWfc wtvE hia met**' Trtxhod al ¦« HuatrBbt and Kri IVrtot *ta* ¦ Cinhmercal "rmf. William? K"jfc lo Irtfl OrawinQ bcvrQ V an eH'ly aga and wn> ImtKmJ by lid Disney chB/aclflf* AF 14. Tie mi»fl ¦ pihjmnflQd [Pj Thi^) Id Ihfl Oi*nay studia Vi Bur bank. CaJilo*r*ir and Ihtuugrt a FlVnd dF hil mo[r>wrE mam9«tf ID gam mjrfW-lariCB Ha «poX* 10 IrtB Di*r*y ¦n.naicr? i*lwis* wort r<e riad flffmk*!. Bjid Ihey in ftun vrero Impaiawf *lh lr« younpHflr"? .JadkiaHofl and Ulant ¦ Loll Df peop* al ¦» Ikne *H'a Dh"*T ivi«dcir' I* Of» I0U ar> 4nlBiviai-flr, "ftiil I cduM *clij#lty drt» aHaFir"
I shit you not. Those first three 'words' (H rearm anlMaty) should read "It seems entirely". I think I see five words in that mess that actually OCRed correctly.
Ahh, the joys of OCRing shit...