dougee Posted August 8, 2012 We are looking at implementing an OCR solution for scanned documents that cannot be indexed. We currently use Adobe Acrobat Professional, but I wanted to see what others are doing and would recommend. Thanks. Cheers, Andy J
Guest KathleenK Posted August 9, 2012 Hi Andy: The government agency I work for uses Adobe Acrobat Professional to OCR before importing into Intella. We did not test other tools before settling on that, though.
dougee Posted August 9, 2012 Thanks Kathleen, we are happy with Adobe, but I wanted to see what others were using; good to know you use it. Do you use any add-ons? I was looking at a plugin for hot folder support, but didn't find anything that seemed robust enough. Thanks.
Guest KathleenK Posted August 9, 2012 Hi Andy: We have not utilized any add-ons. I'm sorry I can't be of help on that front.
llanowar Posted August 14, 2012 Great topic! Thank you for bringing it up. I have used OmniPage Pro a little for OCR, and it seems to work well (except for original files with large pixel sizes, such as a bunch of CAD drawings I tried to OCR once). I believe OmniPage Pro has a minimum and maximum pixel size sweet spot it works within. I plan to try Adobe soon. Would someone be kind enough to elaborate a bit (or a lot) on their OCR workflow? I have mulled over several options and am curious what others are doing. Example:
1. Bring sources into Intella for processing
2. Tag the "empty" pdf, tiff, etc.
3. Export the files tagged for OCR out as native (preserving source folder structure?)
- Do you keep the folder structure (file location) intact, or just place all to-be-OCRed files lumped together in one folder?
4. OCR using favorite tool (will Acrobat Pro recursively scan files given a top-level folder?)
5. Bring results back into Intella as a new source for searching/tagging
- If an OCR-ed file is found to be responsive and needs to be produced, do you produce just the OCR-ed version or both the original (pre-OCR) and the OCR-ed version (for quality assurance or other reasons)?
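Steps 3 and 4 of the workflow above turn on whether the folder structure survives the OCR pass. Here is a minimal Python sketch, assuming the tagged files have already been exported to a folder tree on disk. It pairs each OCR candidate with an output path that mirrors the source layout, so the original file location can still be traced when the results are brought back in. The names `plan_ocr_jobs` and `OCR_EXTENSIONS` and the extension list are hypothetical and not part of Intella or any OCR tool.

```python
from pathlib import Path

# Hypothetical set of file types treated as OCR candidates; adjust as needed.
OCR_EXTENSIONS = {".pdf", ".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def plan_ocr_jobs(src_root, dst_root):
    """Return sorted (source, destination) pairs that mirror the source tree."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    jobs = []
    for src in src_root.rglob("*"):
        if src.is_file() and src.suffix.lower() in OCR_EXTENSIONS:
            # Keep the path relative to the export root so the original
            # file location can still be traced after OCR.
            jobs.append((src, dst_root / src.relative_to(src_root)))
    return sorted(jobs)
```

Each pair can then be handed to whichever OCR tool is in use. Lumping everything into one folder would only need `dst_root / src.name` instead, at the cost of possible name collisions.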
Guest KathleenK Posted August 22, 2012 Usually, I OCR the data set before bringing it into Intella. For my report, I produce the OCR'd version, but supply the underlying dataset for discovery purposes.
dougee Posted August 22, 2012 Kathleen, how do you check the dataset for files that need OCR before bringing it into Intella? I have been trying to add this to my workflow, along with identifying encrypted/password-protected files. Currently I use X-Ways to identify the encrypted files and decrypt them before bringing them into Intella, but I couldn't seem to get it to work for files that need OCR. Like you, I produce the OCR'd documents in my report and the original files in any production or discovery.
Guest KathleenK Posted September 3, 2012 I am usually given a data set that I know ahead of time must be OCR'd. Maybe someone else on the Forum will have other ideas. I will ask around and if I find anything, I will post it.
tufelkinder Posted November 28, 2012 You could set up a Tesseract server... Or you could use SHMSoft's processing product to do that step for you. http://code.google.com/p/tesseract-ocr/ http://shmsoft.com/index.php/shmcloud
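A rough Python sketch of the Tesseract idea, combined with the hot folder question raised earlier in the thread. It assumes a recent Tesseract build is installed and on the PATH, and that the inputs are image files (Tesseract reads images such as TIFF or PNG, not PDFs). The helpers `build_tesseract_cmd` and `process_hot_folder` and the folder arguments are hypothetical. This is not a server, only a single polling pass that a scheduler could call repeatedly.

```python
import subprocess
from pathlib import Path

IMAGE_EXTENSIONS = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def build_tesseract_cmd(image, out_dir):
    """Build the CLI call: tesseract <image> <output base> pdf."""
    out_base = Path(out_dir) / Path(image).stem
    # The trailing "pdf" config asks Tesseract to emit <output base>.pdf.
    return ["tesseract", str(image), str(out_base), "pdf"]


def process_hot_folder(hot_dir, out_dir, run=subprocess.run):
    """OCR every image in hot_dir that has no PDF in out_dir yet."""
    hot_dir, out_dir = Path(hot_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    done = []
    for image in sorted(hot_dir.iterdir()):
        if not image.is_file() or image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        if (out_dir / (image.stem + ".pdf")).exists():
            continue  # already processed on an earlier pass
        run(build_tesseract_cmd(image, out_dir), check=True)
        done.append(image.name)
    return done
```

The `run` parameter exists only so the command can be swapped out for a dry run. Scanned PDFs would first need to be rendered to images by another tool, which this sketch does not cover.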