OCR Recommendations

dougee · August 8, 2012

We are looking at implementing an OCR solution for scanned documents that cannot be indexed. We currently use Adobe Acrobat Professional, but I wanted to see what others are doing and would recommend. Thanks.

Cheers

Andy J

KathleenK · August 9, 2012

Hi Andy:

The government agency I work for uses Adobe Acrobat Professional to OCR documents before importing them into Intella. We did not test other tools before settling on it, though.

dougee · August 9, 2012

Thanks Kathleen. We are happy with Adobe, but I wanted to see what others were using, so it's good to know you use it. Do you use any add-ons? I was looking at a plugin for hot folder support but didn't find anything that seemed robust enough. Thanks.

KathleenK · August 9, 2012

Hi Andy:

We have not utilized any add-ons. I'm sorry I can't be of help on that front.

llanowar · August 14, 2012

Great topic! Thank you for bringing it up.

I have used OmniPage Pro a little for OCR, and it seems to work well (except for original files with large pixel sizes, such as a bunch of CAD drawings I tried to OCR once). I believe OmniPage Pro has a minimum and maximum pixel size range that it works within. I plan to try Adobe soon.

Would someone be kind enough to elaborate a bit (or a lot) on their OCR workflow? I have mulled over several options and am curious what others are doing.

Example:

1. Bring sources into Intella for processing

2. Tag the "empty" pdf, tiff, etc.

3. Export the files tagged for OCR out as native (preserving source folder structure?)

- Do you keep the folder structure (file location) intact, or just lump all to-be-OCRed files together in one folder?

4. OCR using favorite tool (will Acrobat Pro recursively scan files given a top-level folder?)

5. Bring results back into Intella as a new source for searching/tagging

- If an OCR-ed file is found to be responsive and needs to be produced, do you produce just the OCR-ed version or both the original (pre OCR) and the OCR-ed version (for quality assurance or other reason)?
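For steps 3 and 4, one way to keep the source folder structure intact is to mirror it under a separate output root before handing the files to whatever OCR tool is used. The following is only a rough sketch under stated assumptions: the names `plan_ocr_jobs`, `prepare_output_folders`, and `OCR_EXTENSIONS` are hypothetical, the extension list is a guess at what counts as an OCR candidate, and nothing here calls Intella or Acrobat.

```python
from pathlib import Path

# Extensions treated as OCR candidates (an assumption for illustration).
OCR_EXTENSIONS = {".pdf", ".tif", ".tiff"}


def plan_ocr_jobs(export_root, ocr_root):
    """Pair each exported file with an output path that mirrors its folder."""
    export_root, ocr_root = Path(export_root), Path(ocr_root)
    jobs = []
    # rglob("*") walks the export folder recursively from the top level.
    for src in sorted(export_root.rglob("*")):
        if src.is_file() and src.suffix.lower() in OCR_EXTENSIONS:
            # Same relative location, different root.
            dst = ocr_root / src.relative_to(export_root)
            jobs.append((src, dst))
    return jobs


def prepare_output_folders(jobs):
    """Create the mirrored output folders so the OCR tool can write into them."""
    for _, dst in jobs:
        dst.parent.mkdir(parents=True, exist_ok=True)
```

Because each output path keeps the same relative location as its source, the OCR-ed copy can be matched back to the original when both need to be produced. The output root should sit outside the export folder so the walk does not pick up its own results.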

KathleenK · August 22, 2012

Usually, I OCR the data set before bringing it into Intella. For my report, I produce the OCR'd version, but supply the underlying dataset for discovery purposes.

dougee · August 22, 2012

Kathleen, how do you check the dataset for OCR before bringing it into Intella? I have been trying to add this to my workflow, along with identifying encrypted/password-protected files. Currently I use X-Ways to identify the encrypted files and decrypt them before bringing them into Intella, but I couldn't seem to get it to work for files that need OCR.

Like you, I produce the OCR documents in my report and the original files in any production or discovery.
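As a rough first-pass triage before import, one could scan the raw bytes of each PDF for two markers: `/Encrypt` (present in the trailer of encrypted PDFs) and `/Font` (usually absent from image-only scans). This is a heuristic sketch, not a reliable detector. The function names are hypothetical, and PDFs that store their objects in compressed object streams can hide `/Font`, so a "maybe needs OCR" flag only means "worth a closer look", especially for encrypted files.

```python
from pathlib import Path


def triage_pdf_bytes(data):
    """Return rough flags for one PDF's raw bytes (heuristic only)."""
    return {
        # Encrypted PDFs reference an /Encrypt dictionary in the trailer.
        "encrypted": b"/Encrypt" in data,
        # No font resources visible often means an image-only scan, but
        # compressed object streams can hide them (false positives).
        "maybe_needs_ocr": b"/Font" not in data,
    }


def triage_folder(root):
    """Apply the heuristic to every PDF found recursively under root."""
    results = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() == ".pdf":
            results[path] = triage_pdf_bytes(path.read_bytes())
    return results
```

Files flagged either way could then be routed to decryption or OCR before the set goes into Intella. A proper PDF library would give more dependable answers than this byte scan.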

KathleenK · September 3, 2012

I am usually given a data set that I know ahead of time must be OCR'd. Maybe someone else on the Forum will have other ideas. I will ask around and if I find anything, I will post it.

tufelkinder · November 28, 2012

You could set up a Tesseract server... Or you could use SHMSoft's processing product to do that step for you.

http://code.google.com/p/tesseract-ocr/

http://shmsoft.com/index.php/shmcloud
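A very small "hot folder" around the Tesseract command line might look something like the sketch below. It assumes the `tesseract` executable is on the PATH and relies only on its basic `tesseract imagename outputbase -l lang` form, which writes `outputbase.txt`. The function names, the extension list, and the folder layout are hypothetical. Tesseract reads image files rather than PDFs, so scanned PDFs would need to be converted to images first.

```python
import subprocess
from pathlib import Path

# Image types to pick up from the hot folder (an assumption for illustration).
IMAGE_EXTENSIONS = {".tif", ".tiff", ".png", ".jpg"}


def tesseract_command(image, out_dir, lang="eng"):
    """Build the command line: tesseract <image> <outputbase> -l <lang>."""
    image, out_dir = Path(image), Path(out_dir)
    out_base = out_dir / image.stem  # tesseract appends .txt itself
    return ["tesseract", str(image), str(out_base), "-l", lang]


def new_images(hot_folder, seen):
    """Return images in hot_folder not handled yet, and remember them."""
    found = [
        p for p in sorted(Path(hot_folder).iterdir())
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS and p not in seen
    ]
    seen.update(found)
    return found


def process_once(hot_folder, out_dir, seen):
    """Run one polling pass; call this on a timer or in a loop."""
    for image in new_images(hot_folder, seen):
        subprocess.run(tesseract_command(image, out_dir), check=True)
```

Calling `process_once` every few seconds with the same `seen` set gives basic hot-folder behaviour. A more robust version would also wait until a file has finished copying before processing it.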
