
OCR Recommendations


dougee


We are looking at implementing an OCR solution for scanned documents that cannot be indexed. We currently use Adobe Acrobat Professional, but I wanted to see what others are doing and would recommend. Thanks.

 

Cheers

 

Andy J


Guest KathleenK

Hi Andy:

 

The government agency I work for uses Adobe Acrobat Professional to OCR before importing into Intella. We did not test other tools before settling on that though.


Thanks Kathleen, we are happy with Adobe, but I wanted to see what others were using; good to know you use it. Do you use any add-ons? I was looking at a plugin for hot folder support, but didn't find anything that seemed robust enough. Thanks.


Great topic! Thank you for bringing it up.

I have used OmniPage Pro a little for OCR, and it seems to work well (except for original files with large pixel dimensions, such as a batch of CAD drawings I tried to OCR once). I believe OmniPage Pro has a minimum and maximum pixel size sweet spot it works within. I plan to try Adobe soon.

 

Would someone be kind enough to elaborate a bit (or a lot :) on their OCR workflow? I have mulled over several options and am curious what others are doing.

 

Example:

1. Bring sources into Intella for processing

 

2. Tag the "empty" pdf, tiff, etc.

 

3. Export the files tagged for OCR out as native (preserving source folder structure?)

- Do you keep the folder structure (file location) intact, or just lump all to-be-OCRed files together in one folder?

 

4. OCR using favorite tool (will Acrobat Pro recursively scan files given a top-level folder?)

 

5. Bring results back into Intella as a new source for searching/tagging

- If an OCR-ed file is found to be responsive and needs to be produced, do you produce just the OCR-ed version or both the original (pre OCR) and the OCR-ed version (for quality assurance or other reason)?
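
For step 3, here is a minimal Python sketch of staging exported files for OCR, with a switch for keeping the source folder structure or lumping everything into one folder. It is only an illustration: the function name `stage_for_ocr`, the folder arguments, and the extension list are assumptions of mine, not anything Intella or Acrobat provides, and it assumes the staging folder sits outside the export folder.

```python
from pathlib import Path
import shutil

# Assumed set of file types worth sending to OCR; adjust as needed.
OCR_EXTENSIONS = {".pdf", ".tif", ".tiff"}


def stage_for_ocr(export_root, staging_root, keep_structure=True):
    """Copy OCR candidates from export_root into staging_root.

    keep_structure=True mirrors the relative folder layout;
    keep_structure=False puts every file in one folder, prefixing a
    counter so files with the same name do not overwrite each other.
    Returns the list of destination paths.
    """
    export_root = Path(export_root)
    staging_root = Path(staging_root)
    copied = []
    # rglob("*") walks the whole tree recursively from the top-level folder.
    for src in sorted(export_root.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in OCR_EXTENSIONS:
            continue
        if keep_structure:
            dest = staging_root / src.relative_to(export_root)
        else:
            dest = staging_root / f"{len(copied):05d}_{src.name}"
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves file timestamps
        copied.append(dest)
    return copied
```

Keeping the structure makes it easier to match an OCR-ed file back to its original location later; the flat layout is simpler if your OCR tool only accepts a single folder.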


Guest KathleenK

Usually, I OCR the data set before bringing it into Intella. For my report, I produce the OCR'd version, but supply the underlying dataset for discovery purposes.


Kathleen, how do you check the dataset for files needing OCR before bringing it into Intella? I have been trying to add this to my workflow, along with identifying encrypted/password-protected files. Currently I use X-Ways to identify the encrypted files and decrypt them before bringing them into Intella, but I couldn't seem to get it to work for OCR files.

 

Like you, I produce the OCR documents in my report and the original files in any production or discovery.
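
One rough way to triage PDFs before import is sketched below in Python. It is a crude heuristic and not a feature of X-Ways or Intella: scanned, image-only PDFs usually carry no font resources, so the hypothetical helper `looks_image_only` just checks whether the raw file contains a `/Font` entry. It can over-flag PDFs whose font dictionaries sit inside compressed object streams, so treat the result as a list to review, not a verdict; a dedicated PDF library would be more reliable.

```python
from pathlib import Path


def looks_image_only(path):
    """Guess whether a PDF has no text layer (heuristic, may over-flag).

    Returns False for files that do not look like PDFs at all.
    """
    data = Path(path).read_bytes()
    # The PDF header is expected near the start of the file.
    if b"%PDF-" not in data[:1024]:
        return False
    # No visible font resource usually means there is no searchable text.
    return b"/Font" not in data


def find_ocr_candidates(root):
    """Recursively list PDFs under root that look image-only."""
    return [
        p for p in sorted(Path(root).rglob("*"))
        if p.is_file() and p.suffix.lower() == ".pdf" and looks_image_only(p)
    ]
```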


  • 2 weeks later...
Guest KathleenK

I am usually given a data set that I know ahead of time must be OCR'd. Maybe someone else on the Forum will have other ideas. I will ask around and if I find anything, I will post it.

