wmfiske, posted January 28, 2016

I would like to open a community discussion on OCR settings and programs, as I have been doing some performance testing recently. There are two versions of ABBYY that I have been testing: FineReader Corporate (4-core) and Recognition Server (RS v4). My first assumption was that RS v4 would be faster, since it is 4-5x the cost of the 4-core Corporate version. I was using an unlimited-core version, and I liked the idea that I could export/import files directly from Intella v1.9.

In one test, I sent 100 non-searchable PDF files to RS using the Intella interface. I preconfigured a workflow in RS to export to Text format. The PDF files were of random sizes, 4 had errors (corrupted), and they totaled 1,067 pages.

TEST #1 (Good): RS, which was running on a different server from Intella, completed the task in 26 minutes. (Note: one downside of using the Intella interface to export/import to RS was that I could not use Intella while it was processing.)

TEST #2 (Better): Corporate, which was running the Hot Folder function on a separate server, completed the task in less than 19 minutes. The output and other settings were equivalent to the RS workflow.

TEST #3 (Best): I then wanted to figure out a way to squeeze more performance from Corporate Hot Folder. I created a batch file that split my PDF files into 4 subfolders, based on the first character of the MD5 filename (16 possible hex values split 4 ways). Of course that will not balance the workload equally, but it was good enough for testing. I started the 4 jobs in the Hot Folder interface at the same time (one job per subfolder). Although it was still limited to 4 cores, the split did make a difference: all jobs were completed in less than 10 minutes.

This made me consider the option of buying two Corporate 4-core licenses running on separate servers instead of using RS. If you wait, ABBYY often sells the 4-core version at a 40% discount for $359/license.
So that is roughly $700 for two licenses and unlimited OCR, compared to RS pricing.

Questions for the community:

1) What do you use for OCR? Has it been a good ROI?
2) What OCR settings do you use? What works best for an eDiscovery (eD) environment?

Thanks for reading,
Wm
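For anyone who wants to try the Test #3 split, here is a minimal sketch of the idea in Python. The original was a batch file that is not shown in this post, so this is not that script. It assumes every PDF filename starts with a hex digit (as an MD5-based name would), and the subfolder names job1 through job4 are made up for illustration.

```python
import shutil
from pathlib import Path

HEX_DIGITS = "0123456789abcdef"


def bucket_for(filename: str, buckets: int = 4) -> int:
    """Map a filename to a bucket index using its first character.

    Assumes the name starts with a hex digit (0-9, a-f), as an
    MD5-based filename would. With 4 buckets: 0-3, 4-7, 8-b, c-f.
    """
    first = filename[0].lower()
    if first not in HEX_DIGITS:
        raise ValueError(f"{filename!r} does not start with a hex digit")
    return HEX_DIGITS.index(first) * buckets // len(HEX_DIGITS)


def split_into_subfolders(source: Path, buckets: int = 4) -> dict:
    """Move each PDF in `source` into a subfolder named job1..jobN.

    The folder names are hypothetical; point one Hot Folder job at each.
    Returns the number of files moved into each bucket.
    """
    counts = {i: 0 for i in range(buckets)}
    # sorted() reads the whole listing before any file is moved.
    # Note: the "*.pdf" pattern is case-sensitive on some platforms.
    for pdf in sorted(source.glob("*.pdf")):
        idx = bucket_for(pdf.name, buckets)
        target = source / f"job{idx + 1}"
        target.mkdir(exist_ok=True)
        shutil.move(str(pdf), str(target / pdf.name))
        counts[idx] += 1
    return counts
```

Calling split_into_subfolders(Path(...)) on the folder holding the PDFs returns a per-bucket file count, which gives a rough idea of how uneven the split is. It balances by file count only, not by page count, so the workload can still be lopsided.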