Posts posted by wmfiske

  1. I am using 1.9.1 to read a CSV load file, and it does not read the Extracted Text field properly.

     

    A sample Extracted Text field is "Images\001\001\00000001.txt"

     

    During the Validation step, it says it cannot read the file. For the example above, it shows the path as [path]\Images00100100000001.txt

     

    The backslashes in the CSV file are not being read, so the path is shown as one long string.
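
    One possible explanation, offered only as an assumption, is that the load file parser treats the backslash as an escape character. The sketch below uses Python's standard csv module (not Intella's own parser) to reproduce the same symptom, and shows one workaround that may or may not be accepted by the load file importer: rewriting the path separators as forward slashes. The DOC0001 column value is hypothetical.

```python
import csv
import io

# Hypothetical load-file row: an ID column plus the Extracted Text path.
SAMPLE = "DOC0001,Images\\001\\001\\00000001.txt\n"

def read_row(text, escapechar=None):
    """Parse one CSV row; escapechar mimics a parser that treats it as an escape."""
    return next(csv.reader(io.StringIO(text), escapechar=escapechar))

def to_forward_slashes(text):
    """Workaround sketch: replace backslashes so escape handling cannot drop them."""
    return text.replace("\\", "/")

plain = read_row(SAMPLE)                       # no escape character configured
escaped = read_row(SAMPLE, escapechar="\\")    # backslash treated as an escape
fixed = read_row(to_forward_slashes(SAMPLE), escapechar="\\")

print(plain[1])    # Images\001\001\00000001.txt
print(escaped[1])  # Images00100100000001.txt
print(fixed[1])    # Images/001/001/00000001.txt
```

    If the importer does behave this way, doubling each backslash in the load file would be another option to try.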

     

     

  2. When I use Preview Item (CTRL+O), I can enter either an Item ID number or a URI.

     

    If I want to identify a series of Item IDs, I can add those Item IDs to a single text file and import that list via the Item ID Lists facet.

     

    Can you expand the Item ID Lists facet to accept a text file containing URIs?

     

  3. I would like to open a community discussion on OCR settings and programs as I have been doing some performance testing recently.

     

    There are two versions of ABBYY that I have been testing: FineReader Corporate (4 core) and Recognition Server (RS v4).

     

    My first assumption was that RS v4 would be faster since it is 4-5x the cost of the 4-core Corporate version. I was using an unlimited core version and I liked the idea that I could export/import files directly from Intella v1.9.

     

    In one test, I sent 100 non-searchable PDF files to RS using the Intella interface. I preconfigured a workflow in RS to export to Text format. The PDF files were random sizes, 4 had errors (corrupted) and they totaled 1,067 pages.

     

    TEST #1 (Good):

     

    RS, which was running on a different server from Intella, completed the task in 26 minutes. (Note: One downside of using the Intella interface to export/import to RS was that I could not use Intella while it was processing.)

     

    TEST #2 (Better):

     

    Corporate, which was running the Hot Folder function on a separate server, completed the task in less than 19 minutes. The output and other settings were equivalent to the RS workflow.

     

    TEST #3 (Best):

     

    I then wanted to figure out a way to squeeze more performance from Corporate Hot Folder. I created a batch file that split my PDF files into 4 subfolders. I did this based on the first character of the MD5 filename (16 possible hex values split 4 ways). Of course that will not balance the workload equally, but it was good enough for testing.
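
    The split described above can be sketched as follows. This is a Python stand-in rather than the original batch file, and the function names and the job0..job3 subfolder names are hypothetical. It assumes every PDF filename starts with a hex character, as an MD5-based name would.

```python
import shutil
from pathlib import Path

def bucket_for(filename, buckets=4):
    """Map the first hex character of an MD5-style filename to a bucket index."""
    first = int(filename[0], 16)   # 0..15; raises ValueError if not a hex digit
    return first * buckets // 16   # 0..buckets-1

def split_into_subfolders(source_dir, buckets=4):
    """Move each PDF in source_dir into source_dir/job<N>; return files per bucket."""
    source = Path(source_dir)
    counts = [0] * buckets
    # sorted() materializes the listing before any file is moved
    for pdf in sorted(source.glob("*.pdf")):
        index = bucket_for(pdf.name, buckets)
        target = source / f"job{index}"
        target.mkdir(exist_ok=True)
        shutil.move(str(pdf), str(target / pdf.name))
        counts[index] += 1
    return counts
```

    Because the split depends only on the filename, the buckets can end up uneven in both file count and page count. The returned counts give a quick check of how lopsided a particular run is.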

     

    I started the 4 jobs on the Hot Folder interface at the same time (one job per subfolder). Although it was still limited to 4 cores, the split did make a difference. All jobs were completed in less than 10 minutes.
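
    To put the three tests on a common scale, the reported times can be converted to pages per minute. This is only a rough sketch: it assumes all 1,067 pages count toward every run, and since Tests 2 and 3 finished in "less than" the stated times, their rates are lower bounds.

```python
PAGES = 1067  # total pages across the 100 test PDFs

# Minutes per test as reported above; Tests 2 and 3 are upper bounds on time.
runs = {
    "Test 1: RS via Intella": 26,
    "Test 2: Corporate Hot Folder": 19,
    "Test 3: Corporate, 4 parallel jobs": 10,
}

def pages_per_minute(pages, minutes):
    """Average throughput for one run."""
    return pages / minutes

baseline = runs["Test 1: RS via Intella"]
for name, minutes in runs.items():
    rate = pages_per_minute(PAGES, minutes)
    speedup = baseline / minutes
    print(f"{name}: {rate:.1f} pages/min, {speedup:.1f}x vs Test 1")
```

    That works out to roughly 41 pages per minute for Test 1, at least 56 for Test 2, and at least 107 for Test 3.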

     

    This made me consider the option of buying two Corporate 4-core licenses running on separate servers instead of using RS. If you wait, ABBYY often sells the 4-core version at a 40% discount for $359/license. That comes to roughly $700 for unlimited OCR, compared to RS pricing.

     

    Questions for the community:

     

    1) What do you use for OCR? Has it been a good ROI?

     

    2) What OCR settings do you use? What works best for an eD environment?

     

    Thanks for reading,

     

    Wm

     

  4. I am currently running ver 1.9 and indexing a 30 GB OST file.

     

    Intella was cranking along just fine for the first 80 minutes.

     

    Then it reduced to a minimal processing mode. The Index New Data window shows no new activity.

     

    However, through Resource Monitor, I can see that Java is still reading the OST file, but extremely slowly compared to the first 80 minutes.

     

    Is there a way to determine exactly what is going on?

     

    I am running it on a Windows 7 box (i7-5930 3.5 GHz with 64GB RAM) with evidence on one drive and temp on a 512 GB M.2 card.

     

    This is my second test on this OST file. When I first tested this OST file, it ran on Intella for 13 hours (and behaved the same way as described above). I finally stopped the process and figured there must be errors in the OST file. I then ran ScanPST on it multiple times to see if that would help.

     

     
