wmfiske
-
Posts
9 -
Posts posted by wmfiske
-
-
I am using 1.9.1 to read a CSV load file and it will not read the file properly when it comes to the Extracted Text field.
A sample Extracted Text field is "Images\001\001\00000001.txt"
During the Validation step, it reports that it cannot read the file. For the example above, the path is shown as [path]\Images00100100000001.txt
The backslashes in the CSV file are being dropped, so the path comes through as one long string.
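For anyone trying to reproduce this outside the product: the symptom is consistent with a CSV parser that treats the backslash as an escape character. That is only a guess at the cause, not a statement about how Intella parses load files. A minimal Python sketch, with a made-up two-column row, shows the same collapse:

```python
import csv
import io

# One row of a hypothetical load file; the second column is the
# Extracted Text path from the post above.
ROW = r'1,"Images\001\001\00000001.txt"' + "\n"

def parse(line, escapechar=None):
    """Parse a single CSV line, optionally treating a character as an escape."""
    reader = csv.reader(io.StringIO(line), escapechar=escapechar)
    return next(reader)

# Backslash treated as an escape character: every backslash is consumed
# and the character after it is kept literally.
print(parse(ROW, escapechar="\\")[1])

# No escape character configured: the path survives intact.
print(parse(ROW)[1])
```

If this is what is happening, doubling the backslashes in the load file or switching to forward slashes might be worth testing, but I have not confirmed either against 1.9.1.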
-
When I use Preview Item (CTRL+O), I can enter either an Item ID number or a URI.
If I want to identify a series of Item IDs, I can add those Item IDs to a single text file and import that list via the Item ID Lists facet.
Can you expand the Item ID Lists facet to accept a text file containing URIs?
-
[disregard - wrong forum]
-
I would like to open a community discussion on OCR settings and programs as I have been doing some performance testing recently.
There are two versions of ABBYY that I have been testing: FineReader Corporate (4 core) and Recognition Server (RS v4).
My first assumption was that RS v4 would be faster since it is 4-5x the cost of the 4-core Corporate version. I was using an unlimited core version and I liked the idea that I could export/import files directly from Intella v1.9.
In one test, I sent 100 non-searchable PDF files to RS using the Intella interface. I preconfigured a workflow in RS to export to Text format. The PDF files were random sizes, 4 had errors (corrupted) and they totaled 1,067 pages.
TEST #1 (Good):
RS, which was running on a separate server from Intella, completed the task in 26 minutes. (Note: One downside of using the Intella interface to export/import to RS was that I could not use Intella while it was processing.)
TEST #2 (Better):
Corporate, which was running the Hot Folder function on a separate server, completed the task in less than 19 minutes. The output and other settings were equivalent to the RS workflow.
TEST #3 (Best):
I then wanted to figure out a way to squeeze more performance from Corporate Hot Folder. I created a batch file that split my PDF files into 4 subfolders. I did this based on the first character of the MD5 filename (16 possible hex values split 4 ways). Of course that will not balance the workload equally, but it was good enough for testing.
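I did not keep the batch file, but the split can be sketched in Python as follows. The function names, the `job0` to `job3` subfolder names, and the assumption that every file is named by its MD5 hash with a `.pdf` extension are all illustrative, not what I actually ran.

```python
import shutil
from pathlib import Path

def bucket_for(filename, buckets=4):
    """Map a file named by its MD5 hash to a bucket using its first hex digit."""
    first = int(filename[0], 16)      # 0..15
    return first * buckets // 16      # 0..buckets-1

def split_folder(source, dest, buckets=4):
    """Move every PDF in source into dest/job0 .. dest/job{buckets-1}."""
    source, dest = Path(source), Path(dest)
    for pdf in sorted(source.glob("*.pdf")):
        target = dest / f"job{bucket_for(pdf.name, buckets)}"
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(pdf), str(target / pdf.name))
```

With four buckets, names starting 0-3 go to job0, 4-7 to job1, 8-b to job2, and c-f to job3. Each subfolder would then be set up as its own Hot Folder job.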
I started the 4 jobs on the Hot Folder interface at the same time (one job per subfolder). Although it was still limited to 4 cores, the split did make a difference. All jobs were completed in less than 10 minutes.
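To put the three runs on the same footing, here is the throughput arithmetic for the 1,067-page set. Tests 2 and 3 finished in "less than" the stated time, so their figures are lower bounds: roughly 41 pages per minute for RS, at least 56 for a single Hot Folder job, and at least 107 for four parallel jobs.

```python
PAGES = 1067

# Elapsed minutes as reported above; tests 2 and 3 are upper bounds
# on the time ("less than"), so their rates are lower bounds.
runs = {
    "Test 1: RS via Intella": 26,
    "Test 2: Corporate, one Hot Folder job": 19,
    "Test 3: Corporate, four Hot Folder jobs": 10,
}

def pages_per_minute(pages, minutes):
    """Average throughput over the whole run."""
    return pages / minutes

for name, minutes in runs.items():
    print(f"{name}: about {pages_per_minute(PAGES, minutes):.0f} pages/min")
```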
This made me consider the option of buying two Corporate 4-core licenses running on separate servers instead of using RS. If you wait, ABBYY often sells the 4-core version at a 40% discount for $359/license. So roughly $700 (two licenses at $359 each) for unlimited OCR compared to RS pricing.
Questions for the community:
1) What do you use for OCR? Has it been a good ROI?
2) What OCR settings do you use? What works best for an eDiscovery environment?
Thanks for reading,
Wm
-
Does Intella calculate the total expanded size of data?
I know that I can export the table to a CSV list with each entry and sum the size. That export takes some time.
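For reference, the summing step of that workaround can be scripted. This is only a sketch: the column name `Size` and the assumption that sizes are exported as plain byte counts are guesses about the export layout, so adjust both to match the actual CSV.

```python
import csv

def total_size_bytes(csv_path, size_column="Size"):
    """Sum a numeric size column from an exported CSV, skipping blank cells."""
    total = 0
    # utf-8-sig tolerates a byte-order mark at the start of the export.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        for row in csv.DictReader(handle):
            value = (row.get(size_column) or "").strip()
            if value.isdigit():
                total += int(value)
    return total
```

Calling `total_size_bytes("export.csv") / 1024**3` would give the total in GB, where `export.csv` stands in for whatever the export was saved as.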
-
Igor,
The recovery of deleted items is exactly what took the longest. It took a total of 9 hours to process the entire 30 GB OST file. According to the log, 7 of those hours were spent processing deleted items.
-
The OST file was created by Outlook 2010.
-
I am currently running ver 1.9 and indexing a 30 GB OST file.
Intella was cranking along just fine for the first 80 minutes.
Then it dropped to a minimal processing mode. The Index New Data window shows no new activity.
However, through Resource Monitor, I can see that Java is still reading the OST file, though extremely slowly by comparison.
Is there a way to determine exactly what is going on?
I am running it on a Windows 7 box (i7-5930 3.5 GHz with 64GB RAM) with evidence on one drive and temp on a 512 GB M.2 card.
This is my second test on this OST file. When I first tested it, Intella ran for 13 hours and behaved the same way as described above. I finally stopped the process and figured there must be errors in the OST file. I then ran ScanPST on it multiple times to see if that would help.
Limiting Fields for Keyword Lists
in Intella Connect/Node/Investigator
I have the same question on how this can be done if a reviewer is using Connect.
I can apply this type of limitation to the keyword list if I use Intella Viewer or Intella Pro. However, in my case, I need it to work for a reviewer using Connect.