wmfiske

October 7, 2020

I have the same question on how this can be done if a reviewer is using Connect.

I can limit the keyword list this way if I use Intella Viewer or Intella Pro. However, in my case, I need this to work for a reviewer using Connect.

June 14, 2016

I am using 1.9.1 to read a CSV load file and it will not read the file properly when it comes to the Extracted Text field.

A sample Extracted Text field is "Images\001\001\00000001.txt"

During the Validation step, it says it cannot read the file. The example above shows [path]\Images00100100000001.txt

The backslashes in the CSV file are not being read, so the path shows up as one long string.
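The symptom can be reproduced outside Intella with a CSV parser that treats the backslash as an escape character. The sketch below uses Python's standard csv module purely as an illustration. It is an assumption that Intella's load file reader behaves in a similar way, and the DOC0001 value and the parse helper are made up for the example.

```python
import csv
import io

# One load-file row: a hypothetical document ID plus the Extracted Text path.
line = '"DOC0001","Images\\001\\001\\00000001.txt"\n'

def parse(text, escapechar):
    """Parse a single CSV row with the given escape character (or None)."""
    reader = csv.reader(io.StringIO(text), escapechar=escapechar)
    return next(reader)

# Backslash treated as an escape character: each backslash is consumed
# and the path collapses into one long string.
with_escape = parse(line, "\\")

# No escape character: the backslashes survive as literal characters.
without_escape = parse(line, None)

print(with_escape[1])     # Images00100100000001.txt
print(without_escape[1])  # Images\001\001\00000001.txt
```

If the reader does work this way, doubling the backslashes or using forward slashes in the load file may be worth trying, but that is a guess rather than a confirmed fix.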

March 12, 2016

When I use Preview Item (CTRL+O), I can enter either an Item ID number or a URI.

If I want to identify a series of Item IDs, I can add those Item IDs to a single text file and import that list via the Item ID Lists facet.

Can you expand the Item ID Lists facet to accept a text file containing URIs?

March 11, 2016

[disregard - wrong forum]

January 28, 2016

I would like to open a community discussion on OCR settings and programs as I have been doing some performance testing recently.

There are two versions of ABBYY that I have been testing: FineReader Corporate (4 core) and Recognition Server (RS v4).

My first assumption was that RS v4 would be faster since it is 4-5x the cost of the 4-core Corporate version. I was using an unlimited core version and I liked the idea that I could export/import files directly from Intella v1.9.

In one test, I sent 100 non-searchable PDF files to RS using the Intella interface. I preconfigured a workflow in RS to export to Text format. The PDF files were random sizes, 4 had errors (corrupted) and they totaled 1,067 pages.

TEST #1 (Good):

RS, which was running on a different server from Intella, completed the task in 26 minutes. (Note: One downside of using the Intella interface to export/import to RS was that I could not use Intella while it was processing.)

TEST #2 (Better):

Corporate, which was running the Hot Folder function on a separate server, completed the task in less than 19 minutes. The output and other settings were equivalent to the RS workflow.

TEST #3 (Best):

I then wanted to squeeze more performance out of the Corporate Hot Folder. I created a batch file that split my PDF files into 4 subfolders, based on the first character of the MD5 filename (16 possible hex values split 4 ways). Of course, that will not balance the workload equally, but it was good enough for testing.

I started the 4 jobs on the Hot Folder interface at the same time (one job per subfolder). Although it was still limited to 4 cores, the split did make a difference. All jobs were completed in less than 10 minutes.
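For anyone who wants to try the same split, here is a rough Python sketch of the idea rather than the original batch file. It assumes every PDF is named after its MD5 hash, so the first character of the filename is a hex digit. The function names and the job1 to job4 folder names are made up for the example.

```python
import shutil
from pathlib import Path

def bucket_for(filename, buckets=4):
    """Map the first hex character of an MD5-style filename to a bucket index."""
    value = int(filename[0], 16)   # raises ValueError if the name is not hex
    return value * buckets // 16   # 0-3, 4-7, 8-b, c-f when buckets is 4

def split_into_subfolders(source, buckets=4):
    """Move each PDF in `source` into job1..jobN and return a count per bucket."""
    source = Path(source)
    counts = {b: 0 for b in range(buckets)}
    # sorted() materializes the listing before any file is moved.
    for pdf in sorted(source.glob("*.pdf")):
        b = bucket_for(pdf.name, buckets)
        target = source / f"job{b + 1}"
        target.mkdir(exist_ok=True)
        shutil.move(str(pdf), str(target / pdf.name))
        counts[b] += 1
    return counts
```

Each job folder can then be pointed at its own Hot Folder task. Because MD5 names are spread fairly evenly over the hex range, the file counts should be close, although page counts per folder can still differ a lot.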

This made me consider the option of buying two Corporate 4-core licenses running on separate servers instead of using RS. If you wait, ABBYY often sells 4-core at a 40% discount for $359/license. So roughly $700 for unlimited OCR compared to RS pricing.

Questions for the community:

1) What do you use for OCR? Has it been a good ROI?

2) What OCR settings do you use? What works best for an eDiscovery (eD) environment?

Thanks for reading,

Wm

January 12, 2016

Does Intella calculate the total expanded size of data?

I know that I can export the table to a CSV list with one entry per item and sum the size column, but that export takes some time.
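The summing step of that workaround is easy to script once the CSV exists. The sketch below is only an illustration: it assumes the export has a column named Size holding a byte count per item, which may not match the actual column name or unit in an Intella export, and the sample rows are invented.

```python
import csv
import io

def total_size_bytes(csv_file, size_column="Size"):
    """Sum the size column of an exported item list, skipping blank cells."""
    total = 0
    for row in csv.DictReader(csv_file):
        # Tolerate missing cells and thousands separators such as "2,048".
        raw = (row.get(size_column) or "").replace(",", "").strip()
        if raw:
            total += int(raw)
    return total

# Invented sample data standing in for a real export.
sample = io.StringIO(
    'Item ID,Name,Size\n'
    '1,a.msg,1024\n'
    '2,b.pdf,"2,048"\n'
    '3,folder,\n'
)
print(total_size_bytes(sample))  # 3072
```

For a real export, open the file with `open(path, newline="", encoding="utf-8")` and pass the file object in place of the sample.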

January 12, 2016

Igor,

The recovery of deleted items is exactly what took the longest. It took a total of 9 hours to process the entire 30 GB OST file. According to the log, 7 of those hours were spent processing deleted items.

January 12, 2016

The OST file was created by Outlook 2010.

January 12, 2016

I am currently running ver 1.9 and indexing a 30 GB OST file.

Intella was cranking along just fine for the first 80 minutes.

Then it dropped to a minimal level of processing. The Index New Data window shows no new activity.

However, through Resource Monitor, I can see that Java is still reading the OST file, although extremely slowly compared with the first 80 minutes.

Is there a way to determine exactly what is going on?

I am running it on a Windows 7 box (i7-5930 3.5 GHz with 64GB RAM) with evidence on one drive and temp on a 512 GB M.2 card.

This is my second test on this OST file. When I first tested it, Intella ran for 13 hours and behaved the same way as described above. I finally stopped the process and figured there must be errors in the OST file. I then ran ScanPST on it multiple times to see if that would help.


Posts posted by wmfiske

Limiting Fields for Keyword Lists

CSV load file error - Extracted Text field

Item ID

Item ID

OCR Configuration

Expansion Size

Index new data

Index new data

Index new data
