Searching OCR post case conversion

ShaunC · January 31, 2023

Hi all,

I have a compound case that I have upgraded to 2.6 (I'm using Intella Pro).

I have yet to re-index the sources as it's hundreds of GB and we're actively using the case.

I've performed a keyword search for a phrase, and I get matches in several Word documents.

There is a PDF document (OCR'd prior to the case conversion) in which I can see the phrase in the OCR tab; however, the PDF is "unresponsive" to the search.

I tried copying the text out of the PDF's OCR tab and pasting it into the search box, in case a character was being substituted (like a lowercase l for a 1), but the PDF still isn't returned.

I don't have anything deselected in the search options drop-down.

Checking the "Words" tab for the document shows only the words from the file's metadata. (This could well be normal behaviour; to be honest, I've never checked whether OCR'd words get added to the "Words" tab.)

Something a bit interesting/weird is that once I search the phrase (with my "unresponsive" PDF already open in a preview window), that phrase gets highlighted in the document.

I then tried other OCR'd PDF files and the same thing happens: they are unresponsive, but when previewed, the phrase is highlighted anyway.

I'll kick off the re-index overnight and see if that helps.

Out of curiosity, can anyone else replicate this?

Cheers!

ShaunC · February 2, 2023

I re-indexed the sources in both source cases (remember, I'm using a compound case here).

I also re-OCR'd the items in both source cases; however, I believe it was set to skip items that had already been OCR'd.

I then opened the compound case and searched for the phrase again, and it still did not return the PDF I expected.

I then re-OCR'd the item itself directly (while in the compound case), and that did enable the PDF to be returned in the phrase search.

Is this intended, and should re-OCRing be an extra step in the case conversion process? Or is this a bug, and the PDF should have been responsive without having to re-OCR it?

Mateusz · February 2, 2023

Hello ShaunC,

This feature has been updated in the newly released hotfix 2.6.0.2. Please update Intella, and the OCR functionality should work properly after conversion.

We advise converting the compound case again after updating to Intella 2.6.0.2; the OCR'd items should then carry over as well.

ShaunC · February 2, 2023

Lovely, thanks Mateusz!

ShaunC · February 2, 2023

I'm actually getting a 404 error for the EXE link on the downloads page.

ShaunC · February 2, 2023

And it's downloading now, cheers.
