fuzed

Foreign Language PDF files

I have a question from a new client. They have a number of foreign-language PDF files (Arabic, Chinese, Hebrew, etc.) and want to know how these can be made searchable. We've tested a few PDFs in Intella: Word documents are searchable, but the PDFs do not seem to be.

Any thoughts on how this can be achieved? Is this a code page issue, or do we need to do more work?


PDFs may contain photocopies or scanned images of documents. In that case, they need to be OCRed in order to be searchable in Intella. Please note that if the embedded ABBYY FineReader method is used, all of the foreign languages involved must be selected on the configuration page of the OCR wizard.

For PDF documents containing parseable text, a non-English language should not be a problem. Otherwise, we suggest opening a support ticket and attaching a few examples of the problematic PDFs.
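
To tell the two cases apart before opening a ticket, one rough approach is to extract the text from a sample PDF with whatever tool you have on hand (that step is not shown here) and check what comes out. The sketch below is only an illustration and not part of Intella: the names `script_counts` and `diagnose` and the `min_letters` threshold are made up, and detecting scripts from Unicode character names is a coarse heuristic. Almost no letters suggests a scanned, image-only PDF that needs OCR. Letters in the expected script suggest the PDF already has a usable text layer.

```python
import unicodedata
from collections import Counter

# Map Unicode character-name prefixes to coarse script labels.
# "CJK" covers Chinese ideographs (and Japanese kanji, which share them).
SCRIPT_PREFIXES = {
    "ARABIC": "arabic",
    "HEBREW": "hebrew",
    "CJK": "cjk",
    "LATIN": "latin",
}


def script_counts(text):
    """Count the letters in `text`, grouped by coarse script label."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, and whitespace
        name = unicodedata.name(ch, "")
        for prefix, label in SCRIPT_PREFIXES.items():
            if name.startswith(prefix):
                counts[label] += 1
                break
        else:
            counts["other"] += 1
    return counts


def diagnose(text, min_letters=20):
    """Return "needs_ocr" if almost no letters were extracted,
    otherwise the label of the most common script.

    `min_letters` is an arbitrary threshold; tune it for your documents.
    """
    counts = script_counts(text)
    if sum(counts.values()) < min_letters:
        return "needs_ocr"
    return counts.most_common(1)[0][0]
```

For example, calling `diagnose` on the text extracted from an Arabic PDF should return "arabic" if the text layer is intact. If it returns "needs_ocr", the PDF is probably a scan. If it returns "other" or "latin" for a document that is visibly Arabic, the text layer may be badly encoded, and those files are good candidates to attach to a support ticket.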

 


Thanks Alex, I'll ask the client if I can send over a few examples. I've tried the built-in OCR tool with the correct language selected, but the examples we used didn't appear to OCR correctly.
