
Preview of Office documents slower since 1.8?



I've noticed that the native preview tab in Connect 1.8 (this includes both .2 & .4) is slower to display the document than it was with 1.7.x.

 

It's fine previewing PDFs, as these are not converted server side, but Word, Excel and PowerPoint documents are not converted from native to PDF as quickly as before.

 

The server platform is the same and the case is the same; only Connect is now on 1.8. The users of the case commented that they used to be able to review and tag documents in the preview pane with documents loading in <1 second. Now the same process takes about 2-3 seconds, with the "loading native previewer" message displayed.

 

I've monitored the Connect server and I can see the Java process utilizing CPU cycles, but these are well balanced across the 8 cores and don't raise the server above 22-25% for about 2-3 seconds. It would be a better user experience if it used 75% for 1 second, and I think this is the profile that we saw in 1.7.

 

Has the preview engine (which converts Office documents to PDF) in Connect 1.8 been changed? And can I tune it to speed up the process?

 

Best regards,

 

Jason 


Hello Jason,

 

There have indeed been some changes: since version 1.8 Word documents are no longer converted to PDF using MS Office, and since version 1.8.1 MS Office is also no longer used for spreadsheets and presentations. The conversions to PDF still take place but are now handled by components that are built into Intella Connect.

 

This change has a lot of benefits: besides the reduced system dependencies the PDFs are of better quality, e.g. they can now show any change tracking, the mapping from spreadsheets to a paginated format looks better, etc. This change also means that we got rid of a lot of potential support issues, e.g. invoking MS Office from Connect when it is running as a Windows service is more or less impossible.

 

It is true that the new conversion libraries are a bit slower in generating the PDFs than MS Office. We are looking into ways in which we can improve this. At the moment there is no tweaking that you can do, other than making sure you have an adequate machine - also see this post: http://community.vound-software.com/index.php?/topic/267-connect-hardware-requirements/?p=1334

 

One feature enhancement that we are considering, and that would take away this problem entirely, is letting the case creators/admins pre-generate the PDFs. This would make the PDFs instantaneously available and also improve exporting speed. What do you think about that?


Hi Cristiaan,

 

This makes sense now.

 

I've just tested it with 16 CPU cores (rather than the production 8) and it makes very little difference. It appears that the conversion code will not use all of the available CPU cycles.

 

The pre-caching idea would be great.

 

Ideally it would run as a background process, based on certain flags/tags, when Connect is up and running.

 

I say that because you may have 100,000 Office documents of which 1,500 may be responsive to search terms. Only these 1,500 would need to be pre-cached ready for the linear review; the rest can wait for on-demand conversion. I would not want to add too many hours to the initial indexing by caching the whole 100,000.
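
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes each conversion takes about 2.5 seconds (the middle of the 2-3 seconds reported earlier in this thread) and that conversions run one after another; both are assumptions, and the function name is purely illustrative.

```python
def precache_hours(doc_count: int, seconds_per_doc: float) -> float:
    """Estimate the total sequential conversion time in hours."""
    if doc_count < 0 or seconds_per_doc < 0:
        raise ValueError("doc_count and seconds_per_doc must be non-negative")
    return doc_count * seconds_per_doc / 3600.0


# Assumed per-document cost: midpoint of the observed 2-3 second range.
SECONDS_PER_DOC = 2.5

for label, count in (("responsive subset", 1_500), ("whole case", 100_000)):
    hours = precache_hours(count, SECONDS_PER_DOC)
    print(f"{label}: {count} documents, roughly {hours:.1f} hours")
```

Under those assumptions the responsive subset costs about an hour, while the whole case costs several days, which is why caching only the tagged documents is attractive.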

 

 

Happy to test this if you have a pre-production version?


Hello Jason,

 

Yes, the number of cores will make no difference. If you see any significant disk usage, you may consider looking at the amount of RAM (more RAM means more space for disk caching), but again I don't expect that things will improve much for single document conversions, which I suspect are CPU-bound and largely single-threaded.
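
As a rough illustration of why going from 8 to 16 cores changes so little, Amdahl's law gives the best-case speedup when only part of a job can run in parallel. The parallel fractions below are made-up values for illustration, not measurements of the Connect conversion code.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Best-case speedup over one core when only `parallel_fraction`
    of the work can be spread across `cores` (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical parallel fractions for a largely single-threaded conversion.
for fraction in (0.0, 0.10, 0.25):
    s8 = amdahl_speedup(fraction, 8)
    s16 = amdahl_speedup(fraction, 16)
    print(f"parallel {fraction:.0%}: 8 cores {s8:.2f}x, "
          f"16 cores {s16:.2f}x, gain from doubling {s16 / s8:.3f}x")
```

With a mostly serial job, doubling the core count yields a gain of only a few percent at best, which matches what was observed in the 16-core test.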

 

At the moment we are not yet working on this pre-caching, but as soon as we are, we will keep you in mind!


I like the idea of pre-caching in general and it's something I always put forward as an idea for forensic software when I get a chance.

 

I'd love to see this expanded and included as an indexing option for those that are happy to wait the extra hours, and also as a standalone option that can be run at a later time on selected objects.

 

For example, a new tab/screen when setting indexing options, with some tickboxes for pre-caching Word, Excel, PDF and pictures, and then a separate section under the Tools menu for selective in-case pre-caching of tagged objects.


Great - sounds like a plan. It really is quite an important 'need'.

 

Intella has historically struggled as an eD review platform against products that were weak in search but strong in linear review, where you need to be making a decision on a document in 2-3 seconds. Then with Connect 1.7 the lightning speed for linear review made it a winner.

 

I'm keen for it not to slip back because of this slowdown. As forensic search techies we are happy to use the contents pane and review the text-based preview, but lawyers are not suited to this; they need the full WYSIWYG preview of the native document.

 

Fingers crossed

 

Best regards,

 

Jason 


  • 9 months later...
  • 1 year later...

Hi Jason,

 

We have added a feature that allows the user to pre-generate PDFs for specific files. This is a background task that you can set up and run before document review.

 

If your review dataset has a lot of Word docs, PDFs etc., you can group all of these files under a tag. You can then run a background task (under Settings) to pre-generate the PDF view for these items. Once complete, the documents will load instantaneously during the review process.
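
For readers curious about the general idea, here is a minimal sketch of tag-based pre-generation with an on-demand fallback. It is not how Intella Connect implements the feature: `Item`, `PreviewCache` and `convert_to_pdf` are hypothetical names, and the converter is a placeholder standing in for the slow native-to-PDF step.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical case item: an ID, a file name and its tags."""
    item_id: int
    name: str
    tags: frozenset = frozenset()


def convert_to_pdf(item: Item) -> bytes:
    """Placeholder for the slow native-to-PDF conversion (assumption)."""
    return f"%PDF-placeholder for {item.name}".encode()


class PreviewCache:
    """Holds pre-generated PDFs and converts on demand on a cache miss."""

    def __init__(self, converter=convert_to_pdf):
        self._converter = converter
        self._pdfs = {}

    def is_cached(self, item: Item) -> bool:
        return item.item_id in self._pdfs

    def pregenerate(self, items, tag: str, workers: int = 4) -> int:
        """Convert every not-yet-cached item carrying `tag`; return the count."""
        selected = [i for i in items
                    if tag in i.tags and not self.is_cached(i)]
        # Each conversion may be single-threaded, but separate documents
        # can still be processed side by side.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for item, pdf in zip(selected, pool.map(self._converter, selected)):
                self._pdfs[item.item_id] = pdf
        return len(selected)

    def preview(self, item: Item) -> bytes:
        """Return the cached PDF, converting first if it is not cached yet."""
        if not self.is_cached(item):
            self._pdfs[item.item_id] = self._converter(item)
        return self._pdfs[item.item_id]


items = [
    Item(1, "memo.docx", frozenset({"responsive"})),
    Item(2, "budget.xlsx"),
]
cache = PreviewCache()
print("pre-generated:", cache.pregenerate(items, "responsive"))
```

Untagged items are still viewable in this sketch; they simply pay the conversion cost the first time they are opened.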
