Using Intella to Index Network File Servers

tufelkinder · July 25, 2012

Hi All,

Is anyone using Intella to index corporate file servers? How well does Intella handle a million documents? So far I've had cases of up to 3,000,000 documents and performance has been acceptable, but always with a static dataset. Are there any hiccups or strange behavior that one should be prepared for in a dynamic environment like this? Is re-indexing faster than the initial indexing?

Any feedback or experience would be helpful!

Thanks,

Walt

-~

Chris · July 26, 2012

Hello Walt,

Indexing such repositories should work OK. We have had users report that they have processed up to 9 terabytes of data in a single case with Intella. So indexing the user data on a file server is a very valid use for Intella.

As a general rule, loose documents are usually much easier to process than PST, NSF and other mail container formats, where the complexity and size of the mail container itself come on top of the complexity of the file formats inside it. This extra complexity adds to the processing time and increases the chance that file corruption will spoil the processing. In other words: processing loose files is technically easier.

To give you some numbers on how much time a certain amount of data would take on a "regular" machine:

1 GB of PST: 8 minutes (older files tend to be slower)

1 GB of NSF: 12 minutes (larger files tend to be slower)

1 GB of Mbox: 5 minutes (file age and size have no impact)

1 GB of PDF documents with lots of images: 20 minutes

1 GB of Word documents: 3 minutes

Many other vendors benchmark against processing Mbox as it is the fastest and easiest to process and makes for good marketing numbers, but in reality you will often get different stats.

So if you had a 5 GB PST that has been manipulated by Outlook for several years and has lots of PDF attachments in it, that would have an impact on the processing speed. Likewise, if you had a 5 GB Mbox file with no attachments, Intella would fly through it.
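To turn the figures above into a ballpark estimate for a mixed file server, a small sketch like the following may help. It assumes that processing time scales linearly with data size and that the per-GB numbers quoted above apply to your hardware, neither of which is guaranteed. The function name, the dictionary keys and the example data mix are all made up for illustration.

```python
# Rough per-GB indexing times in minutes, copied from the figures quoted above.
# The keys are hypothetical labels, not names used by Intella.
MINUTES_PER_GB = {
    "pst": 8,
    "nsf": 12,
    "mbox": 5,
    "pdf_images": 20,
    "word": 3,
}


def estimate_minutes(gigabytes_by_type):
    """Return a rough total indexing time in minutes for a mix of data.

    Assumes time grows linearly with size, which is a simplification.
    """
    total = 0.0
    for kind, gigabytes in gigabytes_by_type.items():
        if kind not in MINUTES_PER_GB:
            raise KeyError(f"no timing figure for {kind!r}")
        total += MINUTES_PER_GB[kind] * gigabytes
    return total


if __name__ == "__main__":
    # Hypothetical file server: 200 GB Word, 50 GB image-heavy PDF, 20 GB PST.
    mix = {"word": 200, "pdf_images": 50, "pst": 20}
    print(f"about {estimate_minutes(mix) / 60:.1f} hours")
```

Treat the result as an order-of-magnitude guide only; old or heavily edited PST files and attachment-heavy mail can take noticeably longer than the flat rates suggest.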

As reindexing works by first clearing the entire database, indexing and reindexing times are comparable. In a future release we may add the ability to do this incrementally.

For now, a workaround could be to create a copy of those files that have been changed since a certain date and time and add that copy as a separate Folder source to an existing case that already holds the older documents. That only works when a file's contents and location are static once added, so this will not always be a viable option.
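One way to prepare such a copy is a small script that collects everything modified since a cutoff into a staging folder, which can then be added as the extra Folder source. This is only a sketch and not part of Intella: the function name and parameters are hypothetical, it relies on the file system's modification time being trustworthy, and the staging folder must sit outside the source tree.

```python
import shutil
from pathlib import Path


def copy_changed_since(source_root, staging_root, cutoff_timestamp):
    """Copy files modified at or after cutoff_timestamp (seconds since the
    epoch) into staging_root, keeping their paths relative to source_root.

    Returns the list of copied target paths.
    """
    source_root = Path(source_root)
    staging_root = Path(staging_root)
    copied = []
    for path in source_root.rglob("*"):
        if not path.is_file():
            continue
        if path.stat().st_mtime >= cutoff_timestamp:
            target = staging_root / path.relative_to(source_root)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also keeps the modification time
            copied.append(target)
    return copied
```

A call might look like `copy_changed_since(server_share, staging_folder, datetime(2012, 7, 1).timestamp())`, where both paths are placeholders for your own locations. Note that a file which was already indexed and then edited will end up in the case twice, once per source, which is the limitation described above.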

tufelkinder · July 27, 2012

Thank you, this is very helpful information!

Walt

-~
