Processing Numbers vs Statistics

I'm having trouble reconciling the figures reported during processing with those I see in Statistics, and was hoping someone could explain the differences.


When all 11 steps have finished, the processing status screen tells me that there are 1,517,290 items in total and 1,081,355 duplicate items. So 1,517,290 minus 1,081,355 should leave me with a unique, deduplicated population of 435,935, and the processing status screen does indeed report 435,935 unique items. However, the Statistics screen shows 1,517,290 for All items but 443,206 after deduplication, a difference of 7,271 in the deduplicated population. Does the processing status screen calculate unique items differently from the Statistics screen? If the population after deduplication is 443,206, then the duplicate count would be 1,074,084, not 1,081,355.


Similarly, the processing status screen reports 115,408 exception items, which I've naturally assumed is after deduplication. In the Statistics screen, Exceptions Items after Deduplication is 123,404, a difference of 7,996.
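For anyone who wants to check the arithmetic, here is a short script that reproduces the differences. The figures are the ones quoted from the two screens above; the variable names are my own and do not correspond to anything in Intella.

```python
# Figures quoted from the processing status screen and the Statistics screen.
# Variable names are illustrative only.
total_items = 1_517_290
status_duplicates = 1_081_355
status_unique = 435_935
stats_unique = 443_206
status_exceptions = 115_408
stats_exceptions = 123_404

# The processing status screen is internally consistent:
# total minus duplicates equals the unique count it reports.
assert total_items - status_duplicates == status_unique

# Gap between the two screens' deduplicated populations.
unique_gap = stats_unique - status_unique

# Duplicate count implied by the Statistics screen's deduplicated figure.
implied_duplicates = total_items - stats_unique

# Gap between the two screens' exception item counts.
exception_gap = stats_exceptions - status_exceptions

print(f"Unique items gap:    {unique_gap:,}")
print(f"Implied duplicates:  {implied_duplicates:,}")
print(f"Exception items gap: {exception_gap:,}")
```

The two gaps (7,271 and 7,996) are not the same, so one offset applied to both counts would not explain the discrepancy.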


Should there be a difference?

Hi Mark,


This one is hard to give a general answer to; it really depends on the audience and use case. Are you or your user interested in the most effective deduplication, to assess the time needed for review, or perhaps in a deduplicated count that can be verified by other tools that only use MD5 hashing?


I am making a note that we should address this in a future version of the user manual and the user interface. I can see that the user interface does not make it clear that the two screens use different deduplication methods.
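To illustrate in general terms why two deduplication methods can report different unique counts for the same data, here is a minimal sketch. It is not Intella's actual logic: the sample items, the `md5_key` and `normalized_key` functions, and the normalization rule are all hypothetical, chosen only to show that a strict hash-based key and a looser content-based key can disagree.

```python
import hashlib

# Hypothetical items as (subject, body) pairs; not real case data.
items = [
    ("Report", "Q1 figures attached."),
    ("Report", "Q1 figures attached."),    # byte-identical copy
    ("Report", "Q1 figures attached.  "),  # same message, trailing whitespace
    ("Minutes", "Meeting moved to Friday."),
]

def md5_key(item):
    """Strict key: MD5 of the exact bytes, so any difference makes a new item."""
    subject, body = item
    return hashlib.md5(f"{subject}\n{body}".encode("utf-8")).hexdigest()

def normalized_key(item):
    """Looser key (an assumption for illustration): ignore case and whitespace."""
    subject, body = item
    return (subject.strip().lower(), " ".join(body.split()).lower())

# Count distinct keys under each method.
unique_by_md5 = len({md5_key(item) for item in items})
unique_by_normalized = len({normalized_key(item) for item in items})

print("Unique by MD5:           ", unique_by_md5)
print("Unique by normalized key:", unique_by_normalized)
```

In this sketch the stricter key reports more unique items than the looser one, which is the same kind of gap you would see if the two screens counted duplicates by different criteria.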

I have a similar question: why do the processing numbers vary so much between versions of Intella? I recently had a case that I had indexed in 1.6.2 with ~7 million entries; when I recreated it as a new case in 1.8.1, the number dropped to around half that. Do I need to be wary of the difference in numbers between versions, or are the entries simply being de-duplicated much more aggressively?

Hi dpmills,


It's hard to say what is causing this at first sight. 


Finding out why the number of discovered items dropped to around half will require further investigation.


Could you please open a support ticket? That way you will be able to provide us with all the necessary information.

