
De-duplicating and file family effects


AdamS


This is a query that has come to me from a colleague who is out of the country. Without access to the same data, I'm having trouble recreating what they are asking about and was hoping someone here might shed some light on it. I have also emailed support, so I apologise for doubling up, but thought that would be the best way to get a speedy answer ;)

 

The text below is a direct quote from the email I received. The data they are working with came to them as plain text files from another person who created a load file for Relativity, so they don't have direct access to Intella to test or change results.

 

Edit: actually, on reading this carefully again, I can see that what they are seeing is correct, rather than what they were expecting. If it appeared as they expected, there would be no way to link the attachment to its email. You can probably disregard ;)

 

 

It appears the documents were de-duplicated on a document level vs an email family level. Therefore if 2 emails had the same attachment, the attachment for Email 2 was suppressed.

 
We’ve reviewed all the fields provided, specifically:
 
Intella-Parent ID
Intella-Item ID
Intella-ChildIDs
 
We would assume the Intella-Parent ID would be the same for all members of a family, for example:
 
Email #1 = ItemID 200 / Parent ID 1000  
Email #1 Attachment #1 = ItemID 201 / ChildIDs 300 /  Parent ID 1000
Email #1 Attachment #1, extracted embedded text = ItemID 201 / ChildIDs 301 /  Parent ID 1000
Email #1 Attachment #2 = ItemID 201 / ChildIDs 302 /Parent ID 1000
 
However we are seeing:
Email #1 = ItemID 200 / Parent ID 1000  
Email #1 Attachment #1 = ItemID 201 / ChildIDs 300 /  Parent ID 200 (matches Email ItemID)
Email #1 Attachment #1, extracted embedded text = ItemID 201 / ChildIDs 301 /  Parent ID 201 (matches Email #1 Attachment #1 Item ID)
Email #1 Attachment #2 = ItemID 201 / ChildIDs 302 / Parent ID 200 (matches Email ItemID)
 
 
We cannot find any UNIQUE field value that ties the whole family together – DOES A FIELD LIKE THIS EXIST IN INTELLA?

 

Edited by AdamS

Just to add some further thoughts, as a quick test is not behaving as I would expect.

 

I managed to find 2 emails in a data set which contain the same attachment, a small zip file (hash on both attachments is identical).

 

Email 1 - Item ID 265884 Parent ID 269424 Child ID 498375

Email 2 - Item ID 267109 Parent ID 269424 Child ID 498439

 

So far so good. These are what I would expect to see, as both emails came from the same PST archive (which I assumed would be the parent), and the child IDs are different. That is also good because, although the zip files are identical, we still want both emails to retain their respective attachments on extraction, so de-duplication shouldn't affect that.
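The difference the quoted email draws between de-duplicating at the document level and at the email family level can be sketched as follows. This is a minimal illustration, not Intella's actual logic: the `Item` record, the `md5` field and the hash strings are hypothetical, and only the item IDs come from the test above. The family-level variant assumes one common approach, judging duplicates on each family's top item and keeping or dropping the family as a unit.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Item:
    item_id: int
    parent_id: Optional[int]  # direct parent, like the "Parent ID" column
    md5: str                  # hypothetical content-hash field


def document_level_dedup(items: List[Item]) -> List[Item]:
    """Keep the first item seen for each hash, ignoring family membership."""
    seen = set()
    kept = []
    for item in items:
        if item.md5 not in seen:
            seen.add(item.md5)
            kept.append(item)
    return kept


def family_level_dedup(items: List[Item], family_root: Dict[int, int]) -> List[Item]:
    """Keep or drop whole families, judged on the hash of each family's root.

    family_root maps every item_id to the item_id of its family's top item
    (e.g. the email); that root item is assumed to be present in `items`.
    """
    seen_root_hashes = set()
    kept_roots = set()
    for item in items:
        if family_root[item.item_id] == item.item_id:  # item is a family root
            if item.md5 not in seen_root_hashes:
                seen_root_hashes.add(item.md5)
                kept_roots.add(item.item_id)
    # Every member of a kept family survives, duplicates included.
    return [item for item in items if family_root[item.item_id] in kept_roots]


# Item IDs from the test above; the hash strings are placeholders.
items = [
    Item(265884, 269424, "hash-email-1"),
    Item(498375, 265884, "hash-zip"),   # zip attached to Email 1
    Item(267109, 269424, "hash-email-2"),
    Item(498439, 267109, "hash-zip"),   # identical zip attached to Email 2
]
family_root = {265884: 265884, 498375: 265884, 267109: 267109, 498439: 267109}

print([i.item_id for i in document_level_dedup(items)])
# [265884, 498375, 267109]
print([i.item_id for i in family_level_dedup(items, family_root)])
# [265884, 498375, 267109, 498439]
```

Under document-level de-duplication the second zip (498439) is suppressed, which is what the colleague described; the family-level variant keeps each email together with its own attachment.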

 

However, if I highlight these emails and 'show parents', the result is not Item ID 269424 as I expected; instead it shows Email 2 (Item ID 267109) as the parent. It doesn't matter whether I select top level or direct only, it never shows 269424 as the parent item. Item 269424 is the 'Inbox' folder from the PST archive, which is also not what I was expecting.

 

I'm not sure if this directly relates to the above question from my colleague but thought I'd add it to this post in an effort to better understand how Intella treats the relationships between items.

 

Edit: I located the search preferences that were suppressing the results, so now when I show top-level parents I see the PST as expected. What is confusing, however, is that the folder is listed as the Parent ID rather than the PST archive.


Hi Adam,

 

Note that the "Parent ID" column always shows the ID for the direct parent, and it is not dependent on the Show Parents options.
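Because "Parent ID" only holds the direct parent, a top-level parent can be recovered from exported data by following those links upward. A minimal sketch, assuming the export has been loaded into a dictionary mapping each item ID to its direct parent ID (None for a root). The PST's item ID is not given in this thread, so `PST_ID` below is a hypothetical placeholder.

```python
from typing import Dict, Optional


def top_level_parent(item_id: int, parent_of: Dict[int, Optional[int]]) -> int:
    """Follow direct-parent links until reaching an item with no parent."""
    visited = set()
    current = item_id
    while parent_of.get(current) is not None:
        if current in visited:  # guard against malformed, cyclic data
            raise ValueError(f"cycle in parent links at item {current}")
        visited.add(current)
        current = parent_of[current]
    return current


PST_ID = 100  # hypothetical: the real PST item ID is not given here
parent_of = {
    PST_ID: None,
    269424: PST_ID,   # Inbox folder
    265884: 269424,   # Email 1
    498375: 265884,   # zip attached to Email 1
    267109: 269424,   # Email 2
    498439: 267109,   # zip attached to Email 2
}

print(top_level_parent(498375, parent_of))  # 100
```

Walking all the way up lands on the PST, which matches what "show top-level parents" returned once the search preferences were corrected.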

 

We are looking to rename "Parent ID" to "Direct Parent ID" to remove any confusion. We are also looking to rename "Child IDs" to "Direct Child IDs".

 

We may look to introduce new fields for "Top-Level Parent ID" and "Family ID" in a future version.
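A "Family ID" differs from a top-level parent: walking all the way up reaches the PST, whereas the family the colleague is after is usually the email plus its attachments. Until such a field exists, one workaround over exported data is to walk up direct-parent links and stop just below the first container. A minimal sketch, assuming each exported item carries a type that distinguishes containers (PST, folder) from documents. The type labels and `PST_ID` are hypothetical, and this does not describe how any future Intella field would be computed.

```python
from typing import Dict, Optional

# Hypothetical type labels; a real export would need its own type column.
CONTAINER_TYPES = {"pst", "folder"}


def family_id(item_id: int,
              parent_of: Dict[int, Optional[int]],
              type_of: Dict[int, str]) -> int:
    """Walk up direct-parent links, stopping before the first container
    (or at a root), so the result is the item heading the family."""
    visited = set()
    current = item_id
    while True:
        parent = parent_of.get(current)
        if parent is None or type_of.get(parent) in CONTAINER_TYPES:
            return current
        if current in visited:  # guard against malformed, cyclic data
            raise ValueError(f"cycle in parent links at item {current}")
        visited.add(current)
        current = parent


PST_ID = 100  # hypothetical placeholder for the PST's item ID
parent_of = {PST_ID: None, 269424: PST_ID, 265884: 269424,
             498375: 265884, 267109: 269424, 498439: 267109}
type_of = {PST_ID: "pst", 269424: "folder", 265884: "email",
           498375: "zip", 267109: "email", 498439: "zip"}

print(family_id(498375, parent_of, type_of))  # 265884
print(family_id(498439, parent_of, type_of))  # 267109
```

Every member of a family, including nested items such as text extracted from an attachment, resolves to the same email ID, which is the kind of shared value the colleague was looking for.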

 

Regards

 

Jon 

