Deduplication Ordering

PF1 · October 3, 2018

I am wondering if there is a way to control the order of analysis for deduplication.

I frequently collect Gmail IMAP accounts and find that the ALL MAIL folder generally holds a duplicate of messages located in other Gmail folders (well, Gmail labels, really). However, it is entirely possible that a user could place a message into the ALL MAIL folder on their own, in which case it would be the only instance.

Is there a way to have Intella deduplicate so that the ALL MAIL folder (or any folder I choose) is assigned the lowest priority? Given an email that is present in both ALL MAIL and STARRED, I want the ALL MAIL copy excluded and the STARRED one to remain. However, if a message exists only in ALL MAIL, I want it included in my end result (so I cannot simply exclude the ALL MAIL folder completely).
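To make the requested rule concrete, here is a minimal Python sketch of "lowest priority" deduplication. It is not an Intella feature or API: the `(message_hash, folder)` records and the `pick_preferred` function are hypothetical, and it assumes duplicates have already been identified by a shared hash.

```python
# Hypothetical records: (message_hash, folder). This is not an Intella
# export format, just an illustration of the desired priority rule.
def pick_preferred(items, low_priority_folders=("ALL MAIL",)):
    """Keep one folder per hash, preferring any folder that is not low priority."""
    low = {name.upper() for name in low_priority_folders}
    kept = {}
    for msg_hash, folder in items:
        current = kept.get(msg_hash)
        if current is None:
            # First copy seen for this hash: keep it for now.
            kept[msg_hash] = folder
        elif current.upper() in low and folder.upper() not in low:
            # A copy outside the low-priority folder replaces the low-priority one.
            kept[msg_hash] = folder
    return kept


items = [
    ("h1", "ALL MAIL"),
    ("h1", "STARRED"),
    ("h2", "ALL MAIL"),   # exists only in ALL MAIL, so it must survive
    ("h3", "INBOX"),
    ("h3", "ALL MAIL"),
]
result = pick_preferred(items)
print(result)
```

In this sketch the message that exists only in ALL MAIL is still kept, while the ALL MAIL copies of the other two messages lose out to STARRED and INBOX.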

jon.pearse · October 5, 2018

Hi,

During indexing, we identify the first instance of a file; when we then come across further copies, those are marked as duplicates. We don't select which item (based on location) should be the primary file, with all others treated as duplicates.

You may have a situation where someone may want Outlook to be the primary file for deduplication, then someone else may want Exchange mail to be the primary file etc.
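The "first instance wins" behaviour described above can be pictured with a toy model. This is only a sketch and not Intella's actual implementation: `mark_first_seen` and the `(message_hash, location)` tuples are assumptions made for illustration.

```python
# Toy model of first-seen deduplication: the primary item is simply
# whichever copy is encountered first, regardless of its location.
def mark_first_seen(items):
    """Return (hash, location, is_duplicate) for each item in processing order."""
    seen = set()
    marked = []
    for msg_hash, location in items:
        is_duplicate = msg_hash in seen
        seen.add(msg_hash)
        marked.append((msg_hash, location, is_duplicate))
    return marked


order_a = [("h1", "ALL MAIL"), ("h1", "STARRED")]
order_b = list(reversed(order_a))

print(mark_first_seen(order_a))  # ALL MAIL copy is the primary
print(mark_first_seen(order_b))  # STARRED copy is the primary
```

The same two items give a different primary depending only on the order in which they are processed, which is why the order of indexing matters.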

PF1 · October 5, 2018

I guess in the future I could select each of the individual MBOX files from the IMAP collection except the ALL MAIL MBOX, index those first, and then add the ALL MAIL MBOX as a second step. Anything in ALL MAIL that was a duplicate would then be duped out.

As a workaround, I showed the "duplicates" column in the listing pane and sorted by location. I tagged for export any item in the ALL MAIL location that did not show a duplicate, and left untagged any ALL MAIL item that did. All other relevant items from the other Gmail 'folders' were tagged as well, and all tagged items were exported.
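For anyone who wants to apply the same selection outside the GUI (for example, on an exported item list), the tagging rule can be sketched as below. The `Item` fields and `tag_for_export` are hypothetical names, and `duplicate_count` is an assumption about what the duplicates column reports (the number of other copies of the item).

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    location: str
    duplicate_count: int  # assumed meaning: number of other copies of this item


def tag_for_export(items, low_priority="ALL MAIL"):
    """Tag everything outside low_priority, plus low_priority items with no duplicates."""
    tagged = []
    for item in items:
        in_low_priority = item.location.upper() == low_priority.upper()
        if not in_low_priority or item.duplicate_count == 0:
            tagged.append(item.item_id)
    return tagged


items = [
    Item("a1", "ALL MAIL", 1),  # duplicate of s1, so left untagged
    Item("s1", "STARRED", 1),
    Item("a2", "ALL MAIL", 0),  # unique to ALL MAIL, so tagged
    Item("i1", "INBOX", 0),
]
export_ids = tag_for_export(items)
print(export_ids)
```

One thing to watch with this rule: if a message has two copies that both sit in ALL MAIL and nowhere else, both show a duplicate and neither would be tagged, so that case may need a manual check.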

todd.cooper · November 28, 2018

On 10/5/2018 at 12:20 AM, jon.pearse said:

Hi,

During indexing, we identify the first instance of a file; when we then come across further copies, those are marked as duplicates. We don't select which item (based on location) should be the primary file, with all others treated as duplicates.

You may have a situation where someone may want Outlook to be the primary file for deduplication, then someone else may want Exchange mail to be the primary file etc.

Jon, is there a way to search or filter the data so that it returns only those files identified as the first instance of a file?

jon.pearse · November 28, 2018

Hi Todd,

You can use the 'Has duplicates' category in the Features facet to show all of the duplicates. Then use the Deduplicate button to show the first instance of all duplicates.
