Deduplication Ordering


PF1

I am wondering if there is a way to control the order of analysis for deduplication.

I frequently collect Gmail IMAP accounts and find that the ALL MAIL folder generally holds a duplicate of messages located in other Gmail folders (well, Gmail labels, really). However, it is entirely possible that a user placed a message into the ALL MAIL folder on their own, in which case it would be the only instance.

Is there a way to have Intella deduplicate so that the ALL MAIL folder (or any folder I choose) is assigned the lowest priority? Given an email that is present in both ALL MAIL and STARRED, I want the ALL MAIL copy excluded and the STARRED copy to remain. However, if a message exists only in ALL MAIL, I want it included in my end result, so I cannot simply exclude the ALL MAIL folder completely.


Hi,

During indexing we identify the first instance of a file; when we later come across copies of it, those are marked as duplicates. We don't choose which item (based on location) should be the primary file, with all others treated as duplicates.

One person may want Outlook to be the primary source for deduplication, while someone else may want Exchange mail to be the primary source, and so on.
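To illustrate the behaviour described above, here is a minimal sketch of "first instance wins" deduplication. It is not Intella's actual implementation: the function name `mark_duplicates`, the use of SHA-256 as the comparison key, and the sample data are all assumptions made for illustration.

```python
import hashlib


def mark_duplicates(items):
    """Flag every item whose content was already seen earlier in the list.

    `items` is a list of (location, content_bytes) pairs in indexing order.
    Returns a list of (location, is_duplicate) pairs in the same order.
    """
    seen = set()
    results = []
    for location, content in items:
        # Hypothetical dedup key: a hash of the raw content.
        digest = hashlib.sha256(content).hexdigest()
        results.append((location, digest in seen))
        seen.add(digest)
    return results


# Sample data (made up): ALL MAIL happens to be indexed first.
items = [
    ("ALL MAIL", b"message A"),
    ("STARRED", b"message A"),
    ("ALL MAIL", b"message B"),
]
results = mark_duplicates(items)
for location, is_duplicate in results:
    print(location, "duplicate" if is_duplicate else "first instance")
```

Because the first copy encountered becomes the primary, which copy is kept depends only on the order in which the sources are processed, not on their location.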


I guess in the future I could select each of the individual MBOX files from the IMAP collection except the ALL MAIL MBOX, index those, and then add the ALL MAIL MBOX as a second step. Any message in ALL MAIL that already exists elsewhere would then be marked as the duplicate.
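The two-step idea can be sketched as an ordering problem: process every source except ALL MAIL first, then ALL MAIL last, and let first-seen deduplication do the rest. This is only an illustration under assumptions: `index_order`, `primaries`, and the message IDs standing in for the dedup key are hypothetical, not part of Intella.

```python
def index_order(mbox_names, last="ALL MAIL"):
    """Return the sources with the low-priority one moved to the end.

    sorted() is stable, so the other sources keep their relative order.
    """
    return sorted(mbox_names, key=lambda name: name == last)


def primaries(messages_by_mbox, order):
    """Map each message key to the first source (in `order`) that holds it."""
    first_seen = {}
    for mbox in order:
        for message_key in messages_by_mbox[mbox]:
            first_seen.setdefault(message_key, mbox)
    return first_seen


# Made-up example: m1 is in both folders, m2 only in ALL MAIL.
mboxes = {"ALL MAIL": ["m1", "m2"], "STARRED": ["m1"]}
order = index_order(list(mboxes))
print(order)
print(primaries(mboxes, order))
```

With ALL MAIL processed last, the STARRED copy of `m1` becomes the primary, while `m2` is still kept because ALL MAIL holds its only copy.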

As a workaround, I showed the "duplicates" column in the listing pane and sorted by location. In the ALL MAIL location, I tagged for export only the items that did not show a duplicate and left untagged any item that did. All other relevant items from the other Gmail 'folders' were tagged, and all tagged items were then exported.
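The tagging rule in this workaround can be written out as a small sketch. The `Item` class, its field names, and `select_for_export` are hypothetical and only model the "location" and "duplicates" columns described above; they are not an Intella API.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    location: str
    duplicates: int  # assumed: number of other copies shown in the "duplicates" column


def select_for_export(items, low_priority="ALL MAIL"):
    """Return the IDs to tag for export.

    Items outside the low-priority location are always tagged; items inside
    it are tagged only when no duplicate exists elsewhere.
    """
    tagged = []
    for item in items:
        if item.location != low_priority or item.duplicates == 0:
            tagged.append(item.item_id)
    return tagged


# Made-up listing: message 1 is in both folders, message 2 only in ALL MAIL.
listing = [
    Item("1a", "ALL MAIL", 1),
    Item("1b", "STARRED", 1),
    Item("2a", "ALL MAIL", 0),
]
print(select_for_export(listing))
```

As in the manual workaround, a message that appears under two non-ALL MAIL labels would still be tagged twice; the rule only suppresses the ALL MAIL copy.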

 


  • 1 month later...
On 10/5/2018 at 12:20 AM, jon.pearse said:

Hi,

During indexing we identify the first instance of a file; when we later come across copies of it, those are marked as duplicates. We don't choose which item (based on location) should be the primary file, with all others treated as duplicates.

One person may want Outlook to be the primary source for deduplication, while someone else may want Exchange mail to be the primary source, and so on.

Jon, is there a way to search or filter the data so that it returns only those files identified as the first instance of a file?
