PF1 Posted October 3, 2018

Is there a way to control the order of analysis for deduplication? I frequently collect Gmail IMAP accounts and find that the ALL MAIL folder generally holds a duplicate of messages located in other Gmail folders (well, Gmail tags, really). But it is entirely possible that a user could place a message into the ALL MAIL folder on his own, and it would be the only instance.

What I am wondering is whether there is a way to have Intella review for duplicates such that the ALL MAIL folder (or any folder) is assigned the lowest priority. Given an email that is present in both ALL MAIL and STARRED, I want the ALL MAIL version excluded and the STARRED one to remain. However, if a message exists only in ALL MAIL, I want it included in my end result (so I cannot simply exclude the ALL MAIL folder completely).
jon.pearse Posted October 5, 2018

Hi,

During indexing we identify the first instance of a file; when we later come across copies of it, those are marked as duplicates. We don't select a primary file based on location and then treat all the others as duplicates. One person may want Outlook to be the primary source for deduplication, while someone else may want Exchange mail to be the primary source, etc.
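To illustrate why indexing order matters under a "first instance wins" rule like the one described above, here is a minimal Python sketch. It is not Intella's implementation: the `Item` class, the use of SHA-256 as the content hash, and the function names are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Item:
    folder: str   # hypothetical: the mail folder the item was found in
    body: bytes   # hypothetical: the content used for hashing


def mark_duplicates(items):
    """Return (item, is_duplicate) pairs in processing order.

    The first item seen with a given content hash is the primary;
    every later item with the same hash is marked as a duplicate.
    """
    seen = set()
    marked = []
    for item in items:
        digest = hashlib.sha256(item.body).hexdigest()
        marked.append((item, digest in seen))
        seen.add(digest)
    return marked


def primaries(items):
    """Keep only the first instance of each distinct item."""
    return [item for item, is_dup in mark_duplicates(items) if not is_dup]


starred = Item("STARRED", b"quarterly report")
all_mail_copy = Item("ALL MAIL", b"quarterly report")
all_mail_only = Item("ALL MAIL", b"note to self")

# Same three items, processed in two different orders.
all_mail_first = primaries([all_mail_copy, all_mail_only, starred])
all_mail_last = primaries([starred, all_mail_copy, all_mail_only])

print([i.folder for i in all_mail_first])
print([i.folder for i in all_mail_last])
```

In this sketch, whichever copy is processed first becomes the primary, so processing ALL MAIL last leaves the STARRED copy as the survivor while the message that exists only in ALL MAIL is still kept.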
PF1 (Author) Posted October 5, 2018

I guess in the future I could select each of the individual MBOXes from the IMAP collection except the ALL MAIL MBOX, index the collection, and then add the ALL MAIL MBOX as a second step. Anything in ALL MAIL that was a duplicate would then be deduplicated out.

As a workaround this time, I showed the "duplicates" column in the listing pane and sorted by location. In the ALL MAIL location, I tagged for export any item that did not show a duplicate and left untagged any item that did. All other relevant items from the other Gmail 'folders' were tagged, and all tagged items were exported.
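The tagging workaround described above can be sketched in Python as follows. This is only an illustration of the selection logic, not anything Intella exposes: the `(folder, content_hash)` tuples, the `tag_for_export` function, and the sample hashes are hypothetical.

```python
from collections import Counter

LOW_PRIORITY_FOLDER = "ALL MAIL"  # the folder whose copies should lose


def tag_for_export(items):
    """Select items for export from a list of (folder, content_hash) tuples.

    Items outside the low-priority folder are always tagged. Items inside
    it are tagged only when no other copy of the same content exists.
    """
    copies = Counter(content_hash for _, content_hash in items)
    tagged = []
    for folder, content_hash in items:
        has_duplicate = copies[content_hash] > 1
        if folder != LOW_PRIORITY_FOLDER or not has_duplicate:
            tagged.append((folder, content_hash))
    return tagged


items = [
    ("STARRED", "h1"),
    ("ALL MAIL", "h1"),   # duplicate of the STARRED message: left untagged
    ("ALL MAIL", "h2"),   # exists only in ALL MAIL: tagged
    ("INBOX", "h3"),
]
tagged = tag_for_export(items)
print(tagged)
```

Note the limits of this simplified rule: a message that appears in two non-ALL MAIL folders is still tagged twice, and a message that appears twice in ALL MAIL but nowhere else would not be tagged at all, so a real review would need to check for those cases.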
todd.cooper Posted November 28, 2018

On 10/5/2018 at 12:20 AM, jon.pearse said:
"Hi, During indexing we identify the first instance of a file; when we later come across copies of it, those are marked as duplicates. We don't select a primary file based on location and then treat all the others as duplicates. One person may want Outlook to be the primary source for deduplication, while someone else may want Exchange mail to be the primary source, etc."

Jon, is there a way to search or filter the data so that it returns only those files identified as the first instance of a file?
jon.pearse Posted November 28, 2018

Hi Todd,

You can use the 'Has duplicates' category in the Features facet to show all of the duplicates. Then use the Deduplicate button to show the first instance of all duplicates.