Bryan La Rock Posted July 9, 2019

Hello! My apologies if this has already been addressed, but I could not find it through search.

I am dealing with MST Exchange emails. The emails contain a mix of standard SMTP email addresses and Exchange X.400-style addresses. De-duplication becomes a big problem here: emails that are otherwise identical get different message hashes when one copy has the SMTP address and another has an X.400-style address.

Is there currently any way to de-duplicate these? I know that as of the latest version of Intella, you can configure Message Hash to ignore certain attributes (including headers and recipients). This should work, but I'd really like finer-grained control than that. Ideally, Intella would recognize that two emails are identical even if they use a mix of SMTP and X.400-style addresses. In my experience, this issue is very common when dealing with Exchange exports.

Any thoughts would be greatly appreciated. Thank you!

Bryan
Bryan La Rock (Author) Posted July 11, 2019

As an update, here is the technique I used to deal with this. I don't think it is ideal, but it seems to have worked reasonably well in this case:

For all Top-Level Parent emails, I exported the following fields to a CSV file: DocID, Subject, Sent, Attachments, From, To, CC, BCC, Conversation Index.

Using Excel, I identified all emails that contained exactly the same values for ALL of the fields in red above.

Using a spot check, I confirmed that the resulting documents did indeed all appear to be duplicates.

Note that this technique does not actually compare the email bodies. A better technique would certainly consider the bodies as well.

Bryan
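For anyone who would rather script the comparison than do it in Excel, here is a minimal Python sketch of the same idea: group the exported rows by a set of fields and report every group with more than one DocID. It is only an illustration, not anything built into Intella. The function names, the assumption that the CSV column headers match the field names listed in the post, and especially the choice of `KEY_FIELDS` are all hypothetical; the post does not spell out which fields were compared, so adjust the list to your own export.

```python
import csv
from collections import defaultdict

# Assumption: this subset of columns is only an example. The original post
# does not state which exported fields were used for the comparison.
KEY_FIELDS = ["Subject", "Sent", "Attachments", "Conversation Index"]


def load_rows(csv_path):
    """Read the exported CSV into a list of dicts keyed by column header."""
    # utf-8-sig tolerates a byte-order mark at the start of the file.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))


def find_duplicate_groups(rows, key_fields=KEY_FIELDS, id_field="DocID"):
    """Return lists of IDs whose rows share identical values in key_fields."""
    groups = defaultdict(list)
    for row in rows:
        # Missing or empty cells are treated as empty strings, and
        # surrounding whitespace is ignored.
        key = tuple((row.get(field) or "").strip() for field in key_fields)
        groups[key].append(row[id_field])
    # Keep only the keys that occur more than once.
    return [ids for ids in groups.values() if len(ids) > 1]
```

Calling `find_duplicate_groups(load_rows("export.csv"))` (with a hypothetical file name) returns one list of DocIDs per suspected duplicate set. Like the Excel approach, this never looks at the email bodies, so the results still need a spot check.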