Hello!

My apologies if this has already been addressed, but I could not find it through search.  I am dealing with MS Exchange emails.  The emails contain a mix of standard SMTP email addresses as well as Exchange X.400-style addresses.  De-duplication becomes a big problem here.  Emails that are otherwise identical have different message hashes when one email has the SMTP address and another email has an X.400-style address.  Is there any way currently to de-duplicate these?

I know that as of the latest version of Intella, you can configure Message Hash to ignore certain attributes (including headers and recipients).  This should work, but I'd really like to have more fine-tuned control than this.  Ideally, it would be amazing if Intella could intelligently recognize that two emails are identical even if they use a mix of SMTP and X.400-style addresses.  From my experience, this issue is very common in dealing with Exchange exports.

Any thoughts would be greatly appreciated.

Thank you!

Bryan

As an update, here is the technique I used to deal with this.  I don't think this is quite ideal, but it seems to have worked reasonably well in this case:

For all Top-Level Parent emails, I exported the following fields to a CSV file:

  • DocID
  • Subject
  • Sent
  • Attachments
  • From
  • To
  • CC
  • BCC
  • Conversation Index

Using Excel, I identified all emails that contained the exact same values for ALL of the fields in red above.  Using a spot check, I confirmed that the resulting documents indeed appeared to all be duplicates.

Note that this technique does not actually compare the email bodies.  A better technique would certainly consider the bodies as well.
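
For anyone who would rather script the comparison than do it in Excel, here is a rough Python sketch of the same idea: group the exported rows by the compared fields and report any group with more than one DocID.  It assumes the CSV column headers match the field names listed above, and it assumes every field except DocID is part of the comparison; adjust `KEY_FIELDS` if your comparison set is different.  The function names are made up for illustration and nothing here is part of Intella.

```python
import csv
import io
from collections import defaultdict

# Assumption: these match the CSV column headers of the export.
# DocID is left out because it is unique per item.
KEY_FIELDS = ["Subject", "Sent", "Attachments", "From", "To",
              "CC", "BCC", "Conversation Index"]


def load_rows(csv_text):
    """Parse CSV text (with a header row) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def find_duplicate_groups(rows, key_fields=KEY_FIELDS, id_field="DocID"):
    """Return lists of DocIDs whose rows match on every key field."""
    groups = defaultdict(list)
    for row in rows:
        # Missing or empty cells are treated as empty strings.
        key = tuple((row.get(f) or "").strip() for f in key_fields)
        groups[key].append(row[id_field])
    # Only groups with more than one member are candidate duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]
```

To run it against a real export, read the file with `open(path, newline="", encoding="utf-8")` and pass `list(csv.DictReader(f))` to `find_duplicate_groups`.  Like the Excel approach, this only compares metadata, so the resulting groups still need a spot check.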

Bryan
