
De-duping new data for review against data already reviewed


markjrouse


Hi,

 

I normally receive data in batches. For example, I might receive batch 1 on Monday, process it, run keywords and review the search hits, and then receive batch 2 on, say, Thursday. More often than not, batch 2 contains duplicates of items from batch 1 that have already been reviewed.

 

At the moment, I run my multiple keyword list as a search, include the batch 1 and batch 2 locations, and then sort by message hash for emails and MD5 hash for files. I then have to manually remove the duplicates that have already been reviewed by adding a tag to exclude them.

 

Is there an easy way, perhaps a method using tags, to automatically deduplicate items (emails, for example) in batch 2 against batch 1?

 

 


Hi Mark,

 

I would select all items/emails that have already been reviewed, export their MD5 and message hashes to CSV files, and then import those CSV files back in using the 'MD5 and Message Hash' facet. These hash lists can then be used as an exclude filter.
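If it helps to see the logic behind the hash-list exclusion, here is a minimal sketch in Python of the same idea done outside the tool. It is only an illustration, not how the product implements the facet: the function names, the CSV column headers ("MD5", "Message Hash") and the item fields ("md5", "message_hash") are all assumptions, so adjust them to whatever your export actually contains.

```python
import csv
import io


def load_hashes(csv_text, column):
    """Collect the non-empty values of one column as a lower-cased set.

    `column` is a hypothetical header name; use the one in your export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    hashes = set()
    for row in reader:
        value = (row.get(column) or "").strip().lower()
        if value:
            hashes.add(value)
    return hashes


def split_new_items(items, reviewed_md5, reviewed_msg):
    """Split items into (new, duplicates) against the reviewed hash sets.

    Each item is a dict with hypothetical "md5" and "message_hash" keys.
    An item counts as a duplicate if either hash was already reviewed.
    """
    new_items, duplicates = [], []
    for item in items:
        md5 = (item.get("md5") or "").strip().lower()
        msg = (item.get("message_hash") or "").strip().lower()
        if (md5 and md5 in reviewed_md5) or (msg and msg in reviewed_msg):
            duplicates.append(item)
        else:
            new_items.append(item)
    return new_items, duplicates
```

You would load the reviewed hashes exported from batch 1 with load_hashes, then pass the batch 2 items to split_new_items; only the first list returned needs review. Hashes are compared case-insensitively because exports do not always agree on upper or lower case hex.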
