
Excluding embedded objects from view



I was wondering whether there is a way to exclude embedded objects (not attachments, but objects embedded within ppt, Word doc, etc. files) so that they are not included in an export to a load file and TIFFed.  What I am trying to do is export all files (emails and their attachments) as a load file, along with TIFFing.  But I'm struggling to exclude embedded objects within documents, files, etc. so that they are not TIFFed and do not bloat the load file.

 

If I have the following example:

 

email (a)
       ppt attachment (x)
              object 1
              object 2
              object 3
       xls attachment (p)
              object 1
              object 2

Word Doc (t)

 

There must be an easy way to select only emails and their attachments, or just loose parent files?  So in the above example, I only want (a), (x), (p), and (t) to be shown in the details view. I don't want the objects of (x) or (p).


One way that I filtered my results is to select everything in my case by searching for *. Then in the details pane, I right click and select all, then select "Show Top Level Parents", which gives me my emails, and I flag these parents.  Then I right click and select "Show Children" with direct children, which creates another bubble.  I click on my second bubble and flag all those items.  Then I go into Features - Flagged items and select this, which then hopefully gives me my parent and children items but excludes embedded objects.  Or is there a better way of doing this?
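To make the logic of that workflow concrete, here is a minimal Python sketch of "top-level parents plus their direct children", applied to the example tree from the first post. This is not Intella code: the `Item` class, its fields, and the item IDs are invented purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Item:
    item_id: str
    parent_id: Optional[str]  # None means the item is a top-level parent


def parents_and_direct_children(items):
    """Return the IDs of top-level items plus items whose parent is top-level."""
    top = {i.item_id for i in items if i.parent_id is None}
    direct = {i.item_id for i in items if i.parent_id in top}
    return top | direct


# Hypothetical model of the example tree: email (a), attachments (x) and (p),
# their embedded objects, and the loose Word doc (t).
items = [
    Item("a", None),
    Item("x", "a"), Item("x-obj1", "x"), Item("x-obj2", "x"), Item("x-obj3", "x"),
    Item("p", "a"), Item("p-obj1", "p"), Item("p-obj2", "p"),
    Item("t", None),
]

selected = parents_and_direct_children(items)
print(sorted(selected))  # ['a', 'p', 't', 'x']
```

Note that this only goes one level down, so anything nested deeper than a direct child (for example a document inside a ZIP that is attached to an email) would be left out along with the embedded objects.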


Hi Markjrouse, creating load files from search results is not as easy as it sounds.

 

I have seen people make mistakes when selecting the data for load files. One mistake I've seen is when load files are created from direct searches only, which, as you know, do not include all family items (in NZ the discovery rules state that all family items need to be included with any relevant items).

 

I have also seen load files where all children items are returned and included. This causes all embedded items to be added as individual items in the load file. This can really blow out the number of unnecessary TIFF images that are produced for the load file (1 PDF file can result in 100s of TIFF files) and also the time to create the load file. The overpopulated load file will produce a separate line for each item in the review system, which is not ideal.

 

After the searching phase is complete, care must be taken when selecting the documents for the load file so that you don't miss anything and that you don't end up with a whole lot of junk files in the load file. This is how we select these files and exclude the embedded items.

  • Show all of the top parent items from the search results (some items in the search results may already be top level items).
  • Using the top level items and all of the search hit items, we show all of the children and tag them. We do not select 'show direct children' as relevant files could be several levels deep (i.e. a word doc in a zip file that is attached to an email which is in another zip file). This step helps to identify all pure parent items.
  • To locate the pure parent items, run a search on the search results and parent items and also run an exclude search on the children tag. This will display only the pure parent items. Tag these pure parents and deduplicate them, as the client does not want to review duplicates.
  • Show all children items for the pure parent items and tag them.
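The four steps above boil down to a few set operations. The following Python sketch shows one way to express them, assuming each item's parent is available as a simple mapping; the function names, the `parent_of` mapping, and the file names are all hypothetical, and the deduplication step is left out.

```python
def top_level(item, parent_of):
    """Walk up the parent chain until an item with no parent is reached."""
    while parent_of.get(item) is not None:
        item = parent_of[item]
    return item


def descendants(roots, parent_of):
    """Return every item below the given roots, at any depth."""
    children_of = {}
    for child, parent in parent_of.items():
        if parent is not None:
            children_of.setdefault(parent, []).append(child)
    found, stack = set(), list(roots)
    while stack:
        for child in children_of.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def select_families(hits, parent_of):
    tops = {top_level(h, parent_of) for h in hits}        # step 1: top parents
    candidates = tops | set(hits)
    children_tag = descendants(candidates, parent_of)     # step 2: all children
    pure_parents = candidates - children_tag              # step 3: exclude children
    family_children = descendants(pure_parents, parent_of)  # step 4
    return pure_parents, family_children


# A Word doc in a ZIP attached to an email inside another ZIP, plus a loose PDF.
parent_of = {
    "outer.zip": None,
    "mail.msg": "outer.zip",
    "inner.zip": "mail.msg",
    "report.docx": "inner.zip",
    "loose.pdf": None,
}
hits = {"report.docx", "loose.pdf"}

pure_parents, family_children = select_families(hits, parent_of)
print(sorted(pure_parents))     # ['loose.pdf', 'outer.zip']
print(sorted(family_children))  # ['inner.zip', 'mail.msg', 'report.docx']
```

Because the children are collected at every depth, the deeply nested hit still ends up in the family of its top-level ZIP rather than being treated as a parent of its own.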

Now we need to 'clean' the child item tag. We want to keep all email attachments and anything that is in the body of an email (e.g. screenshots that have been pasted into an email).

 

Show the children tag and sort by the URI field. The first thing you can remove from this tag are all image files that are marked 'PDF:Aperture' in the URI column. We will have the original PDF so we don't need all of these bits and pieces. This will remove a large percentage of embedded items. Do the same for WORD:Aperture, POWERPOINT:Aperture, OPENXML:Aperture etc.

 

Make sure you don't remove the items in ZIP:Aperture or MSG:Aperture. The MSG:Aperture items will include all of the LinkedIn and Twitter logos that we don't want; however, there could be screenshots in the body of emails that are relevant, and we don't want to get rid of these.
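As a rough illustration of the URI-based clean-up, the Python sketch below drops image items whose URI carries one of the document-type markers and keeps everything else, including ZIP:Aperture and MSG:Aperture items. The marker names come from this post; the tuple layout and the example URI strings are invented, and real URIs will look different. The marker list is also not exhaustive.

```python
# Markers named in the post; extend as needed for other document types.
DROP_MARKERS = (
    "PDF:Aperture",
    "WORD:Aperture",
    "POWERPOINT:Aperture",
    "OPENXML:Aperture",
)


def clean_children(children):
    """Split (item_id, uri, is_image) tuples into kept and removed item IDs.

    An item is removed only if it is an image AND its URI contains one of
    the document-type markers. ZIP:Aperture and MSG:Aperture items are kept.
    """
    kept, removed = [], []
    for item_id, uri, is_image in children:
        if is_image and any(marker in uri for marker in DROP_MARKERS):
            removed.append(item_id)
        else:
            kept.append(item_id)
    return kept, removed


# Invented example rows; the URI strings only imitate the idea of a marker.
children = [
    ("img-1", "hypothetical/report.pdf/PDF:Aperture/image-1", True),
    ("img-2", "hypothetical/memo.doc/WORD:Aperture/image-2", True),
    ("shot-1", "hypothetical/mail-17/MSG:Aperture/screenshot", True),
    ("doc-1", "hypothetical/archive.zip/ZIP:Aperture/contract.doc", False),
]

kept, removed = clean_children(children)
print(kept)     # ['shot-1', 'doc-1']
print(removed)  # ['img-1', 'img-2']
```

The images left under MSG:Aperture still need the manual thumbnail pass, since a signature logo and a relevant pasted screenshot look the same to a filter like this.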

 

Clear the search and search again on the children tag. In facets, view by 'type' then 'image'. Now view in thumbnail view and manually remove items that are clearly 'junk' images.

 

This should clean up most of the embedded items.

 

Regards

 

Jon  


I will ask our developers if we can add columns and Features facet categories for identifying attachments (given to all items whose direct parent is an email, MMS message, ...) and embedded items (given to all items where one of the parents is a document, spreadsheet or presentation).
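To pin down the two proposed definitions, here is a small Python sketch of how they could be evaluated on an item tree. It is only an interpretation of the wording above, not how Intella implements or will implement these categories; the type labels, the `parent_of` and `type_of` mappings, and the item IDs are hypothetical.

```python
MESSAGE_TYPES = {"email", "mms"}
DOCUMENT_TYPES = {"document", "spreadsheet", "presentation"}


def categorize(item, parent_of, type_of):
    """Return (is_attachment, is_embedded) for one item.

    is_attachment: the DIRECT parent is an email or MMS message.
    is_embedded:   ANY ancestor is a document, spreadsheet or presentation.
    """
    parent = parent_of.get(item)
    is_attachment = parent is not None and type_of[parent] in MESSAGE_TYPES
    is_embedded = False
    while parent is not None:
        if type_of[parent] in DOCUMENT_TYPES:
            is_embedded = True
            break
        parent = parent_of.get(parent)
    return is_attachment, is_embedded


# The example tree from the first post, reduced to one object per attachment.
parent_of = {"a": None, "x": "a", "x-obj1": "x", "p": "a", "p-obj1": "p", "t": None}
type_of = {
    "a": "email",
    "x": "presentation",
    "x-obj1": "image",
    "p": "spreadsheet",
    "p-obj1": "image",
    "t": "document",
}

print(categorize("x", parent_of, type_of))       # (True, False)
print(categorize("x-obj1", parent_of, type_of))  # (False, True)
print(categorize("t", parent_of, type_of))       # (False, False)
```

Under these definitions the two flags are independent, so a document attached to an email that is itself embedded in another document would carry both.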

 

Other categories that are already planned are Recovered items and Orphan items.

 

If you know of any other item categories that would make your life much easier, do let us know!


  • 2 weeks later...

"Is Attachment" is a badly needed field in Intella, which is very standard with dedicated e-discovery tools.  If you have a mixed data set of, say, a group of PSTs as well as a set of foldered data, there are definitely scenarios where you might want to isolate the content that either IS or IS NOT an email attachment.  And usually the latter. 

 

In my experience, it most often comes up when you are performing exclusion filtering on a search result, and you want to isolate the non-attachment content.  In a perfect world, we would always receive data that is neatly segregated and organized.  It's possible to get there in the scenarios I'm describing with a combination of steps and some tagging, but it would be a welcome addition to have the flexibility to get there more directly.


Hi Jason,

 

Thanks for sharing!

 

Would you agree that the "Is Attachment" marker should be set only on the actually attached files, or does it also need to be set on items embedded in those attachments? E.g. images in a Word document, files inside a ZIP file, etc. Note that you can get to these additional items using the Show Children method, so this is only about convenience.


On the same path but slightly different, if I can add this to Jason's request: isolating emails with attachments.

 

Currently the only way is to show all emails and ensure the 'attachments' field is showing, then sort by that field. Then it's a matter of highlighting all the emails with ticks in the 'attachment' field. If you have many thousands of emails it can be time consuming to scroll and scroll until you find the last email with the tick.

 

For my time and effort I think having the ability to show only emails with attachments would be extremely useful. An option for 'emails with attachments' under the "Features" facet would seem to be logical.


  • 1 year later...

Currently the only way is to show all emails and ensure the 'attachments' field is showing, then sort by that field. Then it's a matter of highlighting all the emails with ticks in the 'attachment' field. If you have many thousands of emails it can be time consuming to scroll and scroll until you find the last email with the tick.

 

Adam, since you have this list already sorted by Attachment (I'm assuming Descending sort), can't you simply find the first item with attachment and then:

  • click on it to select it (it's going to be highlighted in blue)
  • grab the scrolling knob of this panel's scrollbar and move it all the way to the bottom
  • press SHIFT on the keyboard and then click on the last item in the table with your mouse

That should effectively select all items from your previous selection (first email with attachment), leaving you with a selection matching all items with attachments. Then you simply right click and add a tag (in case you need this again).


  • 7 months later...

Hi all,

 

Just an update to the original post by markjrouse re exporting all files (emails and their attachments) as a load file, but excluding embedded objects within documents.

 

In the new versions of Intella (due for release in 1-2 weeks) we have improved the categorization of embedded items. This makes it much easier for the user to create a 'clean' load file and saves a lot of time otherwise spent manually cleaning the dataset before exporting it to a load file. The embedded items can now be suppressed by using the "Hide Irrelevant" button next to the "Deduplicate" button. This will suppress all embedded images in documents and emails. That said, the TIFFed image of an email or document will still include all of the embedded images, so that the email/document will look like the original item.

In addition to using the Hide Irrelevant feature, the user should also use the "Attached" category in the Features facet as a verification mechanism to make sure that all attachments for email items in the dataset are included.
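The verification idea can be pictured as a simple set check: every attachment whose parent email is in the export set should be in the export set too. The Python sketch below is only an illustration of that check and does not use any Intella API; the function name, the `parent_of` mapping, and the item IDs are hypothetical.

```python
def missing_attachments(export_ids, attached_ids, parent_of):
    """Return attachments whose parent is exported but which are not exported."""
    export = set(export_ids)
    return {
        item
        for item in attached_ids
        if parent_of.get(item) in export and item not in export
    }


# Hypothetical data: email "a" has attachments "x" and "p"; email "b" has "q".
parent_of = {"x": "a", "p": "a", "q": "b"}
export_ids = {"a", "x", "t"}   # items about to go into the load file
attached_ids = {"x", "p", "q"}  # items in the "Attached" category

missing = missing_attachments(export_ids, attached_ids, parent_of)
print(sorted(missing))  # ['p']
```

Here "q" is not reported because its parent email "b" is not part of the export, while "p" is flagged because its parent "a" is.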
