Posts posted by jon.pearse

  1. Hi,
     
    It looks like the Skype database was not processed with the restrictions in place.
     
    Along with the Skype information from the file types that you already selected, you also need to include the Skype main.db database file. This can be done by clicking the 'Add file name' button and typing main.db as the value.

    Skype settings.JPG

  2. I have discussed this with the dev team and they are looking at options where batches can be reopened for QA purposes.

     

    One option is to add another state called "Reopened", which would allow a user (the person doing the check) to change the coding in batches after they have been completed. Privileged users could then mark the batch as "Completed" again once done. We are also looking at adding the ability to capture the exact trail of coding decisions, which would be useful for audit purposes.
  3. Creating an overlay in Intella

    We have received a number of support tickets regarding creating load file overlays with Intella. An overlay is used to add additional data to a load file which has already been ingested into a review platform. For example, the case may be that a load file was created in Intella and provided to the end user. The end user ingests the load file into their review system (such as Relativity or Concordance) and they discover that some metadata fields (which they required) are not in the load file. The solution is to create an overlay so that the missing metadata fields can be overlaid on to the records that have been loaded into the review platform from the original load file.

    Intella does not have an 'export overlay' option as such. That said, you can export a trimmed down version of a load file that can be used for the purpose of overlaying metadata onto existing records. The so called 'overlay' will contain the missing metadata fields for each record (based on the original load file that was provided to the end user) and will be overlaid onto the existing records by using a common reference or identifier such as the DocID.

    The key point mentioned above is that the records are updated by using a common reference such as DocID. If an 'Export set' was created when the original load file was exported from Intella, the Export set can be used as the common reference and creating the overlay is quite simple. This post shows the steps involved. However, if an 'Export set' was not created during the load file export, then this can be many times harder to do. If that is the case, then really the only other unique identifier you could use (if it was included in the original load file) is Intella's ItemID.

    In these situations the review platform may allow flexibility as to which field can be used as the reference when loading an overlay. If this is the case then the ItemID can be used without a great deal of effort. If the review platform only allows the DocID to be used for the reference, then the DocIDs will need to be added to the overlay manually. This could be done by matching the ItemIDs in the original load file with the ItemIDs in the overlay, and copying and pasting the load file DocIDs for the matching records into the overlay. This is a manual process and is subject to human error, so adequate quality control and checks are required. This post does not cover that situation, but rather explains how to create an overlay when an Export set has been created.
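    The ItemID matching described above could be scripted rather than done by copy and paste. The following is a minimal sketch only, not an Intella feature. It assumes both files have already been read into rows, and the column names 'ItemID' and 'DocID' are placeholders for whatever your load file actually uses.

```python
def add_doc_ids(original_rows, overlay_rows, item_key="ItemID", doc_key="DocID"):
    """Copy DocIDs from the original load file onto overlay rows by matching ItemIDs.

    Column names are assumptions; adjust them to match your own load file.
    """
    # Build a lookup of ItemID -> DocID from the original load file.
    doc_id_by_item = {row[item_key]: row[doc_key] for row in original_rows}

    matched, unmatched = [], []
    for row in overlay_rows:
        doc_id = doc_id_by_item.get(row[item_key])
        if doc_id is None:
            # Keep track of overlay records with no counterpart, for QC.
            unmatched.append(row[item_key])
            continue
        matched.append({doc_key: doc_id, **row})
    return matched, unmatched


# Hypothetical sample data for illustration.
original = [
    {"DocID": "ABC0000001", "ItemID": "101"},
    {"DocID": "ABC0000006", "ItemID": "102"},
]
overlay = [
    {"ItemID": "102", "Subject": "Quarterly report"},
    {"ItemID": "101", "Subject": "Meeting notes"},
    {"ItemID": "999", "Subject": "Not in original load file"},
]

matched, unmatched = add_doc_ids(original, overlay)
for row in matched:
    print(row)
print("Unmatched ItemIDs:", unmatched)
```

    Any ItemIDs reported as unmatched should be reviewed by hand, and the usual quality control checks still apply.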

    Below are the steps to create an overlay. Note that this is an example only, and your overlay may require different fields and data. You should build your overlay to match the requirements.

     

    1) Start by selecting the same items that are in the original load file. As when creating a load file, right click on the items and select 'Export highlighted items'.

     

    2) The 'File naming and numbering' settings could be anything. We will discard these settings, so they are not important.

     

    3) Under 'Load file options' we can turn everything off (unless you are adding date/time fields). Turn off natives, text and images as these would have been included in the original load file, so they are not needed here.

    image.png

     

    4) Under the 'Load file chooser' options create a field for the Export set that was used for the original load file export. Also add the metadata fields that you want in the overlay. In my example I have added the fields for the Export set, To, From, Subject, Primary date/time and File name. Your requirements will probably be different, but the important thing here is to make sure that the Export set is included as this will be required for matching the records.

    image.png

     

    5) The 'Redacted items' screen will be greyed out as we are not exporting natives, extracted text or images. Run the load file as normal.

     

    6) The load file (or overlay) should be very fast to create as (in this example) we are basically creating a text file and we don't need to render images, extract text or provide natives.

     

    A DAT file will be created, with delimiter characters separating the fields. The review platform should be able to accommodate these delimiters when the overlay file is ingested. Below is a sample of what the overlay file should look like (I have opened this in Excel and removed the delimiters). You can see that I have metadata columns for the To, From, Subject, Primary date/time and File name fields. You can also see that the Export set is included. This field may need to be renamed in the overlay file to match the field used for importing in the review platform.
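    If the Export set field does need to be renamed, this can be done in a text editor, or with a short script such as the sketch below. It assumes Concordance-style delimiters (ASCII 20 as the field separator and þ as the text qualifier), which are common for DAT files but may not be what your export used, and it assumes field values contain no embedded separators. The target field name 'DocID' is hypothetical.

```python
FIELD_SEP = "\x14"  # assumed field separator (ASCII 20)
QUOTE = "\u00fe"    # assumed text qualifier (the thorn character)


def parse_dat_line(line):
    """Split one DAT line into field values, dropping the text qualifiers."""
    return [value.strip(QUOTE) for value in line.rstrip("\r\n").split(FIELD_SEP)]


def format_dat_line(values):
    """Join field values back into a DAT line using the assumed delimiters."""
    return FIELD_SEP.join(f"{QUOTE}{value}{QUOTE}" for value in values)


def rename_field(lines, old_name, new_name):
    """Rename one field in the header line; data lines are left unchanged."""
    header = [new_name if name == old_name else name
              for name in parse_dat_line(lines[0])]
    return [format_dat_line(header)] + list(lines[1:])


# Hypothetical two-line overlay for illustration.
sample = [
    format_dat_line(["Export set", "To", "From", "Subject"]),
    format_dat_line(["ABC0000001", "a@example.com", "b@example.com", "Meeting notes"]),
]
renamed = rename_field(sample, "Export set", "DocID")
print(parse_dat_line(renamed[0]))
```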

    image.png

  4. Hi,

     

    No, this feature did not make it into the 2.0.1 release.

     

    Version 2.0.1 is basically a maintenance release with very few new features. This feature is on our road map and will be implemented in the next version, or the version after, but at this stage we can't commit to which version that will be.

  5. Hi Adam,

     

    We have a workaround for keeping the order of the items that you have keyword (KW) searched for.

     

    Because there is currently no setting to keep the results of searches in the same order as the items in the KW list, we need to populate a column which will retain this order. We can do this by searching and saving the hits into hierarchical tags.

     

    Placing the search hits (from a KW list) into hierarchical tags automatically can be done by using a CSV KW list and the Auto-tag feature in the KW list Facet. A CSV KW list will allow you to enter search terms in column A, and you can also designate a tag for the hits of those terms in column B. You can use hierarchical tags in the KW list by entering the slash (/) between the top level tag and child tag (see sample below). 

     

    KWList.jpg

     

    In row 1 of the example above, the search term ABC.001.0000011 will be searched and the results will be placed into a tag named 900001, which sits under a top level tag named TagOrder. I have used 900001 because tag names are sorted in alphabetical order, not numerical order. Using numbers such as 1, 2, 3, 4, etc. for tag names will result in the order looking like this: 1, 11, 12, 13, 14..... 2, 21, 22, 23 etc.
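    For long keyword lists, the two columns could be generated rather than typed. The sketch below is an illustration only: it writes the search term in column A and a TagOrder/9000NN tag in column B, and then shows why fixed-width numbers are needed when names are sorted as text. Apart from the first search term, the terms are made up, and the exact CSV layout Intella expects should be checked against the user manual.

```python
import csv
import io


def build_kw_list(terms, top_tag="TagOrder", start=900001):
    """Pair each search term with a hierarchical tag that preserves list order."""
    return [(term, f"{top_tag}/{start + i}") for i, term in enumerate(terms)]


# Terms in the order we want to preserve (the last two are hypothetical).
terms = ["ABC.001.0000011", "ABC.001.0000002", "ABC.001.0000107"]
rows = build_kw_list(terms)

# Write the rows as CSV text (column A = term, column B = tag).
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
print(buffer.getvalue())

# Names sorted as text: plain numbers lose their numeric order...
plain = sorted(str(n) for n in [1, 2, 11, 12, 21])
# ...while fixed-width numbers keep it.
padded = sorted(str(900000 + n) for n in [1, 2, 11, 12, 21])
print(plain)
print(padded)
```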

     

    You can see in the image below that the Export set IDs can retain their order by sorting on the TagOrder column in the table view. 

     

    TagOrder.jpg

     

    I hope this helps.

     

    Regards

     

    Jon

  6. Hi westerndigital,

     

    We have improved the classification of attachments and embedded items in version 2.0. When exporting to PDF, you should be able to get the email and attachments without embedded signature blocks etc as extra items in the export. 

     

    Note that this will only work with version 2.0 or later. Also, the case has to be indexed in version 2.0 or later for the embedded items to be classified properly.

     

    If you have indexed in version 2.0 or later and you still have this issue, then it may be one of the PDF rendering settings that is causing the embedded items to be included in the export. Try using the settings shown below. I have tested these settings and they work fine.

     

    Regards

     

    Jon

     

    pdf_rendering_options.JPG

  7. Hi Hans,

     

    The Tasks feature (the last screen before indexing) should allow you to do what you are asking. Tasks and Saved searches can be imported into a case, and these can be used to filter and export the file types you require.

     

    You can use a Saved search to filter by PDF type and empty documents. You can also use a Saved search to filter for TIFF items that are not embedded items. This should give you the items for OCR processing. The results can be automatically tagged and exported with the Tasks feature.

     

    Regards

     

    Jon 

    As Adam mentioned, assigning custom reference numbers/IDs to documents can be done in Intella. That said, it requires a bit of work and there are some limitations, which I will explain in this post.

     

    The objective is to assign custom and unique numbers or IDs to the items in your case. This is useful for referring to items in a dataset when the dataset is being reviewed. Also, a specific document could find its way into a Courtroom, and having a unique reference number or ID for that document from the beginning (ingestion) to the end (production) helps to keep document referencing tidy and reduces the risk of confusion.

     

     

    A few things to consider

     

    The first thing to consider is whether all of the items in the dataset (including embedded items, as Intella extracts these when indexing) should be assigned unique IDs. This is certainly the easiest approach. However, some people may want to assign unique IDs to a smaller subset, e.g. top level items and attachments. The choice is yours; the main difference is that the full dataset would likely take longer to process.

     

    Another consideration you may want to think about is whether you want to also keep track of the Parent/Child relationships for these documents. My thinking is that in most cases, this would be desired. This post shows how to include the family references.

     

     

    Assigning custom reference numbers/IDs to documents

     

    Currently Intella does not have a dedicated way to assign unique IDs to items that are already indexed in Intella. One way to do this is to create an export of the items, and to add these items to an Export set. The unique IDs for the items can be configured in a 'load file' export, such as Relativity. A Relativity load file can be produced without the native documents, extracted text or images. Basically, the load file would contain only the DAT file. This should be very fast to create compared to a full blown load file. When the items are exported in this way, the unique IDs for these items will be loaded into the Export set, and the Export set will be available in the Details table once it has been turned on.

     

    So that gets the unique IDs for our items into the details table, but what about the ParentIDs and ChildIDs to reference the family? Well, this requires a little more work. In fact, when we are creating the export load file, we need to add some additional fields in the 'Load file field chooser' window. This is required because we need to ingest the load file back into the case (as an overlay) to generate the ParentIDs and ChildIDs.

     

    Create a load file export with the following fields:

    • ParentID - This is the DocID for the parent item. The field type is DIRECT_PARENT_ID. 
    • ChildIDs or AttachIDs - These are the DocIDs for the child items or attachments. The field type is DIRECT_CHILDREN_IDS.
    • IntellaID - This is the Intella ID for the item and is required for item mapping when we ingest the load file back into the case. Use Intella's Item ID field for this setting.

    01-ExportFields.JPG

     

    Once the load file is complete, you will need to add the new 'Export set' to the Details table by selecting it in the column chooser.

     

     

    Assigning ParentIDs and ChildIDs based on the Export set

     

    Now that the items in the case have unique number/IDs assigned to them, we need to ingest the load file to assign the ParentID and ChildID fields. This can be done by importing a load file overlay.

     

    1) Click on File > Import Load File in the Intella menu.

     

    2) Select Overlay from the 'Import operation' field, and browse to the DAT file of the load file just created.

     

    3) The 'Configure Delimiters' window should show you a preview of the load file. If you cannot see the preview, then click on the 'Detect encoding' button.

     

     

    02-Import.JPG

     

     

    4) The Map Fields window allows you to map load file fields to Intella columns. Unfortunately this only applies to ingesting a load file (or new data) and does not apply when ingesting an overlay. That said, you can create Tag Columns in this window and you can map the load file fields to the new Tag Columns. Use the 'Add tag column' button to create new Tag Columns and assign them to the load file columns. Note that the IntellaID field is the field used to overlay the data onto the correct items in the case. Make sure that the 'IntellaID' field is selected for the 'Identifier field', and also that 'Item ID' is selected for the 'Identifier type'.

     

     

    03-ImportMapFields1.JPG

     

     

    5) When all of the fields are mapped, click on the 'Check for errors' button. If there are no errors, import the load file.

     

    04-ImportMapFields2.JPG

     

     

    Once the load file has been ingested, you can turn on the ParentID and AttachID fields in the column chooser.

     

    05-ImportComplete.JPG

     

     

    Limitations

     

    The only limitations that I can see with this approach are that it takes some time to complete, being a manual process, and that the Tags facet ends up with a lot of tags under the ParentID and AttachID tags. This may not be a big issue, as the top level tag (see AttachIDs below) can be collapsed to hide the large number of tags.

     

    06-TagsFacet.JPG

     

    In a future version we will have the ability to create custom columns. This will allow more flexibility in various aspects of Intella.

  9. Hi all,

     

    We have been asked several questions on the support portal in regards to which fields a user should use in a load file. We can't really tell you which specific fields to use for your load file as we are not involved with the investigation/litigation. However, what we will do in this post is provide you with more information in regards to which fields are available when creating a load file and how they can be used.

     

    The fields you should use in a load file will be determined by your client or the firm that you are creating the load file for. Before creating a load file, you should have already discussed the requirements with the receiver and agreed on a specification for the load file (see the 'Creating a Load File' video in the previous post). We recommend using the checklists in the user manual (the Load File Checklist section) for discussing and agreeing on a load file specification with your client.

    Note: These checklists should be used as guides. Some load file requirements may not be shown on these checklists. It is the load file creator's responsibility to ensure that all aspects of the agreed specification for the load file are met.

     

    This post is more about the fields which are available and how they may be used. In Section 25.2.11 of the user manual you will see a number of load file options. What we are discussing here is the list of fields shown toward the end of this section in the user manual.

     

    Intella Metadata Fields

    The most commonly used field from this list is INTELLA_COLUMN. This allows the user to use any of the Intella metadata fields that are shown in the column chooser window for the Details table.

     

    Although the Intella metadata fields are mostly self explanatory and documented in the user manual, there are some fields which require some additional explanation.

    • Any date field can be exported in three different formats:

    Date Only

    Time Only

    Date and Time

     

    01_date_select.jpg

     

    Some load file specifications require that the Date and Time fields are separate. This can be achieved by creating two fields. For example, a sent email would have one Sent field with the 'Date only' setting and another Sent field with the 'Time only' setting. Note that it is important to set the date and time format correctly in the Load file options window prior to creating these fields.
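    To illustrate what the receiver ends up with, the sketch below splits a single Sent timestamp into a date-only value and a time-only value. This is not how Intella produces the fields internally. The field names 'SentDate' and 'SentTime' and the two format strings are assumptions standing in for whatever the agreed specification requires.

```python
from datetime import datetime

# Assumed formats; use whatever the load file specification calls for.
DATE_FORMAT = "%d/%m/%Y"
TIME_FORMAT = "%H:%M:%S"

# Hypothetical Sent timestamp for one email.
sent = datetime(2015, 3, 9, 14, 5, 30)

# One source value, exported as two separate load file fields.
fields = {
    "SentDate": sent.strftime(DATE_FORMAT),  # 'Date only' setting
    "SentTime": sent.strftime(TIME_FORMAT),  # 'Time only' setting
}
print(fields)
```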

     

    • The Intella metadata field for 'Page Count' can produce unexpected results. Note that this data comes from the document's metadata, so if the document doesn't have this property, Intella will produce '-1' and show this in the field. Also, this property only works for MS Office and PDF documents.

    A better alternative is to use the NUMBER_OF_PAGES field, as it is not dependent on the document's metadata. This property will also work with any type of document, as it is determined during image production. Note that with this setting, if additional metadata (comments, attachments etc.) is selected during the export process, the number of pages for the exported document may increase.

     

     

    Non-Intella Fields

    Let's have a look at some of the non-Intella metadata fields. We have provided more detail about these fields and what they do.

     

    Document Numbering

    • RECORD_ID_START: This displays the name/number of the first page of the document. It is the main ID for the document and is commonly also referred to as BegBates or DocumentID or the Identifier or the Control number.
    • RECORD_ID_END: This is the name/number of the last page of the document. For example, if a 5 page document started with ABC0000001, the RECORD_ID_END value would be ABC0000005.
    • RECORD_ID_GROUP_BEGIN: This is the RECORD_ID_START value of the first page of the first document in the current "parent-child" group. It would be the first page of the most top level item in that group. This field may be required for email threading and other processes used in Relativity and is sometimes referred to as the Group ID.
    • RECORD_ID_GROUP_END: This is the document ID of the last page of the last document in the current "parent-child" group.
    • DIRECT_PARENT: This is the RECORD_ID_START value of the document's direct parent. This is commonly referred to as the Parent ID for a document. Note that there is a similar field named RECORD_ID_PARENT. This shows the RECORD_ID_START value for the top level document which is different to the RECORD_ID_START value for the direct parent.
    • DIRECT_CHILDREN_IDS: This shows a list of RECORD_ID_START values for the document's direct children. Note that there is a similar field named ATTACH_ID_LIST. This shows the RECORD_ID_START values for all child items of a particular item, not just the direct child items.
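    The relationship between RECORD_ID_START and RECORD_ID_END in the list above can be checked with some simple arithmetic. The sketch below is an illustration only (the function name is made up and this is not Intella's own code): it takes the numeric suffix of the first page ID, adds the page count minus one, and keeps the zero padding.

```python
import re


def record_id_end(record_id_start, page_count):
    """Return the ID of the last page, given the first page ID and page count."""
    # Split the ID into a prefix and its trailing run of digits.
    match = re.fullmatch(r"(.*?)(\d+)", record_id_start)
    if match is None or page_count < 1:
        raise ValueError("expected an ID ending in digits and at least one page")
    prefix, digits = match.groups()
    last_page = int(digits) + page_count - 1
    # Keep the same number of digits as the starting ID.
    return f"{prefix}{last_page:0{len(digits)}d}"


print(record_id_end("ABC0000001", 5))
```

    A check like this can be useful when reviewing a load file from another party, for example to confirm that a document's end number is consistent with its page count.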

     

    Note:

    When showing Parent and Child IDs, you may be required to do this on a single level basis. For example, you may need to show the Parent ID of a specific item, one level up. And, you may also need to show the Child IDs for a specific item only one level down. In this case the DIRECT_PARENT and DIRECT_CHILDREN_IDS fields should be used. An example of this is shown below where Document ID = RECORD_ID_START, Parent Document ID = DIRECT_PARENT and AttachIDs = DIRECT_CHILDREN_IDS.

     

    02_table_view.jpg

     

    • BEG_ATTACH:  This shows the RECORD_ID_START value for the first page of the first attachment document in the current "parent-child" group. This will be empty if there are no attachments in the current group. This is used for emails only.
    • END_ATTACH: This shows the document ID of the last page of the last attachment document in the current "parent-child" group. This will be empty if there are no attachments in the current group. This is used for emails only. 
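    The difference between the single level fields (DIRECT_PARENT, DIRECT_CHILDREN_IDS) and their full family counterparts (RECORD_ID_PARENT, ATTACH_ID_LIST) can be shown with a small family tree. The sketch below is an illustration of the definitions given in this post, not Intella's implementation. The document IDs are hypothetical, and returning nothing for a top level item's parent is an assumption.

```python
# Hypothetical family: an email with two attachments, one of which
# (a ZIP) contains a further document. Maps each ID to its direct parent.
parents = {
    "ABC0000001": None,          # email (top level)
    "ABC0000003": "ABC0000001",  # ZIP attached to the email
    "ABC0000004": "ABC0000003",  # document inside the ZIP
    "ABC0000009": "ABC0000001",  # second attachment
}


def direct_parent(doc_id):
    """One level up (the DIRECT_PARENT idea)."""
    return parents[doc_id]


def top_level_parent(doc_id):
    """The top level item of the family (the RECORD_ID_PARENT idea)."""
    current = parents[doc_id]
    if current is None:
        return None  # assumption: top level items have no parent value
    while parents[current] is not None:
        current = parents[current]
    return current


def direct_children(doc_id):
    """One level down (the DIRECT_CHILDREN_IDS idea)."""
    return sorted(child for child, parent in parents.items() if parent == doc_id)


def all_children(doc_id):
    """Every level down (the ATTACH_ID_LIST idea)."""
    result = []
    for child in direct_children(doc_id):
        result.append(child)
        result.extend(all_children(child))
    return result


print(direct_parent("ABC0000004"), top_level_parent("ABC0000004"))
print(direct_children("ABC0000001"))
print(all_children("ABC0000001"))
```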

     

    Native, Text and Image Locations

    • FILE_NATIVE: This field shows the relative path (from the load file DAT and OPT files) for the original native documents. Usually a folder named Native (which holds the native files) is placed in the folder where the load file DAT and OPT files are located (e.g. Natives\01PREFIX00000001.doc). This allows the Natives folder to be accessed for original documents when the load file is imported.
    • FILE_TEXT: Similar to FILE_NATIVE, this field shows the relative path (from the load file DAT and OPT files) for the text files that contain extracted and OCRed text.
    • FILE_IMAGE: Again, similar to FILE_NATIVE, this field shows the relative path (from the load file DAT and OPT files) for the rendered images or PDF files.

     

    Note:

    Although the paths for text files and image files can be different, some people request that these files are both exported into a single folder. For example, in cases where this is required, the paths for text and image files may look similar to:

    o   Images\01PREFIX00000001.txt

    o   Images\01PREFIX00000001.tif

    This is entirely up to the person requesting the load file. Intella provides flexibility in this area to accommodate any such arrangement.
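    Because these fields hold paths relative to the load file, it can be worth confirming that every referenced file exists before the load file is sent out or ingested. The sketch below is one hedged way to do that, assuming the relative paths have already been read from the FILE_NATIVE, FILE_TEXT or FILE_IMAGE columns. The function name is made up, and the example builds a temporary folder purely for demonstration.

```python
import tempfile
from pathlib import Path, PureWindowsPath


def missing_files(load_file_dir, relative_paths):
    """Return the relative paths that do not resolve to a file under load_file_dir."""
    missing = []
    for rel in relative_paths:
        # Load files usually use backslashes; split on them on any OS.
        parts = PureWindowsPath(rel).parts
        if not Path(load_file_dir, *parts).is_file():
            missing.append(rel)
    return missing


# Demonstration with a temporary folder standing in for the export folder.
with tempfile.TemporaryDirectory() as export_dir:
    images = Path(export_dir, "Images")
    images.mkdir()
    (images / "01PREFIX00000001.txt").write_text("extracted text")

    result = missing_files(
        export_dir,
        [r"Images\01PREFIX00000001.txt", r"Images\01PREFIX00000001.tif"],
    )
    print(result)
```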

  10. Hi,

     

    It seems that it is common to receive load files where the provider has not adhered to the specification that was agreed. We have other customers in the same boat.

     

    I asked one of our customers your question and they said that they use the following:

    • ReadySuite.  It’s just invaluable, because it breaks them out into Excel-like columns so you can hide the delimiters, like the Intella load file previewer does.  It has a steep learning curve and is pricey, but has a 2-week free trial.  You can also copy and paste data back and forth with Excel, but you have to add new fields to RS as some are read-only.
    • Notepad++
    • UltraEdit, which is really powerful and kind of overwhelming. 
    • TextPad is OK, too. 

    It boils down to the type of editing you have to do, and whether it can be managed accurately without having to see the metadata snapped into perfect columns.

  11. Hi,

     

    The Main Process memory can be adjusted using the slider in the Case Editor window for that specific case. The slider will only allow you to use up to half the memory in the system. This is because the system also requires memory for crawlers, the OS and other non Intella related applications and services running on the system.

     

    The number of crawlers can be increased by editing the 'Intella.l4j.ini' file which is located in the installation folder for Intella. 

    • Close Intella.
    • Open the 'Intella.l4j.ini' file and find the line  '# -Dintella.crawlersCount=4'
    • Remove the leading #. The # comments out the line, meaning that the line is ignored for as long as the # is present. Change the value at the end to the number of crawlers that you want to use (don't set this higher than the number of physical cores).
    • Save the 'Intella.l4j.ini' file.

    The amount of memory to be assigned to each crawler can be changed in the 'Intella.l4j.ini' file as well.

    • Close Intella.
    • Open the 'Intella.l4j.ini' file and find the line '# -Dintella.serviceMaxHeap=800M'
    • Remove the leading #, then change the value at the end to the amount of memory that you want to assign to each crawler. E.g. to assign 2 GB of memory to each crawler, use '-Dintella.serviceMaxHeap=2G'.
    • Save the 'Intella.l4j.ini' file.

    Note that it is best to run some tests after making memory and crawler changes to see which configuration works the most efficiently with your particular dataset. 
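    If the same changes need to be made on several machines, the edits above could be scripted. The sketch below only works on the text of the file: it removes the leading # from the named option and sets a new value. The function name and the sample text are made up for illustration. Reading and writing the real 'Intella.l4j.ini' file (with Intella closed, and after taking a backup copy) is left to you.

```python
import re


def set_jvm_option(ini_text, name, value):
    """Uncomment the -D<name>=... line (if commented) and set its value."""
    pattern = re.compile(
        rf"^[ \t]*#?[ \t]*-D{re.escape(name)}=[^\r\n]*",
        re.MULTILINE,
    )
    new_line = f"-D{name}={value}"
    updated, count = pattern.subn(lambda _match: new_line, ini_text)
    if count == 0:
        raise ValueError(f"option {name} not found")
    return updated


# Hypothetical excerpt of the ini file, using the two lines quoted above.
sample = (
    "# Example excerpt only\n"
    "# -Dintella.crawlersCount=4\n"
    "# -Dintella.serviceMaxHeap=800M\n"
)

updated = set_jvm_option(sample, "intella.crawlersCount", 6)
updated = set_jvm_option(updated, "intella.serviceMaxHeap", "2G")
print(updated)
```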

  12. Hi all,

     

    Just an update to the original post by markjrouse re exporting all files (emails and their attachments) as a load file, but excluding embedded objects within documents.

     

    In the new versions of Intella (due for release in 1-2 weeks) we have improved the categorization of embedded items. This allows the user to create a 'Clean' load file much more easily, and saves a lot of time otherwise spent manually cleaning the dataset before exporting it to a load file. The embedded items can now be suppressed by using the "Hide Irrelevant" button next to the "Deduplicate" button. This will suppress all embedded images in documents and emails. That said, the TIFFed image of an email or document will still include all of the embedded images so that the email/document will look like the original item.

    In addition to using the Hide Irrelevant feature, the user should also use the "Attached" category in the Features Facet as a verification mechanism to make sure that all attachments for email items in the dataset are included.
