
jasoncovey

Members
  • Content Count

    31
  • Joined

  • Last visited

  • Days Won

    3

jasoncovey last won the day on March 19

jasoncovey had the most liked content!

Community Reputation

15 Good

About jasoncovey

  • Rank
    Advanced Member

Profile Information

  • Gender
    Male
  • Location
    Atlanta, GA

Recent Profile Visitors

338 profile views
  1. During recent reviews and in light of user feedback, I wanted to propose two coding layout improvements for a future version of Connect. The first has to do with adding a collapsing arrowhead for the top-level tags as they currently exist under the Tags facet. The use case for this has to do with the presence of long lists of coding options. Examples might be required references to corresponding Document Request numbers (a common component of production specifications), or complex issue tagging that reviewers apply during review in order to leverage it in later stages of a litigation matter when document productions are complete. Allowing these lists to be collapsible would presumably make for an improved user experience, as well as make the coding panel less cluttered when these specific options are not in use, as different reviewers have different objectives at different times, etc. Another possible alternative to accommodate this might be allowing multiple, customizable tabs to be added, with certain coding layout content assigned to certain tabs. I know I have seen this approach used in some other review platforms, and it offered a reasonable way to fit more options into a smaller space. The second has to do with making better use of the available screen real estate for coding layout content. In its current iteration, there is a significant amount of blank space on the right side of the coding pane, which begs for an additional column to display more options without requiring scrolling (which I have found annoys users). In my estimation, the text size used in the coding layout is very generous, and larger than what I am used to seeing elsewhere. However, the value in having a second column of options outweighs that, in my mind. Alternatively, an option to force two columns would be another approach, perhaps in conjunction with the separate tabs idea. Regardless, the current iteration of the Review tab UI is the best ever, and we're looking forward to future improvements to make the most painful, expensive phase of the ediscovery process as simple and streamlined as possible for reviewers. Jason
  2. This one is actually easy. Do this: (1) pull up some items from your case; (2) highlight some for export, then right-click and select Export > selection... (you'll actually abort this operation, so don't worry about naming or the destination folder not being empty); (3) check the box for "Add to export set"; (4) select the radio button for "Add to existing set," and then select the export set you want to delete from the drop-down menu; (5) once that radio button is selected, the previously grayed-out "Remove" button becomes active, and you can click it to delete the export set containing the error. Intella will give you a warning prompt to make sure you have selected the correct export set to remove. Once you proceed, Intella will irreversibly delete the export set, so just make sure you have selected the right one. Hope that helps! Jason
  3. I had been thinking a bit about this question and wanted to throw out an alternative approach. Of course, it's correct that Lucene does not directly support proximity searches between phrases. However, as has been previously mentioned in a pinned post, it does allow you to identify the words included in those phrases, as they appear in overall proximity to each other. Thus, your need to search for "Fast Pace" within 20 words of "Slow Turtle" should first be translated to: "fast pace slow turtle"~20 . This search will identify all instances where these 4 words, in any order, appear within a 20-word boundary anywhere in your data set. Then, with this search in place, you can perform an additional search, applied via an Includes filter, to include your two specific phrases: "fast pace" AND "slow turtle" By doing this, you should be left with a very close approximation of the exact search you initially intended, with your results filtered to only show your exact phrase hits, but within the overall proximity boundary previously specified (a rough sketch of what this combination approximates appears below). Hope that helps!
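
To make the "close approximation" point a bit more concrete, here is a rough Python sketch of the condition the two combined searches are trying to approximate: both exact phrases present, with their positions within the slop distance of each other. This is just an illustration under the assumption of simple word tokenization, not anything Intella or Lucene runs internally.

import re

def tokenize(text):
    # lowercase word tokens, roughly how a standard analyzer splits text
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_positions(tokens, phrase):
    # start positions where the exact phrase occurs in the token list
    words = phrase.lower().split()
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]

def phrases_within(text, phrase_a, phrase_b, slop=20):
    # True if both exact phrases occur with start positions within `slop` words
    tokens = tokenize(text)
    pos_a = phrase_positions(tokens, phrase_a)
    pos_b = phrase_positions(tokens, phrase_b)
    return any(abs(a - b) <= slop for a in pos_a for b in pos_b)

doc = "The fast pace of the hare was no match for the slow turtle in the end."
print(phrases_within(doc, "fast pace", "slow turtle"))  # True

Documents that pass the ~20 search but fail the exact-phrase filter (or vice versa) are exactly the ones the combined approach is meant to exclude.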
  4. I think that would work! As long as we would still be able to accommodate the scenario you described in Item No. 6 in your list, that sounds like it would be a very simple solution. Jason
  5. So I read this several times today to make sure I understand everything that is being described, and I think it all sounds fantastic. This solves the majority of the problems that have been described in this thread. One additional issue came to mind when discussing this internally and thinking about scenarios we have grappled with previously. In culling data and creating batched review sets, it's fairly common to run into a situation where, as a result of the entire content of a ZIP archive being included, or due to false positive search hits, a large portion, or even the entirety, of a batch can be determined to be non-responsive without a full, document-by-document review. In these situations, we are frequently asked if there is a way to bulk-code the documents as non-responsive and mark the batch as complete. This presents a problem because, although we train users to query for the items in the main previewer, where the bulk tag can then be applied, the limitation is that this action is not recorded back in the Review UI: (1) check marks are not added to the now-coded documents; and (2) the batch is not marked complete as a result. What this results in looks like the following screenshots. First, in the All Batches view, despite what the progress indicators suggest, every single document has, in fact, been tagged (and I know there is a difference between tagged and coded) with either the responsive or non-responsive tag, from the same coding palette. Of course, end users can't begin to understand how "completed" work could look like this, and ask all kinds of questions that we can't really answer other than to say, ignore what you are seeing, everything is actually fine despite what you see. That doesn't go over well with lawyers! By the same token, people LOVE the progress and status data; it's the only such data Connect provides us, so it would obviously be ideal if it could be as accurate as possible and never misleading. In the next screenshot, which was taken inside the Wave 03 Email-7 batch, you can see what we are talking about. As it turns out, all of the spreadsheets are non-responsive, and there are literally hundreds of them, which exist across over two dozen batches. Since they can be identified as non-responsive at a glance, without reviewing them at all, we can't spend the time coding each one individually. Therefore, we have to query for the items and bulk tag in the main previewer for the sake of efficiency. Unfortunately, we have found this to be a very common scenario across dozens of litigation matters, and we have to have a way to address it. With all that explained, regarding No. 3 in your list, are you using the term "Closed" to mean the batch will be marked as "Completed" via the green button? In other words, this could be used by someone assigned this permission to address batches that are not technically "completed" under the current definition? Another question is, when you say Closed, will the percentage also move to 100, or will it stay where it is and just display the green Completed button? I know that our preference would definitely be for 100% due to the issues already described. On the subject of batching generally, the only other thing we're in desperate need of is the ability to batch documents in sort orders OTHER THAN Family Date.
This is particularly the case with incoming load file productions, which may or may not contain adequate metadata for Intella to calculate perfect family dates, or when such a production is not in any particular date order, which definitely happens all the time. This puts us in a position of not being able to batch the documents in Bates-numbered order, which then breaks up document families and creates an extremely difficult situation for us to resolve. Hopefully that was instructive, and I'm looking forward to seeing these features make it into a future version of Connect! Jason
  6. With regard to 64 GB RAM, which I have installed on a high-end Dell rackmount physical workstation with dual Xeon E5-2600 v4 processors, for 32 total cores, I have not been able to realize the performance I had hoped for in a machine dedicated to processing. Not that it was bad - far from it! It's just that I was thinking that better use could be made of the RAM and the number of processor cores. In reality, despite having 10K RPM internal, enterprise-class SAS rotational drives and a 15K RPM system drive, it seems like the disk IO simply cannot supply enough throughput to make effective use of that much CPU and RAM. I wish I had instead opted for SSD drives, which were more expensive at the time than they are now. The only way I have improved performance with this setup was by filling out the remaining internal drive bays with 12 Gb/sec 10K RPM drives (vs. their 6 Gb predecessors). I think the information that Primoz has provided is directly in line with my own experiences, and I would generally caution that the investment in massive RAM and CPU may not result in the kind of performance increases you might hope for (like I did). That said, if I were in your situation, I would go for the fastest SSDs I could get, probably go with the less expensive processor and 32 GB RAM, and do some benchmarking vs. your current machines while monitoring RAM usage. If you need more, you can presumably expand, provided you configure things in such a way that you have open slots. Hope that helps some with your decision. Good luck! Jason
  7. I think what Todd is likely referring to is a Relativity-centric concept rooted in the so-called search term report (STR), which calculates hits on search terms differently than Intella does. I know I have communicated about this issue in the past via a support ticket, and created such a report manually in Intella, which is at least possible with some additional effort involving keyword lists, exclusion of all other items in the list, and recording the results manually. What the STR does is communicate the number of documents identified by a particular search term, and no other search term in the list. It is specifically defined as this: Unique hits - counts the number of documents in the searchable set returned by only that particular term. If more than one term returns a particular document, that document is not counted as a unique hit. Unique hits reflect the total number of documents returned by a particular term and only that particular term. (A rough sketch of that counting logic is below.) I have been aware of this issue for years, and although I strongly disagree regarding the value of such data as presented in the STR (and have written about it extensively to my users), the fact is that, in ediscovery, groupthink is extremely common. The effect is that a kind of "requirement" is created that all practitioners must either use the exact same tools, or that all tools are required to function exactly the same (which I find to be in stark contrast to the forensics world). I actually found myself in a situation where, in attempting to meet and confer with an opposing "expert," they were literally incapable of interpreting the keyword search results report we had provided because it was NOT in the form of an STR. In fact, they demanded we provide one, to such an extent that we decided that the most expedient course of action was just to create a new column that provided those numbers (whether they provided any further insight or not). So in responding to Jon's question, I believe the answer is NO. In such a case, within the paradigm of the STR, a document that contains 5 different keywords from the KW list would actually be counted ZERO times. Again, what the STR does is communicate the number of documents identified by a particular search term, and no other search term in the list. I think it's a misleading approach with limited value, and is a way to communicate information outside of the software. Further, and perhaps why it actually exists, is that it sidesteps the issue of hit totals in columns that add up to more documents than the total number identified by all search criteria. In other words, it doesn't address totals for documents that contain more than one keyword. This is in contrast to the reports Intella creates, where I am constantly warning users not to start totaling the columns to arrive at document counts, as real-world search results almost inevitably contain huge numbers of hits for multiple terms per document. Instead, I point them to both a total and a unique count, which I manually add to the end of an Intella keyword hit report, and advise them that full document families will increase this number if we proceed to a review set based on these criteria. Hopefully that clarified the issue and provided a little more context to the situation! Jason
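
To illustrate the counting logic with made-up numbers (the terms and document IDs below are hypothetical, not from any real report), here is a short Python sketch of total hits vs. STR-style unique hits:

# Hypothetical term -> document ID sets, as a keyword hit report might summarize them.
hits = {
    "alpha":   {1, 2, 3, 4},
    "bravo":   {3, 4, 5},
    "charlie": {4, 6},
}

def unique_hits(term):
    # documents returned by this term and by no other term in the list
    others = set().union(*(docs for t, docs in hits.items() if t != term))
    return hits[term] - others

for term, docs in hits.items():
    print(term, "total:", len(docs), "unique:", len(unique_hits(term)))

# Document 4 is hit by all three terms, so it counts toward every total but toward
# no term's unique-hit count; summing the totals (4 + 3 + 2 = 9) overstates the
# 6 distinct documents actually identified.
print("distinct documents identified:", len(set().union(*hits.values())))

That last line is also why I keep warning people not to total the columns of a keyword hit report to arrive at a document count.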
  8. I don't really have an answer based on what you have described, but here is what I would do: (1) perform some proximity searches to see what those identify (e.g. "JR 0000"~3); (2) go to the Words tab for some of the documents at issue and see if the search text appears there, and if so, in what fields; (3) take a close look at the native files and investigate the presence of formulas that might be causing the issue; (4) make sure the items aren't categorized as having some kind of processing error; (5) I see that you mentioned Connect - although it shouldn't be an issue, if possible, I would attempt to duplicate the issue in the Intella desktop software; (6) as a last resort, if you're seeing the text but not finding anything, you might want to export the extracted text of those items and see if it contains the text you're after (a rough sketch of that kind of check is below). If so, you could either search those files with another tool, or with Intella, and then tag the corresponding items in the original database via MD5, etc. If you're still coming up empty after all that, the support team would probably be interested in examining a sample file to investigate further. Hope that helps! Jason
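
For that last step, a minimal Python sketch, assuming the extracted text was exported as one .txt file per item; the folder name and search string below are hypothetical, so adjust them to your own export:

from pathlib import Path

export_dir = Path("extracted_text_export")   # hypothetical export folder
needle = "JR 0000"                            # the text you expect to find

matches = []
for txt in export_dir.glob("*.txt"):
    content = txt.read_text(encoding="utf-8", errors="replace")
    if needle.lower() in content.lower():
        matches.append(txt.name)

print(f"{len(matches)} exported text files contain {needle!r}")
for name in matches:
    print(" ", name)

If the string turns up in the exported text but not in the search results, that is a strong signal worth passing to the support team along with a sample file.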
  9. So Intella doesn't have any rules-based features to accommodate exactly what you're asking. However, if I'm understanding you correctly, the simplest solution is to revise the coding palette to make the Privilege parent tag not required. That way, if the doc is non-responsive (aka Not Relevant), they can apply that tag, then move on. It IS a good idea to make the responsiveness tag required. Another good practice is to set that tag up with radio buttons so that multiple selections are not possible. It never ceases to amaze us how many documents are coded as both responsive and non-responsive at the completion of a review. An example of what I'm talking about is shown below. In my experience, mandatory tags should be used very sparingly, as they quickly frustrate reviewers, which appears to be the case here. Conversely, it's up to reviewers to code documents accurately. Thus, if a document is tagged as non-responsive, no additional effort is warranted, and they move on to the next document. If they happen to see that the document also contains privileged content, even if non-responsive, they could tag it. However, for the sake of efficiency, the presumption has to be made that a document, if not explicitly tagged as privileged, is inherently understood to be not privileged. Thus, there is no need to require the addition of a tag that states the obvious. Unsure would be the only exception in this scenario. Again, it's a primary responsibility of the reviewer to code documents accurately, and there is no substitute for their attention to detail. Also, since it sounds like this is a review for a legal proceeding, you might want to take the privilege tag a step further and provide two options (assuming that the standard categories for ediscovery would apply here): Privileged - attorney-client, and Privileged - work product. That way, if they are required to create a privilege log down the road, they will have captured their specific assessment of the privilege type at the time it was clear in their mind. These can then be included in a tag group as part of a CSV export of the metadata, providing a giant head start on the creation of their future privilege log. Hopefully that explanation will be somewhat helpful for you!
  10. This is a particularly significant and ongoing issue for my users as well. Although the simplified Review UI has been universally well received by reviewers, and is now our default approach even for single-user document reviews, a lack of flexibility has created several challenges. It's difficult for a reviewer to understand why they can't return to a document batch they previously reviewed and make tagging changes based on new information that has come to light, which is totally normal given the constantly moving goalposts in litigation and ediscovery (as well as the other contexts mentioned previously). In fact, it's not uncommon for certain documents to be subject to multiple coding changes as new facts and information become available during the course of discovery. I have recently provided Jon with some extremely detailed explanations with respect to these issues, which I hope will be helpful in making Connect more flexible for these types of workflows.
  11. Hello! I thought that your questions could be best addressed visually, so I took a few screenshots of the load file export dialogues that address the settings you're interested in. With regard to providing TIFF and PDF, you simply need to check the option for "Also include PDF versions of images" in the load file options dialogue. I always specify another location for these files, as the scenario you're describing is not uncommon with productions in litigation matters. This provides maximum flexibility in your scenario. With regard to file naming, although not incorrect at a technical level, the options you are describing aren't really the industry standard with regard to ediscovery, and will likely cause confusion. It's just not what they're expecting to see, so it will likely raise questions, and you don't want to have to provide that type of explanation. Check out what I did in the file naming and numbering dialogue, which includes the syntax for what you're after. Specify your prefix, use the syntax to specify the number of digits, and move forward (the sketch below shows the kind of numbering that produces). Everything happens in that one text field, under the advanced setting. Hope that helps! P.S. Be sure to create an export set with each production, and save the log in at least CSV format as SOP. Jason Covey
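
For anyone unfamiliar with the convention, the goal is a fixed prefix plus a zero-padded sequence number. A tiny Python sketch of the pattern (the prefix and digit count are hypothetical examples, not Intella's actual field syntax):

prefix = "ABC"   # hypothetical production prefix
digits = 7       # zero-padded width

def doc_number(n):
    # e.g. ABC0000001, ABC0000042, ...
    return f"{prefix}{n:0{digits}d}"

print(doc_number(1))    # ABC0000001
print(doc_number(42))   # ABC0000042

Receiving parties expect that style of numbering, which is why I would steer away from the other naming options you described.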
  12. So you are correct that Intella cannot process PDF Portfolios. It can neither extract the individual PDFs that make up the Portfolio, nor the native file attachments to the individual PDF-converted emails (if that's the manner in which the PDFs were created). Although there are some workarounds, they are pretty complicated depending on how far you want to take things in order to restore proper functionality. Before you set off on such a journey, not knowing the context of the production: if metadata was supposed to be provided with the production, you would certainly be better off going back to the producing party and asking them to produce again in a more accessible format. Assuming that's not an option, you'll want to check out these two Adobe Acrobat plug-ins from EverMap: http://evermap.com/AutoPortfolio.asp and http://evermap.com/AutoSplit.asp The former provides the most advanced functionality for working with PDF Portfolios, whereas the latter's is more limited, but it also includes a number of other features. The main problem you're going to run into involves metadata. If you need to transform the production into a fully functional ESI data set in Intella, it requires the tedious creation of a custom load file. Although I've done it a few times, if you don't have extensive experience with from-scratch load file creation, it wouldn't be realistic to go down that road. Nonetheless, with enough effort, some creative RegEx searching, and data manipulation, it IS possible (a rough sketch of that kind of extraction is below). A middle-ground approach might be this: use one of the two aforementioned tools to extract the PDF-converted emails and any native attachments to a folder. Although the file naming options aren't unlimited, you can achieve something that retains the document order/hierarchy with numeric prefixes. Hopefully the producing party was kind enough to create the Portfolio in some kind of chronological order, which would then be preserved by this process. With that done, you could then just process the resulting files into Intella as a folder source, where proper sorting will be achieved by file name. Of course, this won't give you accurate family dates or file types, or permit full functionality of Intella's Tree view or parent-child tracking, all of which would require the load file route. In a perfect world, Intella would support every possible file type. However, in this case, I'm really on the fence about whether this is worth the effort given that: (1) it's a very rare production "format"; and (2) it's arguably not a legitimate production format in that it makes essential metadata inaccessible. That being the case, I would rather see the dev team working on what I think are some much higher priority features. Still, in your case, in light of the amount of work that's required to make a PDF Portfolio production of email functional within an ESI platform, as well as the lack of accessible metadata (you basically have to extract it from the body text of each individual "email"), you would be in a strong position to ask the opposing party to re-produce the data in a format that is reasonably accessible. And the larger the Portfolio or email volume, the stronger that position would be. Hope that helps! Jason Covey
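
To give a flavor of the "creative RegEx" part, here is a minimal Python sketch that pulls a few header fields out of the body text of a PDF-converted email so they could go into a hand-built load file. The sample text and field patterns are hypothetical; real productions format these headers in all kinds of ways, so treat this as a starting point rather than a parser.

import re

# Hypothetical header block as it might appear in the extracted text of a PDF-converted email.
sample = """From: Jane Doe <jane@example.com>
Sent: Monday, March 4, 2019 10:15 AM
To: John Roe <john@example.com>
Subject: Q1 forecast
"""

FIELDS = {
    "From":    re.compile(r"^\s*From:\s*(.+)$", re.MULTILINE),
    "Sent":    re.compile(r"^\s*Sent:\s*(.+)$", re.MULTILINE),
    "To":      re.compile(r"^\s*To:\s*(.+)$", re.MULTILINE),
    "Subject": re.compile(r"^\s*Subject:\s*(.+)$", re.MULTILINE),
}

row = {}
for name, pattern in FIELDS.items():
    match = pattern.search(sample)
    row[name] = match.group(1).strip() if match else ""

print(row)   # one metadata row for a hand-built load file

Run that over each extracted "email" and you have the raw material for the custom load file route, tedious as it is.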
  13. I think the most expedient answer is to decompress the ZIP prior to processing. This takes the ZIP container out of the equation as a parent item, at which point the behavior you want is easily attainable. It is SOP for me to always decompress ZIP containers prior to processing in order to avoid this type of issue, which also causes Family Dates to be inaccurate. ZIP files are a horrible, horrible thing in the context of document families. A quick sketch of the pre-processing step is below. Hope that helps! Jason Covey
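
If you want to script that step rather than unzip by hand, a minimal Python sketch (the paths are hypothetical; point Intella at the extracted folder as the source instead of the ZIP):

import zipfile
from pathlib import Path

archive = Path("incoming/production_001.zip")    # hypothetical ZIP received
target = Path("staging/production_001")          # folder to add as the source
target.mkdir(parents=True, exist_ok=True)

with zipfile.ZipFile(archive) as zf:
    zf.extractall(target)   # the extracted files, not the ZIP, become the parent-level items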
  14. In the ediscovery world, we are bombarded by both vendors and developers heralding the promise of advanced text analytics capabilities to effectively and intelligently reduce review volumes. First it was called predictive coding, then CAR, then TAR, then CAL, and now it's AI. Although Google and Facebook and Amazon and Apple and Samsung all admit to having major hurdles ahead in perfecting AI, in ediscovery, magical marketing tells us that everyone but me now has it, that it's completely amazing and accurate, and that we are Neanderthals if we do not immediately adopt and trust it. And all this happened in a matter of months. It totally didn't exist, and now it apparently does, and these relatively tiny developers have mastered it when the tech giants have not. Back in reality, I have routinely achieved with Intella that which I'm told is completely impossible. As Intella has evolved, I have been able to continually evolve my processes and workflows to take full advantage of its new capabilities. As a single user, and with Intella Pro, I have been able to effectively cull data sets of up to 500 GB into digestible review sets, from which only a far smaller number of documents are actually produced. PSTs, OSTs, NSFs, file share content, DropBox, 365, Gmail, forensic images - literally anything made up of 1s and 0s. These same vendors claim I cannot and should not be doing this, it's not possible, not defensible, I need their help, etc. My response is always: in using Intella with messy, real-world data in at least 300 litigation matters, why has there never been a single circumstance where an opposing party produced a key document that was also in our possession, in Intella, but that we were unaware of? Of course, the answer is that, used to its fullest, with effectively designed, iterative workflows and QC and competent reviewers, Intella is THAT GOOD. In the process, I have made believers out of others, who had no choice but to reverse course and accept that what they had written off as impossible was in fact very possible, when they sat there and watched me do it, firsthand. However, where I see the greatest need for expanded capabilities with Intella is in the area of more advanced text analytics, to further leverage both its existing feature set and the quality of Connect as a review platform. Over time, I have seen email deduplication become less effective, with functional/near duplicates plaguing review sets and frustrating reviewers. After all, the ediscovery marketing tells them they should never see a near duplicate document, so what's wrong with Intella? You told us how great it is! The ability to intelligently rank and categorize documents is also badly needed. I realize these are the tallest of orders, but after hanging around as Intella matured from version 1.5.2 to the nearly unrecognizable state of affairs today (and I literally just received an email touting AI for law firms as I'm writing this), I think that some gradual steps toward these types of features are no longer magical thinking. Email threading was a great start, but I need improved near duplicate detection. From there, the ability to identify and rank documents based on similarity of content is needed, but intelligently - simple metadata comparison is no longer adequate with ever-growing data volumes (which Intella can now process with previously unimaginable ease). A rough sketch of the kind of content-based similarity I mean is at the end of this post.
So that's my highest priority wishlist contribution for the desktop software, which we see and use as the "administrative" component of Intella, with Connect being its review-optimized counterpart. And with more marketing materials touting the "innovative" time savings of processing and review in a unified platform, I can't help but want to respond, "Oh - you mean like Intella has been from the very first release of Connect?" I would love to hear others share their opinions on this subject, as I see very little of this type of thing discussed here. Jason
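
To be concrete about what I mean by content-based similarity (as opposed to metadata comparison), here is a toy Python sketch of word-shingling with Jaccard similarity, one common basis for near-duplicate detection. The example documents are made up, and real implementations (MinHash and the like) scale this idea to large volumes.

import re

def shingles(text, k=5):
    # set of k-word shingles; overlapping word windows capture shared content
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b):
    # similarity between two shingle sets: 1.0 means identical content
    return len(a & b) / len(a | b) if (a or b) else 1.0

doc1 = "Please review the attached Q3 forecast and send comments by Friday."
doc2 = "Please review the attached Q3 forecast and send comments by Friday, thanks."
doc3 = "Minutes from the facilities committee meeting are enclosed."

s1, s2, s3 = shingles(doc1), shingles(doc2), shingles(doc3)
print(round(jaccard(s1, s2), 2))  # high score: near duplicates
print(round(jaccard(s1, s3), 2))  # 0.0: unrelated documents

Ranking documents by a score like this against documents a reviewer has already coded is the kind of gradual step I have in mind.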
  15. This sounds like a scenario where the reviewers did not tag document families consistently. What I would do is, with the tagged documents already showing: (1) display the "attached" field in the table view; (2) sort by that field; (3) select all of the items that have a check mark in the "attached" field; (4) right-click and select "Show Parents..."; (5) choose either direct or top-level parents, as appropriate; (6) apply a new tag to the parent items, as well as the previously-tagged attachments; (7) export via whatever method is applicable. Hope that helps! Jason