
Leaderboard

Popular Content

Showing content with the highest reputation on 03/19/2019 in all areas

  1. I think what Todd is likely referring to is a Relativity-centric concept rooted in the so-called search term report (STR), which counts search term hits differently than Intella does. I have communicated about this issue in the past via a support ticket, and have created such a report manually in Intella, which is possible with some additional effort involving keyword lists, exclusion of all other items in the list, and recording the results by hand.

What the STR communicates is the number of documents identified by a particular search term and by no other search term in the list. It is specifically defined as follows: "Unique hits - counts the number of documents in the searchable set returned by only that particular term. If more than one term returns a particular document, that document is not counted as a unique hit. Unique hits reflect the total number of documents returned by a particular term and only that particular term."

I have been aware of this issue for years, and although I strongly disagree about the value of the data the STR presents (and have written extensively about it to my users), the fact is that groupthink is extremely common in ediscovery. The effect is that a kind of "requirement" is created that all practitioners must either use the exact same tools, or that all tools must function exactly the same (which I find to be in stark contrast to the forensics world). I actually found myself in a meet and confer where the opposing "expert" was literally incapable of interpreting the keyword search results report we had provided because it was NOT in the form of an STR. In fact, they demanded we provide one, to such an extent that we decided the most expedient course of action was simply to add a new column providing those numbers (whether they offered any further insight or not).

So in response to Jon's question, I believe the answer is NO. Within the paradigm of the STR, a document that contains 5 different keywords from the keyword list is counted ZERO times. Again, the STR communicates the number of documents identified by a particular search term and by no other search term in the list. I think it's a misleading approach with limited value, and is essentially a way to communicate information outside of software. Further, and perhaps the reason it exists at all, is that it sidesteps the problem of hit columns that sum to more documents than the total number identified by all search criteria; in other words, it doesn't address totals for documents that contain more than one keyword. This is in contrast to the reports Intella creates, where I am constantly warning users not to total the columns to arrive at document counts, as real-world search results almost inevitably contain huge numbers of documents with hits on multiple terms. Instead, I point them to both a total and a unique count, which I manually add to the end of an Intella keyword hit report, and advise them that including full document families will increase that number if we proceed to a review set based on these criteria.

Hopefully that clarifies the issue and provides a little more context to the situation! Jason
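To make the distinction concrete, here is a minimal sketch of the STR's unique-hits arithmetic, assuming search results are already available as a mapping from each term to the set of document IDs it returned. The mapping and all names here are hypothetical illustrations, not Intella's (or Relativity's) API:

```python
# Hypothetical search results: each term mapped to the document IDs it hit.
hits_by_term = {
    "contract":  {"DOC-001", "DOC-002", "DOC-003"},
    "indemnify": {"DOC-002", "DOC-004"},
    "liability": {"DOC-002", "DOC-005"},
}

# Total hits per term: the size of each term's result set. Summing this
# column overstates the document count whenever documents match several terms.
total_hits = {term: len(docs) for term, docs in hits_by_term.items()}

# Unique hits per term: documents returned by that term and NO other term.
unique_hits = {}
for term, docs in hits_by_term.items():
    others = set().union(*(d for t, d in hits_by_term.items() if t != term))
    unique_hits[term] = len(docs - others)

# DOC-002 matches all three terms, so it appears in every term's total
# but in NO term's unique count - the "counted ZERO times" case above.
for term in hits_by_term:
    print(f"{term}: total={total_hits[term]}, unique={unique_hits[term]}")

all_docs = set().union(*hits_by_term.values())
print(f"all terms combined: {len(all_docs)} documents")  # 5, not 3+2+2=7
```

Note how the totals column sums to 7 while only 5 distinct documents exist; that double-counting is exactly what the STR sidesteps, and exactly why the columns of an Intella keyword hit report should never be totaled.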
    1 point
  2. In the ediscovery world, we are bombarded by both vendors and developers heralding the promise of advanced text analytics to effectively and intelligently reduce review volumes. First it was called predictive coding, then CAR, then TAR, then CAL, and now it's AI. Although Google, Facebook, Amazon, Apple, and Samsung all admit to having major hurdles ahead in perfecting AI, in ediscovery, magical marketing tells us that everyone but me now has it, that it's completely amazing and accurate, and that we are Neanderthals if we do not immediately adopt and trust it. And all this happened in a matter of months: it didn't exist at all, and now it apparently does, and these relatively tiny developers have mastered what the tech giants have not.

Back in reality, I have routinely achieved with Intella what I'm told is completely impossible. As Intella has evolved, I have continually evolved my processes and workflows to take full advantage of its new capabilities. As a single user with Intella Pro, I have effectively culled data sets of up to 500 GB into digestible review sets, from which only a far smaller number of documents are actually produced. PSTs, OSTs, NSFs, file share content, Dropbox, Office 365, Gmail, forensic images - literally anything made up of 1s and 0s. These same vendors claim I cannot and should not be doing this: it's not possible, it's not defensible, I need their help, and so on. My response is always the same: in using Intella with messy, real-world data in at least 300 litigation matters, why has there never been a single instance where an opposing party produced a key document that was also in our possession, in Intella, but that we were unaware of? The answer, of course, is that, used to its fullest, with effectively designed iterative workflows, QC, and competent reviewers, Intella is THAT GOOD. Along the way I have made believers out of others, who had no choice but to reverse course and accept that what they had written off as impossible was in fact very possible, once they sat there and watched me do it firsthand.

However, where I see the greatest need for expanded capabilities in Intella is in more advanced text analytics, to further leverage both its existing feature set and the quality of Connect as a review platform. Over time, I have seen email deduplication become less effective, with functional/near duplicates plaguing review sets and frustrating reviewers. After all, the ediscovery marketing tells them they should never see a near-duplicate document, so what's wrong with Intella? You told us how great it is! The ability to intelligently rank and categorize documents is also badly needed. I realize these are the tallest of orders, but after hanging around as Intella matured from version 1.5.2 to the nearly unrecognizable state of affairs today (and I literally just received an email touting AI for law firms as I'm writing this), I think some gradual steps toward these types of features are no longer magical thinking. Email threading was a great start, but I need improved near-duplicate detection. From there, the ability to identify and rank documents based on similarity of content is needed, but intelligently - simple metadata comparison is no longer adequate with ever-growing data volumes (which Intella can now process with previously unimaginable ease).

So that's my highest-priority wishlist request for the desktop software, which we see and use as the "administrative" component of Intella, with Connect as its review-optimized counterpart. And with more marketing materials touting the "innovative" time savings of processing and review in a unified platform, I can't help but respond, "Oh - you mean like Intella has been since the very first release of Connect?" I would love to hear others share their opinions on this subject, as I see very little of this type of thing discussed here. Jason
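For anyone curious what content-based near-duplicate scoring can look like under the hood, here is a minimal, generic sketch using word shingles and Jaccard similarity. This is not Intella's implementation - just one well-known technique, shown to illustrate why comparing extracted text goes further than comparing metadata:

```python
# Generic near-duplicate scoring: word shingles + Jaccard similarity.
# All documents and thresholds here are made up for illustration.

def shingles(text, k=3):
    """Return the set of overlapping k-word shingles in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 0))}

def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|, ranging from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

doc_a = "Please review the attached settlement agreement and sign by Friday"
doc_b = "Please review the attached settlement agreement and sign by Monday"
doc_c = "Quarterly sales figures are attached for your review"

# Near-identical emails score high; unrelated content scores near zero.
print(jaccard(shingles(doc_a), shingles(doc_b)))  # ~0.78 -> near duplicates
print(jaccard(shingles(doc_a), shingles(doc_c)))  # 0.0   -> unrelated
```

Two documents with different senders, dates, and subjects would still score highly here if their bodies match, which is the kind of functional duplicate that pure metadata comparison misses.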
    1 point
  3. We have recently considered a new deployment scenario for CONNECT. It turned out not to be viable, as it would require the purchase of many more Microsoft server CALs and other Microsoft licenses at significant cost. Hence I wanted to raise the question of what it would take to run the CONNECT server on Linux instead of Windows (excluding index creation). As it is a Java application, it would seem to be portable (possibly with some loss of functionality, such as PST creation). Any thoughts?
    1 point