
Leaderboard


Popular Content

Showing content with the highest reputation since 08/12/2012 in all areas

  1. 1 point
Hello, we are working on this topic and are planning to add this functionality to the Intella product in the next release. Thank you for your suggestion about Jaccard similarity; it is one of the metrics we are testing to improve our near-duplicates analyzer.
  2. 1 point
Hi all, we have a question regarding the possibility of activating two-factor authentication in Intella Connect (we are using v. 2.0.1). Is it possible to configure Intella in this way? Is there anyone able to do this who can explain to us how to proceed? Thanks in advance. Regards.
  3. 1 point
Introduction

We receive numerous support tickets from our customers asking for advice on using Proximity searches. The user manual provides the basic syntax, and there is additional information in these Forum posts:

http://community.vound-software.com/index.php?/topic/245-proximity-search-using-more-than-two-words/?hl=prox%2A
http://community.vound-software.com/index.php?/topic/359-proximity-search-with-a-phrase-search/?hl=proximity

In most cases we are provided with examples of the syntax which the customer has used. In some cases the syntax is very complex, and often the syntax is incorrect. Some customers ask us whether the syntax is correct, or ask why their proximity search is not working. This is something that we cannot answer on an individual basis. The point of this document is to provide examples that help our customers get a better understanding of proximity search syntax, so that they can create the correct syntax for the search that they want to perform.

Note: Most of this information applies to all versions of Intella which support Proximity searching. There is a known issue with hit highlighting in versions prior to 1.9.1. We recommend that you update to version 1.9.1 if you encounter this issue.

What is a proximity search?

A proximity search is search syntax specifically crafted to find items based on words that are within a specified maximum distance from each other in the item's text. For example, if I wanted to find all items that have the words 'desktop' and 'application' within 10 words of each other, I would use the following:

"desktop application"~10

A proximity search differs from a phrase search in that it does not matter whether 'desktop' appears before or after the term 'application' in the text. For example, documents containing either of the passages of text below are responsive to the proximity search above:

"You must turn on your desktop computer before you can open an application."
"I have copied the shortcut for the application onto the desktop."

Using the Correct Proximity Syntax

As mentioned above, we receive proximity search syntax from customers. A lot of the time we see that the customer has created search strings such as the examples below:

(Baxter Jason) ~20 (article) OR (paper) OR (presentation) OR (public) OR (report)
"national OR fire OR service"~30 (truck) OR (department)

These examples have been sanitized and shortened; the original search strings contained several lines of OR statements. This makes the search string complex, cumbersome, prone to errors and difficult to troubleshoot.

Example 1

If we look at the first example above, we can see immediately that there are several issues which make this syntax incorrect. One issue is that the terms to be searched are not encased in double quotes. Another issue is that the number of words the terms must be within (~20 in this case) is not at the end of the proximity search syntax, as there are several OR statements after this number.

The user manual shows a basic example of the syntax: "desktop application"~10. Note that the structure is to have two (or more) search terms encased in double quotes, followed by the number of words that the terms must be within. The proximity string can be made more useful for larger queries by adding more search terms. The additional search terms need to be separated by the OR operator and encased in parentheses.
For example, the first example above could be rewritten this way:

"(Baxter OR Jason) (article OR paper OR presentation OR public OR report)"~20

Because the user is looking for one of two terms within 20 words of one of several other terms, we have grouped the keywords by placing them in parentheses and separating the terms with the OR operator, e.g. (Baxter OR Jason) and (article OR paper OR presentation OR public OR report). Note: All of the search terms are still encased in double quotes, followed by the number of words that the terms must be within. This syntax will return any items where Baxter or Jason is within 20 words of article, paper, presentation, public or report.

Example 2

Again we see that there are issues with the search syntax in example 2. This time double quotes are used; however, they do not encase all of the search terms. Also, we see a similar trend to example 1, where there are several search terms within parentheses and separated by the OR operator. We see a lot of samples like this and wonder whether this format of proximity search has come from another tool. The way I read this example is as follows: find all items that have national, fire, or service within 30 words of truck or department. The syntax can be rewritten this way:

"(national OR fire OR service) (truck OR department)"~30

Again we use the parentheses to group the search terms into the two groups, and make sure that all terms are encased in double quotes.

Limitations

Because the double quotes need to encase all of the search terms, you cannot have a search phrase within a proximity search. A search phrase would require double quotes, and you can't have nested double quotes within a proximity search. That said, you can use phrases in keyword lists (see below).

In the past we have been provided with proximity search strings where the syntax contained over 40 words separated by the OR operator. As mentioned above, this format is not correct. Even if we corrected the syntax, 40 words in a proximity search makes the search string complex, cumbersome, prone to errors and difficult to troubleshoot. We have also received extremely long search syntax where all search terms contained wildcards. Such complex queries with many wildcards are known to have very poor performance, especially for hit highlighting in the Previewer window.

Workarounds

There are a couple of methods one could use to manage complex proximity searches that contain a large number of search terms separated by the OR operator: one is to break down the search string, and the other is to use keyword lists.

Breaking down the search string

A complex search string can be broken down into several shorter proximity search strings. The shorter search strings are then placed into a keyword list. E.g.:

"Baxter article"~20
"Baxter paper"~20
"Baxter presentation"~20
"Baxter public"~20
"Baxter report"~20

Intella will be able to process the list of shorter proximity searches more efficiently than one large complex search string. With a small amount of Excel work you can create a keyword list that includes all of your shortened proximity searches in a single list (see the sketch at the end of this post).

Using keyword lists

The idea behind using keyword lists is to reduce the number of items that your proximity search needs to search across. Two keyword lists can be created: one list which contains the search terms in the left group of a proximity search, and a second list which contains all the other terms in the right group, e.g.:
Keyword list 1: Baxter, Jason
Keyword list 2: article, paper, presentation, public, report

Next, run the two keyword lists and Tag the overlapping cluster. This cluster will contain the items that have search terms from both keyword lists. Set this Tag as an 'Include' search and run the proximity search. This provides faster searching, as you are not searching over the entire dataset. However, be aware that hit highlighting can still be slow or hang Intella if the proximity search is complex and contains wildcards.

The advantage of using keyword lists is that you can use the following types of searches and operators:

Wildcards (article*, paper*, etc.)
Phrases ("national fire", "fire service", etc.)
Other search operators
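For those who want to skip the Excel work, here is a minimal sketch (assuming Python; the output file name is made up, and the term groups are just the examples from this post) that generates the shortened proximity searches for a keyword list:

    # Sketch: generate one shortened proximity search per term pair.
    # The term groups below are the example groups from this post.
    left_terms = ["Baxter", "Jason"]
    right_terms = ["article", "paper", "presentation", "public", "report"]
    distance = 20

    with open("proximity_keywords.txt", "w", encoding="utf-8") as f:
        for left in left_terms:
            for right in right_terms:
                f.write(f'"{left} {right}"~{distance}\n')

The resulting file can then be loaded as a keyword list; each line is one of the shorter proximity searches described above.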
  4. 1 point
My team is performing production import tests. Despite achieving some positive results, we still have some problems: 1. When checking for errors in the "match loadfile fields with Intella" portion, we encounter the following image, suggesting problems with the opt file, even though the preview of the items with redaction appeared to be correct. The loadfile can be processed even with these errors. The Map fields options were filled in as in the following image. 2. After processing the loadfile, we noticed that some items that should have a redacted image in the visualization panel do not have an image to display. All other files had their image previews displayed properly. Are there any particularities regarding the import of items with redactions that we are missing? The dat and opt files are available in the attachments. export.dat export.opt
  5. 1 point
    Hello Jacques, The following post covers a bit about what you're asking and should get you started:
  6. 1 point
Recently we have had a few customers report that they cannot download the GeoLite2 database within Intella/Connect. It looks like the vendor for the database has changed the way the database can be accessed, and Intella/Connect can no longer download it. If you need to install the GeoLite2 database, you will now need to first download the database and then install it manually. See the steps below.

Sign up for a MaxMind account - https://www.maxmind.com/en/geolite2/signup
Go to the downloads area - https://www.maxmind.com/en/accounts/current
From the 'GeoIP2 / GeoLite2' section, select the 'Download files' link.
Download the GeoLite2 City Binary database.
Extract the GeoLite2-City.mmdb file into C:\Users\[USER]\AppData\Roaming\Intella\ip-2-geo-db (this step can also be scripted; see the sketch below).

Note: You may not be able to see this folder, as it is hidden by default. To go directly to the Roaming folder, type %appdata% into the Windows search box, then press the Enter key. Once done, navigate to the \Intella\ip-2-geo-db folder and put the GeoLite2-City.mmdb file in there. Open Intella or Connect and verify that the database is installed. Please see the following video on the above process:
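A minimal sketch of the copy step (assuming Python, and assuming the GeoLite2-City.mmdb file has already been extracted into the script's working directory):

    import os
    import shutil

    # Sketch: copy the extracted GeoLite2-City.mmdb into Intella's
    # hidden Roaming folder. Run this on the machine running Intella/Connect.
    src = "GeoLite2-City.mmdb"
    dest_dir = os.path.join(os.environ["APPDATA"], "Intella", "ip-2-geo-db")
    os.makedirs(dest_dir, exist_ok=True)
    shutil.copy2(src, dest_dir)
    print("Installed", src, "to", dest_dir)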
  7. 1 point
Dear All, Important notice: we will be moving to a new support system within the next month. For security reasons you will need to create a new account and password for the new support system. More details will be provided in due course.
  8. 1 point
Hi QasimProtiviti, In general this shouldn't be the case. What I can suggest is that you do the following:

Clear searches
Add a regular search query that will produce 1500 items (like before)
Add a regular search query for 100 items tagged as "Irrelevant"

Now look at the cluster map - if it contains only two clusters, then that is fine. If it contains three, then it means some of the items which you tagged as "Irrelevant" are not in the scope of your initial search for 1500 items. Example: I searched for the term "look" and marked 10 of those items as "Irrelevant". When I do the three steps listed above I see: Now, I marked 2 more items as Irrelevant, but I made sure that they don't contain the term "look". This is what I see after I repeat my exercise: This picture clearly shows that I have 12 irrelevant items, but only 10 of them are also responsive to "look". So if I now search for "look" and exclude items tagged as "Irrelevant", I see: This might look misleading at first, because to an untrained eye "260 - 12" should equal 248, but it should now be clear that those two extra items were never part of the "look" cluster, so they shouldn't be counted; excluding the tag only removes the 10 overlapping items, leaving 250. Hope that I was on the right track here and that it helps you down the road.
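To make the set arithmetic concrete, here is a minimal sketch (assuming Python; the item IDs are made up, only the counts match the example above):

    # Sketch: the set arithmetic behind the cluster map example.
    look_hits = set(range(1, 261))                   # 260 items responsive to "look"
    irrelevant = set(range(251, 261)) | {900, 901}   # 12 tagged items, 10 of them overlap

    print(len(look_hits & irrelevant))   # 10 items in the overlapping cluster
    print(len(look_hits - irrelevant))   # 250 items left after excluding the tag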
  9. 1 point
Hi, Selective re-indexing is indeed on our roadmap. I see how the proposed change to the way items are merged into a case makes sense, and how it can be used as a workaround in the interim, so it is definitely worth looking into!
  10. 1 point
Hi Bryan, At this point the only 'easy' way to show duplicates of a group of items is to use the workaround which you are currently doing. This functionality may be expanded in a future version.
  11. 1 point
Hi all, Here are some updates regarding the progress of W4.

Where are we at with the official release?

We are planning to have our first official release of W4 this week. The installer for the release will be available for download to our beta testers in the next few days. Beta testers will be able to test the new features which have been added since the beta version was released last year.

What new features have been included since the beta release?

There have been a number of new features added since the beta version. The new features can't all fit into one post, so over the next few days we will post some of the new features that have been added to W4. That said, here is a short list of what we have added:

Reporting wizard which allows for a lot of flexibility when creating forensic reports
Ingest a W4 case into Intella
Colorized tags for easier tag identification
Special Note function. This is useful for adding additional information to discovered artefacts
New type of visualization in the Summary tab
Thumbnail view for image files
Email headers tab
  12. 1 point
Hi Kalin, Re APFS support: this is high on our to-do list. We are just waiting for the functionality to become available. Re thumbnails: we are looking to add a reporting wizard to Intella. This should include the mechanics to export images as thumbnails. Having thumbnails for other file types is a good idea; I will make a ticket for that.
  13. 1 point
Hi Bryan, It's true that the output of CMD when processing tasks could be improved; however, there is also another option available. Instead of analyzing the output in the console, you might prefer to open the case logs and monitor the progress there. Here is a snippet showing OCR starting, progressing and finishing:

[INFO ] 2019-03-28 13:40:07,100 [CrawlThread] Total page count: 101
[INFO ] 2019-03-28 13:40:07,109 [CrawlThread] Started OCRing 101 items. Using: ABBYY FineReader Engine
[INFO ] 2019-03-28 13:40:07,109 [CrawlThread] Settings: Profile: Accuracy Export format: Plain text Languages: English Number of workers: 10 Detect page orientation: true Correct inverted images: true Skip OCRed: true
[WARN ] 2019-03-28 13:40:07,115 [CrawlThread] Skipped encrypted content item: 1373
[INFO ] 2019-03-28 13:40:07,116 [OcrServiceProcessor1] OCRing item: 1243
[INFO ] 2019-03-28 13:40:07,116 [OcrServiceProcessor2] OCRing item: 1244
...
[INFO ] 2019-03-28 13:40:32,470 [CrawlThread] Collecting OCR crawl results
[INFO ] 2019-03-28 13:40:32,619 [CrawlThread] Collected 0 records.
[INFO ] 2019-03-28 13:40:32,620 [CrawlThread] Importing OCRed text and extracted entities
[INFO ] 2019-03-28 13:40:32,889 [CrawlThread] Imported OCR text into 150 items.
[INFO ] 2019-03-28 13:40:32,938 [CrawlThread] Updating OCR database
[INFO ] 2019-03-28 13:40:33,182 [CrawlThread] Finished OCR. Total time: 0:26. Items processed: 99

You could of course monitor the entire log, or use command line programs to grep its contents live for regular expressions of your choosing. That way you only get information about the OCR process itself.

As for the second question, about preserving temporary files generated during OCRing: that looks like a risky operation to me, and if one is not careful enough it may produce errors which would be very hard to find. Fortunately, it shouldn't be needed once we extend Intella so that it re-applies OCRed text to duplicate items discovered when new sources are added. This is already on our radar.
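A minimal sketch of the live-grep idea (assuming Python; the log path is a placeholder you would point at your case's actual log file):

    import re
    import time

    # Sketch: follow a case log and print only the OCR-related lines.
    LOG_PATH = r"C:\cases\MyCase\logs\case.log"  # placeholder path
    OCR_PATTERN = re.compile(r"OCR", re.IGNORECASE)

    with open(LOG_PATH, "r", encoding="utf-8", errors="replace") as log:
        log.seek(0, 2)  # jump to the end of the file, like 'tail -f'
        while True:
            line = log.readline()
            if line:
                if OCR_PATTERN.search(line):
                    print(line, end="")
            else:
                time.sleep(0.5)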
  14. 1 point
Server-side profiles for users are the subject of another feature that we have on our roadmap. For now, profiles will be stored in the browser's storage, so they will be tied to the browser that the user is using. Good comment though; it may make me bump up the priority of those persistent user profiles for the next release.
  15. 1 point
    Hello Bryan, Please try running the installer like this: setup-intella...exe /S It will run the installer in the background and install Intella in the default location. Some windows will still briefly open and close when certain settings are made, but no user interaction is necessary. Note: we have not tested this switch a lot and therefore we do not officially support it. It worked fine on my system though and I am quite confident that it will work on other systems.
  16. 1 point
Has anyone successfully imported a Slack Enterprise messaging archive into Intella? It is in JSON format. Thanks for the help.
  17. 1 point
We raised this requirement before too. It would be critical for Intella to use the Slack API with Legal-Hold privileges to select and pull data from Slack. Slack has become very big, so count our vote on this too please. For API reference see: https://api.slack.com/
  18. 1 point
I had been thinking a bit about this question and wanted to throw out an alternative approach. Of course, it's correct that Lucene does not directly support proximity searches between phrases. However, as has been previously mentioned in a pinned post, it does allow you to identify the words included in those phrases, as they appear in overall proximity to each other. Thus, your need to search for "Fast Pace" within 20 words of "Slow Turtle" should first be translated to: "fast pace slow turtle"~20. This search will identify all instances where these 4 words, in any order, appear within a 20-word boundary anywhere in your data set. Then, with this search in place, you can perform an additional search, applied via an Includes filter, to include your two specific phrases: "fast pace" AND "slow turtle". By doing this, you should be left with a very close approximation of the search you initially intended, with your results filtered to only show your exact phrase hits, but within the overall proximity boundary previously specified. Hope that helps!
  19. 1 point
Hi John, That's strange though, because we kept on searching and found that we were able to use RegEx to search for properties using a different syntax in the search bar. If we surround the RegEx with a leading and a trailing forward slash "/", the RegEx expression also finds hits in the properties.
  20. 1 point
I think what Todd is likely referring to is a Relativity-centric concept rooted in the so-called search term report (STR), which calculates hits on search terms differently than Intella. I know I have communicated about this issue in the past via a support ticket, and created such a report manually in Intella, which is at least possible with some additional effort involving keyword lists, exclusion of all other items in the list, and recording the results manually.

What the STR does is communicate the number of documents identified by a particular search term, and no other search term in the list. It is specifically defined as this: "Unique hits - counts the number of documents in the searchable set returned by only that particular term. If more than one term returns a particular document, that document is not counted as a unique hit. Unique hits reflect the total number of documents returned by a particular term and only that particular term."

I have been aware of this issue for years, and although I strongly disagree regarding the value of such data as presented in the STR (and have written extensively about it to my users), the fact is that, in ediscovery, groupthink is extremely common. The effect is that a kind of "requirement" is created that all practitioners must either use the exact same tools, or that all tools are required to function exactly the same (which I find to be in stark contrast to the forensics world). I actually found myself in a situation where, in attempting to meet and confer with an opposing "expert," they were literally incapable of interpreting the keyword search results report we had provided because it was NOT in the form of an STR. In fact, they demanded we provide one, and to such an extent that we decided that the most expedient course of action was just to create a new column that provided those numbers (whether they provided any further insight or not).

So in responding to Jon's question, I believe the answer is NO. In such a case, within the paradigm of the STR, a document that contains 5 different keywords from the KW list would actually be counted ZERO times. Again, what the STR does is communicate the number of documents identified by a particular search term, and no other search term in the list. I think it's a misleading approach with limited value, and is a way to communicate information outside of software. Further, and perhaps why it actually exists, is that it sidesteps the issue of hit totals in columns that add up to more documents than the total number identified by all search criteria. In other words, it doesn't address totals for documents that contain more than one keyword.

This is in contrast to the reports Intella creates, where I am constantly warning users not to start totaling the columns to arrive at document counts, as real-world search results almost inevitably contain huge numbers of hits for multiple terms per document. Instead, I point them to both a total and a unique count, which I manually add to the end of an Intella keyword hit report, and advise them that full document families will increase this number if we proceed to a review set based on this criteria.

Hopefully that clarified the issue and provided a little more context to the situation! Jason
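To make the STR definition concrete, here is a minimal sketch of the unique-hit computation (assuming Python; the term-to-document mapping is invented, and this is not Relativity's or Intella's actual code):

    # Sketch: STR-style "unique hits". A document counts toward a term
    # only if no other term in the list also returned it.
    hits = {
        "baxter": {1, 2, 3},
        "article": {3, 4},
        "report": {5},
    }

    for term, docs in hits.items():
        others = set().union(*(d for t, d in hits.items() if t != term))
        print(term, "unique hits:", len(docs - others))

    # Document 3 is returned by both "baxter" and "article", so it is
    # counted zero times - exactly the behavior described above.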
  21. 1 point
Just quietly, I'm excited. I downloaded and started testing on a 120GB disk image; within 1 minute of processing starting I was able to start triaging and seeing valuable data. I'll withhold any more comments until the indexing process finishes and I can spend a few hours coming up with some constructive testing, but what I've seen in the last 30 minutes or so has me massively impressed. Edit: sorry, just one comment, I love the Events view. A good timeline tool has long been something missing, and the way this presents the data is exceptional. I'll be watching closely to see how the reporting side of this tool develops, as traditionally this is where it can get tricky: porting those timelines out into something useful for clients or third parties to use.
  22. 1 point
Just wanting to revisit a wish I had from 2015 to bring it back to life: the timeline view in Intella. Currently we can't do anything except export it to a PNG graphic file. Adding the ability to export to HTML or Excel would be a huge benefit. I'm constantly asked for timeline graphs/presentations by clients and have to resort to looking at other analytics tools, which are not exactly built for simple timelining; although they do an admirable job, it seems a pity to waste the perfect timeline already showing in Intella.
  23. 1 point
    I guess in the future I could select each of the individual MBOXs from the IMAP collection except the ALL MAIL MBOX, index the collection, and then add the ALL MAIL MBOX in as a second step. Anything that was a duplicate in ALL MAIL would be duped out. As a workaround, I showed the "duplicates" column in the listing pane, sorted based on location and tagged for export any item in the ALL MAIL location that did not show a duplicate, but did not tag any item that did show a duplicate. All other relevant items from other Gmail 'folders' were tagged and all tagged items were exported.
  24. 1 point
In the ediscovery world, we are bombarded by both vendors and developers heralding the promise of advanced text analytics capabilities to effectively and intelligently reduce review volumes. First it was called predictive coding, then CAR, then TAR, then CAL, and now it's AI. Although Google and Facebook and Amazon and Apple and Samsung all admit to having major hurdles ahead in perfecting AI, in ediscovery, magical marketing tells us that everyone but me now has it, that it's completely amazing and accurate, and that we are Neanderthals if we do not immediately institute and trust it. And all this happened in a matter of months: it totally didn't exist, and now it apparently does, and these relatively tiny developers have mastered it when the tech giants have not.

Back in reality, I have routinely achieved with Intella that which I'm told is completely impossible. As Intella has evolved its own capabilities, I have been able to continually evolve my processes and workflows to take full advantage of them. As a single user, and with Intella Pro, I have been able to effectively cull data sets of up to 500 GB into digestible review sets, from which only a far smaller number of documents are actually produced. PSTs, OSTs, NSFs, file share content, DropBox, 365, Gmail, forensic images - literally anything made up of 1s and 0s. These same vendors claim I cannot and should not be doing this, it's not possible, not defensible, I need their help, etc. My response is always: in using Intella with messy, real-world data in at least 300 litigation matters, why has there not been a single circumstance where a key document in my firm's possession has ever been produced by an opposing party, that was also in our possession, in Intella, but that we were unaware of? Of course, the answer is that, used to its fullest, with effectively designed, iterative workflows and QC and competent reviewers, Intella is THAT GOOD. In the process, I have made believers out of others, who had no choice but to reverse course and accept that what they had written off as impossible was in fact very possible, when they sat there and watched me do it, firsthand.

However, where I see the greatest need for expanded capabilities in Intella is in the area of more advanced text analytics, to further leverage both its existing feature set and the quality of Connect as a review platform. Over time, I have seen email deduplication become less effective, with the presence of functional/near duplicates plaguing review sets and frustrating reviewers. After all, the ediscovery marketing tells them they should never see a near-duplicate document, so what's wrong with Intella? You told us how great it is!

The ability to intelligently rank and categorize documents is also badly needed. I realize these are the tallest of orders, but after hanging around as Intella matured from version 1.5.2 to the nearly unrecognizable state of affairs today (and I literally just received an email touting AI for law firms as I'm writing this), I think that some gradual steps toward these types of features are no longer magical thinking. Email threading was a great start, but I need improved near-duplicate detection. From there, the ability to identify and rank documents based on similarity of content is needed, but intelligently - simple metadata comparison is no longer adequate with ever-growing data volumes (which Intella can now process with previously unimaginable ease).
So that's my highest-priority wishlist contribution for the desktop software, which we see and use as the "administrative" component of Intella, with Connect being its review-optimized counterpart. And with more marketing materials touting the "innovative" time saving of processing and review in a unified platform, I can't help but think to respond, "Oh - you mean like Intella has been from the very first release of Connect?" I would love to hear others share their opinions on this subject, as I see very little of this type of thing discussed here. Jason
  25. 1 point
I'm piggy-backing off gjennings' post in the other forum titled Adding New Data Fields. I couldn't find an actual request for this feature in the Wishlist forum, so I'm adding it here just to make it official. This is hands-down the #1 item on my Intella wishlist. I come across issues in nearly every case that could be resolved much more easily if we could import data into custom columns (rather than tags). Thanks.
  26. 1 point
Hi Todd, yes we are seeing this in many cases now, typically where documents have been image-scanned and therefore the digital metadata needs to be explained. The client just wants to read the docs in chronological order; they are happy to have a team of admin clerks view every document and gather the 'actual date' rather than the date it was scanned, but then there is nothing that can be done to get this back into Intella. We've logged a mail with support, so fingers crossed for the future.
  27. 1 point
For the moment that is indeed not possible. Please note that the Table does have a Message ID column, so you can show the Message IDs and sort on them. If you have a large number of Message IDs to deal with, you can try the following:

List all items in the table and add the Message ID column.
Export all results as a CSV, using only the Item ID and Message ID columns.
Use Excel or some batch script to filter the CSV so that it only contains the rows with a matching Message ID (see the sketch below).
Remove the Message ID column from the CSV, leaving only the Item IDs.
Import this file in the Item ID facet. This gives you the set of items with a matching Message ID.
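The batch-script step can be done with a short script. A minimal sketch (assuming Python; file names and column headers are hypothetical, so match them to your actual export, and the Message-IDs are the examples from the post below):

    import csv

    # Sketch: keep only the Item IDs of rows whose Message ID is wanted.
    wanted = {"<ABC123>", "<ABC456>"}  # hypothetical Message-IDs

    with open("export.csv", newline="", encoding="utf-8") as src, \
         open("item_ids.txt", "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            if row["Message ID"] in wanted:
                dst.write(row["Item ID"] + "\n")

The resulting item_ids.txt can then be imported in the Item ID facet, as described above.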
  28. 1 point
I would love to be able to search the message-ID field specifically, and do so via a keyword list. What I am trying to do is find specific messages, but not the messages that reference them. I would like to be able to have a keyword list that looks like: messageID:<ABC123> messageID:<ABC456> etc... Is there currently a way to search this specific field only?
  29. 1 point
When processing data from systems and mobile devices, one very often finds file-based databases and data structures. The most popular is SQLite, but others exist as well (Microsoft EDB, and one could probably even consider plist files to fall into this category). The (table-)structure of these files is application-specific, i.e., it varies widely. My proposal would be to create a template format that allows for two things (a sketch of the idea follows below):

Template-based specification of (SQL) queries. The query results would then be represented as items in Intella (either per line or by SQL 'GROUP').
Definition of mappings of query result fields onto custom columns (including type specification, e.g., date, GEO-location coordinates, String, Integer etc.).

Allowing people to share the templates they have created for the various applications (and versions thereof) would enable the building of a library. The advantage would be that otherwise-missed information could be added to event timelines, and app-specific GEO-location data could be extracted and identified.
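Purely as an illustration of the proposal (no such template format exists in Intella today; the template shape, table and column names are all invented), a template and its application might look like this:

    import sqlite3

    # Hypothetical template: one SQL query plus a mapping of the result
    # fields onto typed custom columns.
    template = {
        "application": "ExampleChatApp",
        "query": "SELECT sender, body, sent_at FROM messages",
        "columns": {"sender": "String", "body": "String", "sent_at": "Date"},
    }

    con = sqlite3.connect("chat.db")  # placeholder database file
    for row in con.execute(template["query"]):
        # Each result row would become one item, its fields mapped onto
        # the custom columns declared above.
        print(dict(zip(template["columns"], row)))

A shared library of such templates, one per application and version, is what the proposal would enable.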
  30. 1 point
    Hi rodrigoalmeida, No, there is no 'soft license' available. Intella/Connect can only be used with a USB license dongle.
  31. 1 point
Introduction

Memory management in Intella may seem like a simple task. "All you have to do is just move the slider all the way across to the maximum to use all of the memory that Intella will allow, right?" Well, that would help in some cases, but memory management in Intella is a bit more complicated than that. This post explains how memory is used by the many components in Intella, and why we can't set optimal settings automatically for every hardware and software configuration.

UPDATE July 2019 - New information regarding setting memory and crawler settings

Note that from version 2.3 (due for release approx. end of July), for Intella and Connect we have added more memory controls in the user interface (UI) for both products. This allows the user to modify all of the memory and crawler settings from within the UI. There is no longer a requirement to edit the Intella.l4j.ini file, which is located in the Intella/Connect installation directory, to set the memory settings. The text below discusses editing the Intella.l4j.ini file to modify the memory and crawler settings. If you are running version 2.2.x or earlier, then this information is still relevant. We have also added information below on how to modify the memory and crawler settings when using version 2.3.x and above.

Memory used by Intella

First we need to understand how memory is used by the different components in Intella and the processes we use Intella for. There are three different processes where memory is automatically assigned by Intella:

Case Manager
Intella main process
Crawlers

Case Manager memory

The Case Manager memory setting controls how much memory is allocated to the Case Manager process. This setting is for the Case Manager only and does not affect the processing of data. It is fixed to 256MB by default and usually there is no need to change it; therefore we have not provided any controls in the user interface to adjust this setting. That said, the setting can be changed manually if required. The only reason why someone would want to change this setting is for exporting and importing cases. We have seen a case where we needed to increase the Case Manager memory in order to export a case to an ICF file. For the most part, however, this setting does not need to be changed. The Case Manager process usually only lives for a few seconds: after you have selected a case and clicked the 'Open' button, the process is terminated.

Main process memory

The Main process is started by the Case Manager when you open a case, and controls everything you can see in Intella, except for indexing and exporting. It is usually the process that requires the largest amount of memory, and that is why we added the memory slider in the Case Editor window. This allows the user to easily adjust this memory setting. Below is the table which shows the default memory allocation made by Intella based on the amount of RAM in a system:

Crawlers and exporting processes

The memory setting for the Crawler processes is calculated automatically based on the amount of RAM minus the memory used for the Main process, and the number of crawlers that will be used. By default, Intella calculates the number of crawlers based on the number of CPU cores in the system. However, this number is capped at 4, as assigning more crawlers without other considerations can adversely affect performance. When the amount of memory per crawler is set automatically by Intella, it will be capped at a maximum of 2GB per crawler.
Again, this is a setting that usually does not need any changes, but it can be changed manually if required. The job of the Crawlers is only to extract and collect information; they don't index the data right away. The indexing takes place later, in the post-processing steps, which are done in the Main process.

Note: The settings for the crawlers also control these other processes:

Exporting to PDF.
Exporting to PST.
PDF converter used by the Preview tab.
Load file import (TIFF to PDF conversion).
OCR import (text extraction).
Outlook and Notes validation.

When do I optimize for better performance?

Now that we have a bit of an understanding of how memory and crawlers are used in Intella, we can look at some examples of getting the best performance from the hardware resources you have and for the dataset that you are working on. Note that it is possible, through manual memory allocation, to assign all of the memory on a system to Intella processes. This can leave the system in an unstable state. We recommend not assigning more than 75% of the total memory of a system to Intella processes. The last section of this document provides an example of assigning memory in Intella; in that example, the assigned memory comes to around 50% of the total memory of the system.

Because different amounts of memory can be assigned to different processes, you may want to first work out what processes you are performing. For example, there is a difference between indexing and investigating a case. When indexing a data source, quite a bit of the memory needs to be reserved for the crawler processes, especially when a lot of crawlers are being used on high-end machines. During the crawling phase (step 1 of indexing) the Main process actually requires very little memory, but for the post-processing phases (steps 2-9 of indexing) the Main process requires considerably more. When investigating a case (a case that is already indexed), almost all available memory should be assigned to the Main process. (Note that this is proportionate to the case size; e.g. there is no need to assign 128GB of memory to a 40GB case.)

As I mentioned earlier, memory is automatically assigned to the different processes used by Intella. This is mostly done based on the hardware resources that the system has. We are cautious not to over-assign memory and crawlers, as doing so can actually inhibit performance. That said, the user can manually adjust these memory and crawler settings to better suit their hardware specifications and the data which they are indexing.

This leads us to the million dollar question: "What are the best memory and crawler settings to use?" Well, the answer is not that straightforward, and it really depends on the type of data that is being indexed. For example, if you are indexing a single large PST file, you would probably not see much of a difference in performance if you manually increased the memory allocations and crawlers. In this case the default memory settings and number of crawlers will be enough to provide the best performance, so memory settings should only be changed to troubleshoot indexing errors. On the other hand, if your dataset contains a lot of loose files or a disk image, then increasing the number of crawlers can provide better performance. In addition, increasing the service memory heaps (the memory available to each crawler) can help to resolve out-of-memory errors that can occur with specific large items.
In summary: when indexing, increasing the number of crawlers for datasets that contain a large number of loose files can provide better performance. If some of the loose files are large, then increasing the memory assigned to the crawlers can help with out-of-memory errors. In addition, when importing OCR text, assigning more crawlers can increase performance. When investigating a case, almost all available memory should be assigned to the Main process. Also, when exporting to formats such as PDF or PST, increasing the crawler memory might help with performance.

Optimizing for best performance

The next question is: "How do I change these settings?" Before we get on to that, note that just increasing the number of crawlers and the memory for each crawler, without there being a need for it, may actually hurt performance, as there will be less memory for Windows' file system cache to work with.

Changing the Main process memory

To change the memory for the Main process, the user can adjust the slider in the Case Editor window for that specific case. The slider will only allow you to use up to half the memory in the system. This is a safeguard against crashing the system by assigning too much memory to the Main process and not leaving enough for Windows to operate.

Changing the number of crawlers

The number of crawlers is automatically set based on the number of CPU cores you have in your system. That said, we limit this by default to a maximum of 4 crawlers. The maximum number of crawlers can be increased if you have a high-end system with many cores. For example, if you have a system with 12 cores, the maximum number of crawlers can be set to 6 (2 cores per crawler).

For Intella/Connect version 2.3.x and up, this setting can be adjusted in the Case Editor window. Click on the Advanced button to show the advanced settings. The Crawler count option can be changed from Auto to Manual; then you can set the desired number of crawlers for the case.

For Intella/Connect version 2.2.x or lower, this change needs to be made manually. First close Intella, then manually edit the Intella.l4j.ini file located in the install directory. The line to look for and edit is '# -Dintella.crawlersCount=4'. First remove the leading '#' (this symbol disables the line), then change the crawlersCount to 6 and save the file. Note that this only specifies an upper limit; the actual number of crawlers used by Intella will also depend on the evidence data.

Increasing the memory for each crawler

Another memory setting is the amount of memory assigned to each crawler. As mentioned earlier, this is a setting that normally does not need to be changed. However, if you are getting 'out of memory' errors during the crawling/indexing phase, then you may need to assign more memory to the crawlers.

For Intella/Connect version 2.3.x and up, this setting can be adjusted in the Case Editor window. Click on the Advanced button to show the advanced settings. The Service memory allocation option can be changed from Auto to Manual; then you can set the desired amount of memory for each crawler using the slider.

For Intella/Connect version 2.2.x or lower, this change needs to be made manually. First close Intella, then manually edit the Intella.l4j.ini file located in the install directory. The line to look for and edit is '# -Dintella.serviceMaxHeap=800M'. First remove the leading '#', then change the serviceMaxHeap value to what you want each crawler to use (e.g. 2g). Once done, save the file.
From this point, Intella will use 2GB of memory for each crawler when indexing and exporting data in this case. Be mindful when allocating memory to the crawlers: you must ensure that there is enough memory for all of the crawlers, the Main process, and the other applications that are running on the computer system. We have had customers complain about performance issues after modifying this setting. In one example, someone increased the crawler memory to over half of the physical memory. This reduced the number of crawlers to one and, in turn, indexing a large dataset with one crawler caused the user substantial performance issues (it effectively makes Intella's indexing single-threaded).

The above instructions for changing the memory and crawler settings in the Case Editor window are for Intella Desktop. For Connect, the new memory settings for each case can be seen and edited in the case list of the Admin dashboard. Simply set the setting to Manual, then enter the memory to be used (in MB) and the number of crawlers to be used (see below).

Other considerations

Intella can use memory that isn't directly assigned to it. Intella will work much faster if the Windows file system has sufficient free memory available for caching evidence files and case index files. Be careful not to starve the operating system of memory by assigning too much memory to the different Intella processes.

Conclusion

As you can see, the configuration settings for Intella are not straightforward and depend on a number of factors, such as the hardware you have and the type of data you are processing. It is near impossible to specify a set of settings that will work optimally for all hardware specifications and data types. That said, we can give an example of a setup for a given scenario. If you have a 12-core machine with 128GB of RAM, and you need to index a 500GB E01 disk image which contains a lot of heavy text documents, we can suggest a memory configuration that would provide the best performance. We know that the dataset most likely contains a lot of text, so the following settings would better suit this type of dataset:

Increase the Main process memory from the default (15GB) to 30GB.
Increase the number of crawlers from 4 to 6.
Increase the crawler memory from 2GB to 3GB.

With these settings, the total memory usage by Intella will be 30GB + 6 x 3GB = 48GB. If we add 5-10GB for the file system cache, the total memory usage will be 53-58GB out of a total of 128GB. We still have plenty of free memory for the operating system and other programs on the system.
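As an illustration for version 2.2.x or lower, after applying the two Intella.l4j.ini edits described above (6 crawlers, 2GB per crawler), the relevant lines would change from:

    # -Dintella.crawlersCount=4
    # -Dintella.serviceMaxHeap=800M

to:

    -Dintella.crawlersCount=6
    -Dintella.serviceMaxHeap=2g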
  32. 1 point
    As an update, a customer has provided a solution where the screen res for the entire PC does not need to be changed. You can actually set Intella to not use the high res settings (thanks Chad). The solution is: 1) Right click on the Intella shortcut and select Properties. 2) Click on the Compatibility tab. 3) Check the option to 'Override High DPI scaling behavior' and select the 'System' option from the dropdown (see the screenshot below).
  33. 1 point
We have recently considered a new deployment scenario for CONNECT. It turned out not to be viable, as it would require the purchase of many more Microsoft server CALs and other Microsoft licenses at significant cost. Hence I wanted to raise the question of what it would take to have the CONNECT server run on Linux instead of Windows (excluding index creation). As it is a Java application, it would seem to be portable (possibly with some loss of functionality, such as PST creation). Any thoughts?
  34. 1 point
Can you make one of those fields required? For example:

Relevant
  yes
  no
  requires supervisor's attention

That could help. If I'm not mistaken, it could even help to properly recompute existing batches (the ones which haven't been finished yet) if you then code one shared item inside them. This is just my evaluation from looking at the code, so please do not attempt this unless you have a backup or are using a test case.
  35. 1 point
    Thanks for the input from you both. I have known for years of the database structure of the PST/OST files and have always chuckled a bit at the concept of exporting to native/original given the originating file type. I too would like to see the flexibility to export to .eml or .msg in future releases. In the long run I guess it is just what the client asks for (or what we know they need but they haven't asked for in so many words) that counts.
  36. 0 points
Hi Kalin, LDAP is currently only being used for Authentication, not Authorization. We decided to keep our Authorization configuration on the Connect side, so that the integration with AD/LDAP wouldn't be overly complicated. The level of automation you are seeking is not something that can be achieved in the current version of our software. I would love to hear from other users too if this is something they would like to see added, though. CLI/CMD support is currently a PRO/Team-specific feature. We are planning to add more automation to Connect in the next few release cycles, but we are leaning more towards developing some sort of RESTful API. Again, any feedback from the community about this would be appreciated.
  37. 0 points
Hi, I just realized that the "post-processing steps" cause 20% GPU usage. Would a better GPU result in a performance improvement? Which processing steps are affected by the GPU? Thanks a lot.