Shirish.Lele Posted August 8

I am looking to optimise my crawler configuration and memory settings, and for that I am hoping to get some insight into how the crawlers work.

I recently processed a disk image and monitored the crawlers. (I had configured 8 crawlers with 32 GB RAM each.) For most of the time, a single crawler was processing the .e01 file, which suggests that the other 7 crawlers, and the memory assigned to them, sat largely unused. They were not entirely idle; I did see them come up in between and process a few files.

There must be some logic that determines when a task is assigned to a new crawler. For example, if there are many smaller loose files, each crawler may be assigned files individually, which makes optimal use of all crawlers. However, there could also be a rule that once a container, say a zip file, is assigned to a crawler, all the files inside that archive are processed by that same crawler. If something like that exists, it would be good to know, as I could then change my crawler configuration depending on the source to be processed.
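To make concrete the behaviour I think I'm seeing, here is a rough sketch of container-affinity scheduling. This is purely my speculation, not Vound's actual implementation, and the helper functions are placeholders:

```python
# Hypothetical sketch of container-affinity scheduling. This is my
# speculation about the observed behaviour, NOT Vound's actual code.
from queue import Queue
from threading import Thread

NUM_CRAWLERS = 8

def expand_container(item):
    """Placeholder: yield every file inside a container (.e01, .zip, .pst, ...)."""
    yield item  # a real implementation would walk the container's contents

def process_file(path):
    """Placeholder for the per-file indexing work a crawler performs."""
    print(f"processed {path}")

def crawler_worker(task_queue):
    """Each crawler pulls whole top-level items off a shared queue."""
    while True:
        item = task_queue.get()
        if item is None:            # poison pill: shut this crawler down
            task_queue.task_done()
            break
        # Key assumption: once a container is claimed, the SAME crawler
        # processes everything inside it. Children never go back onto the
        # shared queue, so one big .e01 keeps one crawler busy while the
        # other seven mostly idle.
        for child in expand_container(item):
            process_file(child)
        task_queue.task_done()

def ingest(sources):
    q = Queue()
    workers = [Thread(target=crawler_worker, args=(q,)) for _ in range(NUM_CRAWLERS)]
    for w in workers:
        w.start()
    for src in sources:             # a single disk image => effectively one busy crawler
        q.put(src)
    q.join()                        # wait for all real work to finish
    for _ in workers:
        q.put(None)
    for w in workers:
        w.join()

ingest(["evidence.e01"])
```

If the assignment really is per top-level item like this, then for loose files the work would spread evenly, and for one big container it would not. Quote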
Jacques B Posted August 9

That's an awful lot of RAM per crawler. I gather you must have a lot of RAM in the workstation/server? You may have already seen it, but Vound has a helpful knowledge base article explaining crawlers and memory settings (https://support.vound-software.com/help/en-us/8-installation/71-optimizing-intella-memory-settings-for-best-performance). It doesn't quite cover what you are asking (at least I don't think so), but I'm sharing it in case it helps. Quote
Shirish.Lele Posted August 9

Thanks. I have a system with 88 logical processors and 1 TB of RAM. I can imagine that increasing the number of crawlers would improve performance for loose files, but I am not sure how crawlers are assigned when dealing with containers.
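For context, here is the memory budget I'm working with. This is my own back-of-the-envelope arithmetic, not an official Vound formula:

```python
# Back-of-the-envelope sizing, my own arithmetic (not an official formula).
total_ram_gb = 1024                 # 1 TB in the server
crawler_heap_gb = 32
num_crawlers = 8

crawler_total = num_crawlers * crawler_heap_gb      # 8 x 32 = 256 GB
headroom = total_ram_gb - crawler_total             # 768 GB left for OS, services, cache
print(f"Crawlers reserve {crawler_total} GB, leaving {headroom} GB headroom")

# For loose files, the same budget could be spread over many more crawlers:
wide_crawlers, wide_heap_gb = 32, 8
print(f"Alternative: {wide_crawlers} crawlers x {wide_heap_gb} GB = "
      f"{wide_crawlers * wide_heap_gb} GB")
```

So going wider is easy on this box; the open question is whether extra crawlers would ever be used when the source is a single container. Quote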
Jacques B Posted August 9

OK, so you DO have an awful lot of RAM 😀. I thought I had a good setup with our Intella Node system: 2 physical processors (40 logical processors in total) and 128 GB of RAM. Your system must blast through data. My only other thought is to group the content you are ingesting into batches of similar material, then ingest each batch in a separate instance so you can tweak the settings for the type of data in that batch. I appreciate that this doesn't answer your original question, but a rough illustration of the idea is below.
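Something along these lines; the batch profiles, paths, and knob names here are hypothetical, not real Intella settings:

```python
# Illustrative batching idea only: profiles and paths are made up, and
# "crawlers"/"heap_gb" are not actual Intella configuration names.
batches = {
    "loose_files": {"sources": ["D:/evidence/loose"],     "crawlers": 24, "heap_gb": 8},
    "disk_images": {"sources": ["D:/evidence/image.e01"], "crawlers": 2,  "heap_gb": 64},
    "mail_stores": {"sources": ["D:/evidence/mail.pst"],  "crawlers": 4,  "heap_gb": 32},
}

for name, profile in batches.items():
    print(f"{name}: {profile['crawlers']} crawlers x {profile['heap_gb']} GB "
          f"-> {profile['sources']}")
```

The point is simply that each batch gets settings matched to its content, rather than one compromise configuration for everything. Quote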