
Crawler management



I am looking at optimising our crawler configuration and memory settings, and I am hoping to get some insight into how the crawlers work.

I recently processed a disk image and monitored the crawlers. (I had configured 8 crawlers with 32 GB RAM each.) For most of the time, a single crawler was processing the .e01 file, which suggests that the other 7 crawlers, and the memory assigned to them, were sitting idle for most of the run. Not that the other crawlers were never used; I did see them come up occasionally and process a few files.

There must be some logic that determines when a task is assigned to a new crawler. For example, if there are many smaller files, each crawler may be assigned files individually, which results in optimal use of the crawlers. Alternatively, the logic could be that once a container, say a zip file, is assigned to a crawler, all of the files inside that archive are processed by that same crawler.
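Intella's actual scheduler is not documented in this thread, so the following is only a hypothetical sketch of the two dispatch strategies described above: round-robin per file versus container affinity, where every item inside a container stays on one crawler. The function names, the round-robin choice, and the path-based container grouping are all assumptions for illustration.

```python
from collections import defaultdict
from itertools import cycle

def dispatch_per_file(files, num_crawlers):
    """Hypothetical: assign each file to crawlers round-robin, ignoring containers."""
    assignment = defaultdict(list)
    crawlers = cycle(range(num_crawlers))
    for f in files:
        assignment[next(crawlers)].append(f)
    return dict(assignment)

def dispatch_per_container(files, num_crawlers):
    """Hypothetical: keep every file from the same top-level container on one crawler."""
    assignment = defaultdict(list)
    container_owner = {}
    crawlers = cycle(range(num_crawlers))
    for f in files:
        container = f.split("/", 1)[0]  # e.g. "image.e01" for nested items
        if container not in container_owner:
            container_owner[container] = next(crawlers)
        assignment[container_owner[container]].append(f)
    return dict(assignment)

files = [
    "image.e01/docs/a.pdf",
    "image.e01/docs/b.pdf",
    "image.e01/mail/c.msg",
    "loose/d.txt",
]

per_file = dispatch_per_file(files, 8)           # 4 crawlers each get one file
per_container = dispatch_per_container(files, 8) # crawler 0 gets all 3 .e01 items
```

Under container affinity, a single large .e01 image keeps one crawler busy while the others idle, which would match the behaviour observed above; under per-file dispatch, the load spreads evenly.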

If something like that is the case, it would be good to know, as I could then adjust my crawler configuration to suit the source being processed.


That's an awful lot of RAM per crawler. I gather you must have a lot of RAM in the workstation/server? You may have already seen this, but Vound has a helpful knowledge base article explaining crawlers and memory settings (https://support.vound-software.com/help/en-us/8-installation/71-optimizing-intella-memory-settings-for-best-performance). It doesn't quite cover what you are asking, however (at least I don't think it does), but I am sharing it in case it helps.


OK, so you DO have an awful lot of RAM 😀. I thought I had a good setup with our Intella Node system, with 2 physical processors (40 logical processors in total) and 128 GB of RAM. Your system must blast through data.

My only other thought is to group the content you are ingesting into batches of similar content, and then ingest each batch in a separate instance, tweaking the settings to match the type of data in that batch. I appreciate that this doesn't answer your original question, though.

