
Using URIs to reprocess skipped items


Jung Son


Hi there,

I am developing a crawler script that filters out file paths and extensions we don't need, such as DLL files and the windows\help folder. After processing, the resulting CSV shows which files were included and which were skipped.
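For illustration, here is a minimal sketch of that kind of path/extension filter in plain Python. It works only on path strings. The names `is_excluded`, `SKIPPED_EXTENSIONS` and `SKIPPED_FOLDERS` are hypothetical and are not part of Intella's crawler-script API.

```python
import ntpath

# Hypothetical rules mirroring the examples above (DLL files, windows\help).
SKIPPED_EXTENSIONS = {".dll"}
SKIPPED_FOLDERS = ("windows\\help",)


def is_excluded(path: str) -> bool:
    """Return True if a Windows-style path should be skipped."""
    normalized = path.replace("/", "\\").lower()
    # Extension check: ntpath.splitext returns e.g. ".dll".
    if ntpath.splitext(normalized)[1] in SKIPPED_EXTENSIONS:
        return True
    # Folder check on whole path components (anywhere in the path),
    # so a folder such as "Windows\Helpers" is not matched by accident.
    parts = [p for p in normalized.split("\\") if p]
    for folder in SKIPPED_FOLDERS:
        target = folder.split("\\")
        for i in range(len(parts) - len(target) + 1):
            if parts[i:i + len(target)] == target:
                return True
    return False
```

Matching whole path components rather than substrings keeps the rule from accidentally catching unrelated folders with similar names.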

If some items are skipped during filtering and we later decide we need them, is there a way to use their IDs or URIs to reprocess and include them? For example, if I want to include two items under the prefetch folder, can I re-index the case and include specific items by ID or URI, assuming the URIs don't change?

Any help or a sample script would be greatly appreciated. Thanks!


Hello Jung,

It is definitely possible. I don't have a ready-to-use script at the moment, but the idea is as follows:

  • First, parse the so-called "Script Log" produced by Intella. This is a CSV file in which you can find all items that were skipped.
  • After parsing the CSV, collect the IDs or URIs of the items you want and save them to a separate file. Let's call it "items-to-include.csv".
  • Modify your script to add a new condition: if the item's ID (or URI) is in that list, the item is always included. Run this check before all other checks so that it takes priority. Use the "item.id" and "item.uri" attributes.
  • Finally, re-index the source with the modified script; the previously skipped items should now be included.
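Step 3 can be sketched as a small decision function that checks the include list before any other rule. This is only an illustration under assumptions: `should_include` and `skip_dlls` are hypothetical helpers, the existing filter is passed in as a function, and IDs are compared as strings because values read from a CSV are strings.

```python
from typing import Callable, Set


def should_include(
    item_id: str,
    item_uri: str,
    path: str,
    include_ids: Set[str],
    include_uris: Set[str],
    is_excluded: Callable[[str], bool],
) -> bool:
    """Return True to include the item, False to skip it."""
    # 1. Forced includes from items-to-include.csv win over every other rule.
    if item_id in include_ids or item_uri in include_uris:
        return True
    # 2. Otherwise fall back to the existing exclusion rules.
    return not is_excluded(path)


def skip_dlls(path: str) -> bool:
    # Hypothetical stand-in for the script's existing filter.
    return path.lower().endswith(".dll")
```

Inside the crawler script you would pass the item's `item.id` (converted with `str()` if it is numeric) and `item.uri`. How the script reports the include/skip decision back to Intella is not shown here.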

Keep in mind that re-indexing an existing case does not change item IDs or URIs, so the IDs you collect from the Script Log remain valid.

Here is a useful link if you need to parse a CSV in Python: https://www.digitalocean.com/community/tutorials/parse-csv-files-in-python
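Steps 1 and 2 could look roughly like this with Python's standard `csv` module. The column names `ID`, `URI`, `Path` and `Decision` are assumptions (check the header of your own Script Log), and `select_items`, `write_include_list`, `load_include_list` and `skipped_prefetch` are hypothetical helper names.

```python
import csv
from typing import Callable, Dict, Iterable, List, Set, Tuple

Row = Dict[str, str]


def select_items(rows: Iterable[Row], wanted: Callable[[Row], bool],
                 id_col: str = "ID", uri_col: str = "URI") -> List[Tuple[str, str]]:
    """Return (ID, URI) pairs for every log row the predicate accepts."""
    return [(row[id_col], row[uri_col]) for row in rows if wanted(row)]


def write_include_list(log_path: str, out_path: str, wanted: Callable[[Row], bool],
                       id_col: str = "ID", uri_col: str = "URI") -> List[Tuple[str, str]]:
    """Read the Script Log and save the selected items to a new CSV."""
    with open(log_path, newline="", encoding="utf-8") as src:
        items = select_items(csv.DictReader(src), wanted, id_col, uri_col)
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow([id_col, uri_col])
        writer.writerows(items)
    return items


def load_include_list(path: str, id_col: str = "ID",
                      uri_col: str = "URI") -> Tuple[Set[str], Set[str]]:
    """Load the saved IDs and URIs back as sets for fast lookups."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return {r[id_col] for r in rows}, {r[uri_col] for r in rows}


def skipped_prefetch(row: Row) -> bool:
    # Hypothetical columns: keep skipped items located under a prefetch folder.
    return (row.get("Decision", "").lower() == "skipped"
            and "\\prefetch\\" in row.get("Path", "").lower())
```

For example, `write_include_list("script-log.csv", "items-to-include.csv", skipped_prefetch)` would produce the include file, and `load_include_list("items-to-include.csv")` would read it back inside the modified script (both file names are placeholders).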
