matt Posted September 5

import hashlib
from api.scripting import ScriptService
from api.scripting.ScriptService import (Action, CustomColumn, CustomColumnType,
                                         CustomColumnValue, FoundItemResult,
                                         ProcessedItemResult)

# Hashing functions
def sha256(file):
    hash_sha256 = hashlib.sha256()
    with open(file, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash_sha256.update(chunk)
    return hash_sha256.hexdigest()

def sha1(file):
    hash_sha1 = hashlib.sha1()
    with open(file, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash_sha1.update(chunk)
    return hash_sha1.hexdigest()

def sha512(file):
    hash_sha512 = hashlib.sha512()
    with open(file, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash_sha512.update(chunk)
    return hash_sha512.hexdigest()

class ScriptHandler(ScriptService.Iface):
    def itemFound(self, item):
        return FoundItemResult(action=Action.Include)

    def itemProcessed(self, item):
        custom_columns = []
        if item.binaryFile is not None:
            # Generate SHA-256
            file_sha256 = sha256(item.binaryFile)
            sha256_column = CustomColumn("SHA-256", CustomColumnType.String,
                                         CustomColumnValue(value=file_sha256))
            # Generate SHA-1
            file_sha1 = sha1(item.binaryFile)
            sha1_column = CustomColumn("SHA-1", CustomColumnType.String,
                                       CustomColumnValue(value=file_sha1))
            # Generate SHA-512
            file_sha512 = sha512(item.binaryFile)
            sha512_column = CustomColumn("SHA-512", CustomColumnType.String,
                                         CustomColumnValue(value=file_sha512))
            # Add all custom columns
            custom_columns = [sha256_column, sha1_column, sha512_column]
        return ProcessedItemResult(action=Action.Include, customColumns=custom_columns)
Jacques B Posted September 5

Thanks for sharing. I'm looking at writing a crawler script for something else, and seeing examples helps. The unfortunate aspect of crawler scripts is that you can only run one, and it must be while indexing the case. I remember Vound saying they hope this won't be the case in the future, but for now that's the only time you can run a script. I am currently running one to look for emails with blank subjects and tag them accordingly. I have another I'd like to run to extract a particular data point from MS Word DOCX files and add it as a column. I'll either have to choose one or the other, or process the case twice to run both crawler scripts. With some cases taking 12+ hours to index, that's not a very attractive option. One caveat with yours: calculating three additional hashes per file will add processing time. For a small case, that likely won't be too noticeable. But if you have a case with 100GB+, for example, the additional processing time will certainly be noticeable.
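For reference, a minimal sketch of that kind of blank-subject tagger, modeled on matt's example. The item.subject field and the tags parameter on ProcessedItemResult are assumptions based on Vound's published sample scripts, not my exact script:

from api.scripting import ScriptService
from api.scripting.ScriptService import Action, FoundItemResult, ProcessedItemResult

class ScriptHandler(ScriptService.Iface):
    def itemFound(self, item):
        return FoundItemResult(action=Action.Include)

    def itemProcessed(self, item):
        # Tag items whose subject is missing or blank. A production script
        # would also restrict this to email item types; item.subject and the
        # tags parameter are assumptions here.
        if not item.subject or not item.subject.strip():
            return ProcessedItemResult(action=Action.Include, tags=["Blank Subject"])
        return ProcessedItemResult(action=Action.Include)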
igor_r Posted September 16

Thanks Matt and Jacques! Yes, the script could be improved by reading each file just once. You can see the modified version here: https://github.com/vound-software/intella-crawler-scripts/blob/main/samples/advanced/calc_multiple_hashes.py
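The gist of the single-pass approach, as a minimal sketch (the linked sample may differ in the details):

import hashlib

def all_hashes(file):
    # One pass over the file, feeding every chunk to all three digests.
    hashes = [hashlib.sha1(), hashlib.sha256(), hashlib.sha512()]
    with open(file, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            for h in hashes:
                h.update(chunk)
    return [h.hexdigest() for h in hashes]

Each 4 KB chunk is read once and fed to all three digest objects, so the file is traversed a single time instead of three.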
igor_r Posted September 16

On 9/5/2024 at 11:22 PM, Jacques B said: "The unfortunate aspect of crawler scripts is you can only run one, and it must be while indexing the case."

We plan to add so-called utility scripts in a future version. Utility scripts will not require re-indexing and can be run on the existing items in any case.
Jacques B Posted September 16

Thanks Igor, that will be a huge benefit. It will encourage more people to author scripts for specific needs. Currently it's not enticing to do so, as it takes too long to run each script when dealing with large cases. I have a few script ideas, but haven't invested the time to try writing them for that specific reason (having to re-index the case each time you want to run a script).