
Re-processing after cracking passwords


Jacques B


I occasionally encounter encrypted PDFs that Intella was unable to decrypt. Naturally, I only find this out after processing is done. I've had success cracking the passwords of bank-statement PDFs where the password is numeric (part of the account number). Once a password is cracked, I know I can add it to the keystore. But as far as I can tell, I then have to re-index every evidence item containing content that needs to be decrypted. I don't see any option to simply decrypt and index the 10, 20 or 30 files that are encrypted, so I have to re-index tens or hundreds of thousands of files in the evidence source(s).

Is there a way to have Intella only re-index select items instead of all items in a source?


  • 2 weeks later...

Thanks. I’ve done that in the past. But the downside of that approach is that the decrypted item is not at its original path within the evidence. For example, if the original is an email attachment, the decrypted version won’t be attached to that email when it is imported as a new source.

It would be great if Intella could index a filtered set of files instead of needing to index all of them.


I wonder if you could script it as part of initial processing?

It would be pretty unintelligent, but I wonder if you could do something like (100% pseudo-code):

if item.encrypted = true
	wordlist = get-content item.parent (separator 'whitespace')
	foreach word in wordlist
		try item.decrypt word

You could build your wordlist in whatever way makes sense. The above assumes, for example, that the parent is an email and the sender supplied the password in it.
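A runnable take on the pseudocode above, as a minimal sketch in Python. It assumes nothing about Intella's scripting engine: the hypothetical `try_password` callable stands in for whatever decryption attempt a tool would expose, and the parent email's text is simply passed in as a string. Besides splitting on whitespace, it also tries each token with surrounding punctuation stripped, since passwords in emails are often quoted or followed by a full stop.

```python
import string
from typing import Callable, Iterable, Optional


def build_wordlist(parent_text: str) -> list[str]:
    """Split the parent email on whitespace (as in the pseudocode) and also add
    each token with surrounding punctuation stripped, e.g. '"abc123".' -> 'abc123'.
    Order is preserved and duplicates are dropped."""
    seen: dict[str, None] = {}
    for token in parent_text.split():
        for candidate in (token, token.strip(string.punctuation)):
            if candidate:
                seen.setdefault(candidate, None)
    return list(seen)


def try_wordlist(words: Iterable[str],
                 try_password: Callable[[str], bool]) -> Optional[str]:
    """Return the first word that try_password accepts, or None if none do."""
    for word in words:
        if try_password(word):
            return word
    return None


# Usage: the lambda is a stand-in for a real decryption attempt (hypothetical).
email = "Hey John, here's the encrypted spreadsheet. The password is \"abc123\"."
found = try_wordlist(build_wordlist(email), lambda pw: pw == "abc123")
print(found)  # abc123
```

The wordlist stays small (one candidate or two per word in the email), which keeps this kind of check cheap compared with brute forcing.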


I’m not sure if Intella supports that type of scripting. In my case, I typically use John the Ripper in a Linux VM to crack PDF docs, so I don’t think there would be any way to call it from Windows.

The other challenge is that, in the case of PDF bank statements for example, the accompanying email from the bank usually provides the mask for the password (e.g., the middle six characters of the bank account number), which I use as a parameter for cracking the password. In other cases, I’ve found the password right in the email: “Hey John, here’s the encrypted spreadsheet for your review. The password is ‘abc123’.”
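Why the mask matters can be shown with a quick keyspace calculation. The sketch below is a conceptual illustration in Python, not John the Ripper's own mask mode: `mask_candidates` and `keyspace_size` are hypothetical helpers, and it assumes the masked part of the password is six digits (as with the middle six characters of an account number).

```python
import string
from itertools import islice, product
from typing import Iterator

DIGITS = string.digits  # "0123456789"


def mask_candidates(length: int, alphabet: str = DIGITS,
                    prefix: str = "", suffix: str = "") -> Iterator[str]:
    """Lazily yield every password of the form prefix + <length chars> + suffix.
    Known parts of the password can be fixed via prefix/suffix."""
    for chars in product(alphabet, repeat=length):
        yield prefix + "".join(chars) + suffix


def keyspace_size(length: int, alphabet: str = DIGITS) -> int:
    """Number of candidates a mask of this length and alphabet produces."""
    return len(alphabet) ** length


six_digits = keyspace_size(6)
no_mask = keyspace_size(6, string.ascii_lowercase + DIGITS)
print(six_digits)                          # 1000000
print(no_mask)                             # 2176782336
print(list(islice(mask_candidates(6), 3)))  # ['000000', '000001', '000002']
```

Knowing that the password is six digits cuts the search from about 2.2 billion candidates (six lowercase letters or digits) to one million, which is why the hint in the bank's email is so useful.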

I wouldn’t want to delay Intella processing while it tries to brute-force every encrypted file it can’t automatically decrypt. I appreciate your suggestions as possible alternative options. The ideal solution rests with Vound adding the ability to process/re-process selected files. You would think you could choose only the docs it couldn’t decrypt and reprocess those with the keystore, rather than having to reprocess every item in the data set.

Thanks again for taking the time to offer suggestions. 


  • 2 weeks later...
Guest Marco de Moulin

Hello @Jacques B,

I just wanted to give you an update on the ability of our crawler scripts to decrypt password-protected PDFs. Currently, the scripts do not have access to the native file, which is necessary for decryption. We are actively working on adding this capability to the scripting engine, so that you will be able to run code to decrypt a PDF. As soon as I have more information on this topic, I'll be sure to update you. Thank you for your patience and understanding!

Marco


Thanks Marco! Does this mean you'll be able to enter passwords in the keystore and then run it against specific files and process only those rather than having to re-process all items in a source?

Fortunately, it's not something I encounter frequently. But when I do and manage to crack a password (or get it from the email itself; people can be lazy sometimes :) ), I add it to the keystore and then re-process so that the content is available to the investigator. Being able to selectively reprocess would be a huge time saver in those cases.

Jacques


Guest Marco de Moulin

Hi @Jacques B

I wanted to clarify that the keystore is used for all encrypted files, not just PDFs. With a crawler script, you can create a custom procedure for each file. This means that the script would have access to the file, attempt to decrypt it, and then return it in a decrypted form. However, it does require some programming skills to implement once the capability is added to the scripting engine.

I also wanted to let you know that selective reprocessing is a high priority on our list of features that we're working to include in the future.

Marco


OK, thanks Marco. The selective reprocessing would be the ideal solution.

Adding the ability to try to crack passwords during processing is nice. But it would be very difficult to use a one-size-fits-all decryption approach with a third-party tool such as John the Ripper, because the right approach depends on the document type and on whether you have a mask for the password. And as you also know, password cracking can take a long time. You wouldn't want processing of the rest of the source to be held up by an attempt to crack a password. It would be important for processing to complete and make everything available to the investigator for review while password cracking goes on in the background.

If it is implemented so that processing stops while password cracking is attempted, that will cause an undesirable delay and make it impractical to use. If that's the only option, I would suggest putting that time into selective reprocessing instead, as that will be far more useful. But if the script simply passes the encrypted file on to an external process and then carries on, that's fine. That also means, though, that at some point those files have to be reprocessed once the password is cracked.
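The "hand it off and carry on" pattern can be sketched in Python with a queue and a worker thread. This is a hypothetical illustration, not Intella's crawler-script API: `BackgroundCracker` and the `crack` callable are made-up names, and in practice `crack` would wrap whatever external cracker you use (for example via `subprocess`). The processing loop only calls `submit`, which returns immediately; the passwords collected by `finish` are what you would then add to the keystore before re-processing those files.

```python
import queue
import threading
from typing import Callable, Optional

# Hypothetical hook: returns the recovered password, or None if it gives up.
CrackFn = Callable[[str], Optional[str]]


class BackgroundCracker:
    """Accepts encrypted items without blocking the caller, cracks them on a
    worker thread, and records recovered passwords for a later re-process."""

    def __init__(self, crack: CrackFn) -> None:
        self._crack = crack
        self._jobs: "queue.Queue[Optional[str]]" = queue.Queue()
        self._lock = threading.Lock()
        self.recovered: dict[str, str] = {}  # item path -> password
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, item_path: str) -> None:
        """Called from the processing loop; returns immediately."""
        self._jobs.put(item_path)

    def _run(self) -> None:
        while True:
            item = self._jobs.get()
            if item is None:  # sentinel: no more work
                break
            password = self._crack(item)
            if password is not None:
                with self._lock:
                    self.recovered[item] = password

    def finish(self) -> dict[str, str]:
        """Wait for queued jobs to finish, then return what was recovered."""
        self._jobs.put(None)
        self._worker.join()
        with self._lock:
            return dict(self.recovered)


def fake_crack(path: str) -> Optional[str]:
    # Stand-in for an external cracker; "succeeds" only on PDFs.
    return "123456" if path.endswith(".pdf") else None


cracker = BackgroundCracker(fake_crack)
for path in ["mail/statement.pdf", "mail/budget.xlsx"]:
    cracker.submit(path)  # processing of other items would carry on here
result = cracker.finish()
print(result)  # {'mail/statement.pdf': '123456'}
```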

I do have some programming skills (scripting in Bash, Python, and some light PowerShell), so I don't mind that.

Thanks,

Jacques


Guest Marco de Moulin

Hello @Jacques B,

Crawler scripts are executed when we index (crawl) the data. You are right, we do not want any unnecessary delays during this process.

One approach to use a crawler script is to copy all encrypted files that are discovered and execute a command to decrypt them. This approach is useful when there is no information about the password(s) required to decrypt the files. In such a scenario, brute forcing may be the only solution.

The keystore passwords can be used to decrypt supported files. With selective reprocessing this will become a lot more valuable because now you need to provide the passwords before processing a source. When using the keystore for passwords, it is recommended to keep the list of passwords short. This approach ensures that the process remains efficient and reduces the chances of delays. The keystore was designed to try out passwords that are already known, rather than for brute forcing.

Regards,

Marco


  • 1 month later...

Hi Marco,

Providing an update on this. I am currently working a case with 5 PSTs in it. Intella identified 89 items that it could not decrypt. Looking at them, many are in a few ZIP files, so I gather it's the ZIP that needs to be cracked, not each file within it. As for the other encrypted items, for some of them the user sent the password in a separate message (email or Teams message), which is common.

My workflow when I have encrypted items from an Exchange mailbox is to look at the parent email of the encrypted item to see whether the person shared the password, or mentioned that it will be sent in another email. I was able to find a few passwords and added them to the keystore. I could see from the Location facet that the encrypted items were spread across 3 of the 5 PSTs. This meant I had to reprocess those 3 PSTs to have Intella use the passwords in the keystore to decrypt the items and then index them.
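The cost of the current workflow follows from the step above: every source that holds at least one encrypted item must be re-processed in full. A small Python sketch makes this concrete. The location paths and per-source item counts are made-up illustrative values, and it assumes the first path component of a location names the evidence source (PST).

```python
from collections import Counter

# Hypothetical locations of items that failed decryption (first part = source).
failed_items = [
    "Mailbox1.pst/Inbox/Statement.pdf",
    "Mailbox1.pst/Inbox/Report.docx",
    "Mailbox3.pst/Sent Items/Budget.xlsx",
    "Mailbox4.pst/Archive/Files.zip",
]

# Hypothetical total item counts per source, for comparison only.
items_per_source = {
    "Mailbox1.pst": 40_000,
    "Mailbox2.pst": 25_000,
    "Mailbox3.pst": 30_000,
    "Mailbox4.pst": 15_000,
    "Mailbox5.pst": 20_000,
}


def sources_to_reprocess(locations: list[str]) -> Counter:
    """Count failed items per source (the path part before the first '/')."""
    return Counter(loc.split("/", 1)[0] for loc in locations)


counts = sources_to_reprocess(failed_items)
reindexed = sum(items_per_source[src] for src in counts)
print(sorted(counts))  # ['Mailbox1.pst', 'Mailbox3.pst', 'Mailbox4.pst']
print(reindexed, "items re-indexed for", len(failed_items), "encrypted items")
```

With these illustrative numbers, four encrypted items force 85,000 items to be re-indexed, which is the gap selective reprocessing would close.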

This is a prime example of where Intella's current workflow for dealing with encrypted items is inefficient. We are not likely to know which items are encrypted, much less their passwords, until after the data is in Intella and processed. In addition to being able to selectively re-process files rather than an entire source, it would be really helpful if Intella noted what processing had already been done on that source (e.g., OCR, content analysis) and asked the user whether those additional processes should also be run on the decrypted items.

I do see email threading as an exception here. You can't run email threading on only decrypted items. It has to be run against all emails in the case to get email threading across all your data.

Thanks,

Jacques


  • 1 year later...
On 2/6/2023 at 10:36 AM, Guest Marco de Moulin said:

I also wanted to let you know that selective reprocessing is a high priority on our list of features that we're working to include in the future.

Marco

Hi Marco,

I'm following up on this posting from last year. Any update on the roadmap for selective processing of encrypted files? I have a case now where, after ingesting and processing, Intella identified 86 files it could not decrypt. Looking at the emails, I was able to identify some of the passwords. I was also able to crack the password of a PDF. I added those passwords to the keystore and am now re-indexing the case. Unfortunately, it's going to take a while, as there are about 142 GB of data in the case. All of that has to be re-indexed for the sake of 84 MS Office files (about half that many when deduplicated) and two PDFs. Selective indexing would be so much more efficient. Instead of taking minutes to index those few files, it's going to take hours for Intella Connect to clear the old index and re-index everything.

Would it be possible to allow us to select files to be re-processed (e.g., only files that could not be decrypted) and in the background, Intella would re-add them as a new source without losing the relationship with other items in the case (vs exporting them out, decrypting them, and adding them back to the case as a new source which would detach them from their parent)?
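What is being asked for can be illustrated with a toy data model in Python. This is purely hypothetical and does not reflect Intella's internals: `Item`, `reprocess_selected` and the `decrypt` callable are made-up names. The point it shows is that only items still flagged as encrypted are touched, and they are updated in place, so their `item_id` and `parent_id` (for example, the email they were attached to) survive. Exporting, decrypting and re-adding the files as a new source would lose that link.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Item:
    item_id: int
    parent_id: Optional[int]
    name: str
    encrypted: bool = False
    decrypted: bool = False
    text: str = ""


def reprocess_selected(items: dict[int, Item], keystore: list[str],
                       decrypt: Callable[[Item, str], Optional[str]]) -> list[int]:
    """Try keystore passwords only on items still flagged as encrypted, updating
    them in place so ids and parent links are unchanged. Returns the ids of
    newly decrypted items (the only ones that would then need OCR etc.)."""
    newly_decrypted = []
    for item in items.values():
        if not item.encrypted or item.decrypted:
            continue
        for password in keystore:
            text = decrypt(item, password)
            if text is not None:
                item.text, item.decrypted = text, True
                newly_decrypted.append(item.item_id)
                break
    return newly_decrypted


def fake_decrypt(item: Item, password: str) -> Optional[str]:
    # Hypothetical: only "482913" opens the statement.
    return "decrypted statement text" if password == "482913" else None


items = {
    1: Item(1, None, "Email: statement attached"),
    2: Item(2, 1, "statement.pdf", encrypted=True),
    3: Item(3, 1, "notes.txt", text="plain text"),
}
newly = reprocess_selected(items, ["abc123", "482913"], fake_decrypt)
print(newly)              # [2]
print(items[2].parent_id)  # 1
```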

At minimum, if this could be accomplished for encrypted files, that would be significant. I can't think of a case where I'd need to re-index other items in the case. But encrypted items commonly need to be reprocessed once you identify the password, and you won't identify it until after the source is processed, Intella has flagged what it couldn't decrypt, and you've gone in and found passwords in emails.


Thanks,

Jacques
