Jacques B, posted November 16, 2023:

Here's my GitHub repository for a Bash script I wrote to parse PDFs: https://github.com/jjrboucher/PDF-Processing

It runs several different commands against a PDF, plus my own processing that looks for previous versions of a PDF within the PDF (explained in the README on GitHub).

Best,
Jacques
Guest Marco de Moulin, posted November 22, 2023:

Nice one! I am going to see if I can make this a crawler script so you can run it automatically on all your cases if you want. I will keep you informed.

Marco
Jacques B (author), posted December 6, 2023:

Thanks Marco,

It would be great to incorporate even just the part that counts the number of %%EOF markers in a PDF and displays that count in a custom column. That would alert the reviewer that a PDF has prior versions (which may or may not be recoverable, but knowing is half the battle). That would be a very easy (and short) Python script.

Jacques
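A minimal sketch of the counting step described above, assuming the whole PDF fits in memory. The function names `count_eof_markers` and `count_eof_markers_in_file` are hypothetical, and writing the result to a custom column is not shown because that depends on the crawler scripting API.

```python
from pathlib import Path

EOF_MARKER = b"%%EOF"


def count_eof_markers(data: bytes) -> int:
    """Return the number of non-overlapping %%EOF markers in the raw bytes."""
    return data.count(EOF_MARKER)


def count_eof_markers_in_file(path: str) -> int:
    """Read a file from disk (hypothetical helper) and count its %%EOF markers."""
    return count_eof_markers(Path(path).read_bytes())
```

A count greater than 1 is only a hint that earlier versions may be present; it does not say whether they can be recovered.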
Jacques B (author), posted May 9:

On 11/22/2023 at 10:52 AM, Guest Marco de Moulin said: "Nice one! I am going to see if I can make this a crawler script so you can run it automatically on all your cases if you want. I will keep you informed. Marco"

Hi Marco,

I was looking at this again today and wondering whether a crawler script could be written to do a few things. For all PDFs in the case:

1 - Count the number of "%%EOF" markers in the file and return that count to a custom column.
2 - If the count is > 1 and the offset of the first %%EOF is less than decimal 600, don't attempt to carve that version, as it's not a valid version.
3 - For every %%EOF with an offset greater than 600, carve that version of the PDF and add it to the case as a child of the original PDF.

From looking at some of the scripts and the information in scripting.thrift on GitHub, the only part I'm not sure about is whether I can create a new file and add it to the case. Creating a custom column with a count of the %%EOF markers in a PDF looks like something that wouldn't be too challenging. That's not to suggest I could write it quickly, as I have never written a crawler script. But from looking at existing ones, I think I could take an existing one and modify the code to achieve this.

Is it possible to create a new file and add it to the case (as a child of an existing item) for processing?

Thanks,
Jacques
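Steps 2 and 3 above could be sketched roughly as follows. This is not the logic from the Bash script or a crawler script; it assumes each earlier version runs from the start of the file through the end of a %%EOF marker (plus its line ending, if any), and the names `eof_offsets`, `carve_versions` and `MIN_OFFSET` are hypothetical. Adding the carved bytes to the case as child items is left out, since that is the open question in the post.

```python
import re
from typing import List, Tuple

EOF_MARKER = b"%%EOF"
MIN_OFFSET = 600  # threshold taken from the post; markers at or below it are skipped


def eof_offsets(data: bytes) -> List[int]:
    """Return the byte offset of every %%EOF marker, in file order."""
    return [m.start() for m in re.finditer(re.escape(EOF_MARKER), data)]


def carve_versions(data: bytes, min_offset: int = MIN_OFFSET) -> List[Tuple[int, bytes]]:
    """Return (marker offset, carved bytes) for each %%EOF beyond min_offset.

    Assumption: a version is the prefix of the file ending at that marker.
    """
    versions = []
    for offset in eof_offsets(data):
        if offset <= min_offset:
            continue  # too close to the start of the file to be a valid version
        end = offset + len(EOF_MARKER)
        # Keep the line ending that follows the marker, if there is one.
        if data[end:end + 2] == b"\r\n":
            end += 2
        elif data[end:end + 1] in (b"\n", b"\r"):
            end += 1
        versions.append((offset, data[:end]))
    return versions
```

Under this assumption the last carved entry is normally identical to the file itself, so a caller would likely drop it and treat only the earlier entries as prior versions.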