Processing PDFs

Jacques B · November 16, 2023

Here's the GitHub repository for a Bash script I wrote to parse PDFs.

https://github.com/jjrboucher/PDF-Processing

The script runs several different commands against a PDF, and I added my own processing that looks for previous versions of a PDF within a PDF (explained in the README on GitHub).

Best,

Jacques

Guest Marco de Moulin · November 22, 2023

Nice one! I am going to see if I can make this a crawler script so you can run it automatically on all your cases if you want. I will keep you informed.

Marco

Jacques B · December 6, 2023

Thanks Marco,

It would be great to incorporate at least the part that counts the number of %%EOF markers in a PDF and displays that count in a custom column. That would alert the reviewer to the fact that a PDF has prior versions (which may or may not be recoverable, but knowing is half the battle). That would be a very easy (and short) Python script.
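As a rough idea of how short that could be, here is a minimal standalone sketch of the counting part only. It is not a crawler script: the function names are my own placeholders, and wiring the result into a custom column is left out because that depends on the scripting API.

```python
from pathlib import Path

EOF_MARKER = b"%%EOF"


def count_eof_markers(data: bytes) -> int:
    """Return the number of %%EOF markers found in raw PDF bytes."""
    return data.count(EOF_MARKER)


def count_eof_in_file(path) -> int:
    """Read a file from disk and count its %%EOF markers (hypothetical helper)."""
    return count_eof_markers(Path(path).read_bytes())


if __name__ == "__main__":
    # Tiny fabricated sample: two markers suggest one prior version.
    sample = b"%PDF-1.4\n...body...\n%%EOF\n...update...\n%%EOF\n"
    print(count_eof_markers(sample))
```

A count greater than 1 is only a hint that earlier versions may exist. It says nothing about whether they are recoverable.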

Jacques

Jacques B · May 9

On 11/22/2023 at 10:52 AM, Guest Marco de Moulin said:

Nice one! I am going to see if I can make this a crawler script so you can run it automatically on all your cases if you want. I will keep you informed.

Marco

Hi Marco,

I was looking at this again today and wondering if a crawler script could be written to do a few things:

For all PDFs in the case:

1 - count the # of "%%EOF" in the file and return that count to a custom column.

2 - if the count is > 1 and the offset to the first %%EOF is less than 600 (decimal), don't attempt to carve that version, as it's not a valid version.

3 - for all %%EOF with an offset greater than 600, carve each version of the PDF and add to the case as a child of the original PDF.
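The three steps above could be sketched in plain Python roughly as follows. This is a sketch under assumptions, not a crawler script: it assumes each version runs from the start of the file through the end of a given %%EOF marker, the function names are placeholders of mine, and adding the carved bytes to the case as a child item is left out because I don't know whether the scripting API supports it.

```python
EOF_MARKER = b"%%EOF"
MIN_OFFSET = 600  # threshold from step 2; an %%EOF at or below this is skipped


def find_eof_offsets(data: bytes) -> list:
    """Return the byte offset of every %%EOF marker, in file order."""
    offsets = []
    pos = data.find(EOF_MARKER)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(EOF_MARKER, pos + len(EOF_MARKER))
    return offsets


def carve_versions(data: bytes, min_offset: int = MIN_OFFSET) -> list:
    """Carve one candidate version per %%EOF located past min_offset.

    Assumption: a version is everything from byte 0 through the end of
    that %%EOF marker.
    """
    versions = []
    for offset in find_eof_offsets(data):
        if offset <= min_offset:
            continue  # too close to the start to be a valid version
        versions.append(data[: offset + len(EOF_MARKER)])
    return versions


if __name__ == "__main__":
    # Fabricated bytes, not a real PDF: markers at offsets 19, 725 and 781.
    sample = (b"%PDF-1.4\n" + b"x" * 10 + b"%%EOF\n"
              + b"y" * 700 + b"%%EOF\n" + b"z" * 50 + b"%%EOF\n")
    print(len(find_eof_offsets(sample)), len(carve_versions(sample)))
```

Note that the slice ending at the last %%EOF is essentially the current file, so a real script might skip it and carve only the earlier ones.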

From looking at some of the scripts and the information in scripting.thrift on GitHub, the only part I'm not sure about is whether I can create a new file and add it to the case. Creating a custom column with a count of the # of %%EOF in a PDF looks to be something that wouldn't be too challenging. Not to suggest I could write it quickly as I have never written a crawler script. But from looking at existing ones, I think I could take an existing one and make some code modifications to achieve this.

Is it possible to create a new file and add it to the case (as a child of an existing item) for processing?

Thanks,

Jacques
