PDF: Content Created vs xmp CreateDate not matching

I have a case where the xmp CreateDate is 2023-11-04 10:59:26+0200 (when looking at Raw Data).  But when I look at Properties, it lists the Content Created date as 2023-11-04 9:59:26 AM CET.

CET is UTC +0200, and it does not observe daylight saving time.

The xmp CreateDate is plain text in the PDF. But here is where it gets interesting. That is not the correct date. It's the Content Created date that is correct. I know this because the document was attached to an email sent 2023-11-04 10:07:18 CET. It's not possible that it was created after it was sent. There are other artifacts in the case that support that the Content Created date is actually the correct one.

I looked at the PDF in a hex viewer (HxD) and confirmed that the xmp dates are plain text. So there is no misinterpreting those. I also used exiftool to extract metadata and all dates it extracted are for 10:59, not 9:59.

What I'm trying to figure out is where does Intella get the Content Created date? In the 2.6 manual, it's described as "Content Created: The date that the content was created, per the document metadata.". Is it extracting an encoded date in the pdf that it's parsing? The standard for PDFs is over 700 pages long, but I did find that 7.9.4 relates to dates. And in that paragraph, it describes the date as a string (as observed in the hex analysis I did). I even tried a Python pdf library to parse metadata out with the same results as I see in Exiftool, and the same results as I see in raw data.

My question is, where does Intella get "Content Created" date under the Properties screen? It would be great if I can validate that and then reference that in my report. It doesn't explain why the other dates are wrong in the meantime (1 hour later), but at least I can explain where that date came from.

Hello Jacques,

If I'm not mistaken, CET is +0100 according to Wikipedia (https://en.wikipedia.org/wiki/Central_European_Time). It is +0200 only in summer, when daylight saving time (CEST) is in effect. This date, Nov 4, falls after summer time ended. Therefore, Intella shows it correctly:

2023-11-04 10:59:26+0200 -> 2023-11-04 08:59:26 UTC -> 2023-11-04 09:59:26 +0100 CET

Could it be that you have selected the wrong timezone in the source settings? The way it works is that all dates are stored in UTC in Intella. Then, depending on the Source timezone, the dates are converted to the selected timezone. In your case, they are converted to CET.
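The conversion chain above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not Intella's actual code: the variable names are made up for illustration, and CET is modelled as a fixed +01:00 offset, which holds for a November date.

```python
from datetime import datetime, timedelta, timezone

# Timestamp exactly as it appears in the XMP CreateDate field.
xmp_create_date = "2023-11-04 10:59:26+0200"

# %z accepts UTC offsets written as +HHMM.
created = datetime.strptime(xmp_create_date, "%Y-%m-%d %H:%M:%S%z")

# Step 1: normalize to UTC (the form in which the dates are stored).
created_utc = created.astimezone(timezone.utc)

# Step 2: convert to the display zone.
# Assumption: CET outside summer time is a fixed UTC+01:00 offset.
cet_winter = timezone(timedelta(hours=1), "CET")
created_cet = created_utc.astimezone(cet_winter)

print(created_utc.isoformat())  # 2023-11-04T08:59:26+00:00
print(created_cet.isoformat())  # 2023-11-04T09:59:26+01:00
```

All three values describe the same instant; only the offset used for display differs.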

Thanks @igor_r. After your post I went back to my case and realized that I had picked CET but should have picked Maputo. I had assumed both were UTC +0200, but that only holds for CEST; CET is +0100. As you pointed out, the CET zone observes daylight saving time, whereas Maputo does not, and that is what caused the time to be off by an hour. I've corrected it and now everything lines up correctly.

Thanks again for picking up on my oversight.
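For anyone who hits the same thing, here is a rough Python sketch of why the two zone choices disagree in November but not in summer. It assumes the IANA zone `Europe/Berlin` as a stand-in for the CET/CEST source setting and `Africa/Maputo` for Maputo; the helper `display_time` is made up for illustration and has nothing to do with how Intella does it internally. It needs Python 3.9+ and a time zone database on the machine (on Windows, the `tzdata` package).

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def display_time(utc_dt, zone_name):
    """Convert an aware UTC datetime to the named IANA zone for display."""
    return utc_dt.astimezone(ZoneInfo(zone_name))


# The XMP CreateDate from this thread, already normalized to UTC.
created_utc = datetime(2023, 11, 4, 8, 59, 26, tzinfo=timezone.utc)

# Assumption: Europe/Berlin stands in for a zone that switches CET/CEST.
winter_cet = display_time(created_utc, "Europe/Berlin")
winter_maputo = display_time(created_utc, "Africa/Maputo")
print(winter_cet.isoformat())     # 2023-11-04T09:59:26+01:00
print(winter_maputo.isoformat())  # 2023-11-04T10:59:26+02:00

# The same clock time in July: both zones are at +02:00, so they agree.
summer_utc = datetime(2023, 7, 4, 8, 59, 26, tzinfo=timezone.utc)
summer_cet = display_time(summer_utc, "Europe/Berlin")
summer_maputo = display_time(summer_utc, "Africa/Maputo")
print(summer_cet.isoformat())     # 2023-07-04T10:59:26+02:00
print(summer_maputo.isoformat())  # 2023-07-04T10:59:26+02:00
```

So a wrong zone choice like mine would go unnoticed for summer dates and only show up as a one-hour shift for dates outside summer time.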
