Michael Magness Posted November 21, 2023
I was wondering if anyone knows anything about PhotoDNA, specifically about using the API. You'll have to forgive the basic questions. We've tried going to Microsoft, but they're taking a while to respond, so I thought I'd start here since we're trying to write a script to integrate into Intella.
1) Do we need to host the images online before submitting them to PhotoDNA? From the examples I've seen, we need to provide the URL of an image to the PhotoDNA API for it to verify.
2) We've got an API key to access PhotoDNA, but do we need any additional enablement on MS's side to use their Edge Hashing API?
3) Re the edge hashing: while I can generate the hashes using pyPhotoDNA, I'm not sure how to submit a hash to the Edge Hash API. Does anyone have an example of the "body" they use to submit the hashes?
Thanks in advance.
igor_r Posted November 21, 2023
Hi Michael, I don't have any experience with PhotoDNA or pyPhotoDNA. But if you can show a script for generating a hash using Python, I can help you with translating it into a crawler script. That should be somewhat similar to the Grayscale detection script that I posted here:
The idea is that you take the image content, which is stored in item.binaryFile, then call the PhotoDNA library to calculate the hash, and then store it in a custom column.
Michael Magness Posted November 22, 2023
Hi Igor. Thanks for responding 🙂. My problem isn't coming up with a crawler script. I've got access to pyPhotoDNA, which supposedly generates the hashes from a series of exported images. My problem is sending them to the PhotoDNA Match Edge Hash API. I have no idea how to format the hashes and names so that the API understands them. I've tried it in the API test console, but each time I get an error. Here's the link I'm using at Microsoft to test: https://developer.microsoftmoderator.com/docs/services/57c7426e2703740ec4c9f4c3/operations/596ea1487ecd9f1ba408c32f/console
Also note that while I'd like to think of myself as a developer, the last time I wrote something was in VB6 🙂 (they made me a manager a while ago so.. no dev for this guy anymore). So it might be something as simple as formatting a JSON file / body, which I don't have a lot of experience with, so you'll forgive the basic questions 😞 Again, thanks for taking the time to help me.
Guest Marco de Moulin Posted November 22, 2023
Hello @Michael Magness,
From what I understand, you need to send two header fields to the API:
Content-Type with content `multipart/form-data`
Ocp-Apim-Subscription-Key containing your personal API key
After that you can send the image hash in the body. I cannot test this because I have no access to the API. I imagine it looks something like this:
Windmill.jpg|2,8,2,13,4,9,5,12,12,14,62,9,15,7,31,12,7,0,3,12,5,0,0,12,8,6,140,16,43,3,180,15,183,63,255,97,34,181,255,25,14,4,217,20,22,0,215,10,14,21,255,1,26,9,255,6,131,28,255,33,64,153,255,9,31,5,255,5,41,1,255,5,0,44,46,48,7,32,55,57,118,24,49,36,58,123,61,31,18,4,60,34,24,1,50,31,3,14,134,13,9,15,166,7,102,18,149,10,46,106,129,19,20,14,137,26,30,5,131,42,6,13,34,10,6,9,32,10,30,10,24,21,17,29,35,23,14,8,64,9,21,4,69,8
To make these calls, you must create a script in, for example, Python. Alternatively, you can do it for one image with curl on the command line. You get a JSON object back containing information about the hash. Does this help in any way?
Marco
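The line format guessed at in the post above (`filename|comma-separated byte values`) can be split into a file name and raw bytes before it goes into a request body. The following is only a minimal sketch under that assumption; `parse_hash_line` is a hypothetical helper name and is not part of PhotoDNA, jPhotoDNA, or pyPhotoDNA.

```python
from typing import Tuple


def parse_hash_line(line: str) -> Tuple[str, bytes]:
    """Split '<filename>|v1,v2,...' into (filename, raw hash bytes).

    Hypothetical helper: the input layout is assumed from the thread.
    """
    # rpartition splits on the LAST '|', so a '|' inside the file name survives
    name, sep, csv_values = line.strip().rpartition("|")
    if not sep or not name:
        raise ValueError("expected '<filename>|<comma-separated values>'")
    values = [int(v) for v in csv_values.split(",")]
    # bytes() raises ValueError if any value falls outside 0..255
    return name, bytes(values)


name, raw = parse_hash_line("Windmill.jpg|2,8,2,13,4,9")
print(name, list(raw))
```

Rejecting values outside 0..255 early makes a malformed line fail locally rather than at the API.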
Michael Magness Posted November 23, 2023
Hey Marco, Thanks VERY much for responding. You've pretty much hit the nail on the head 🙂 While I can create a script, the problem is the content / hash that I send to the API in the script. As a test, I've managed to generate a hash using one of the images that MS provides as a test, but no matter what I give it, I keep getting an "Error occurred while processing request". I can confirm it's not an API key issue, as feeding it a "wrong" API key returns a different error. There are a few possibilities here:
1) The blurb on the page states that "Edge Hash is currently in preview. If you would like to learn more about edge hashing and how to be part of the preview please email m2support@microsoft.com". This hints at the possibility that the feature needs to be enabled for that key. Without them enabling the "Edge Hash" feature on our API key, it just won't work. I have, of course, emailed them, but MS doesn't seem to be paying attention to that email address.. *sigh*
2) The format I'm giving it is wrong. I'm just not sure HOW to encapsulate the body of the text that I send to the API, which is why talking to you may help to eliminate this lack of knowledge 🙂
3) The hash I'm feeding it is wrong. I've tried MD5s, SHAs and even their proprietary hash from a PhotoDNAx64.dll, but I keep getting the same error.
This is the test hash I use. It was generated using their PhotoDNAx64.dll, which I got from an FTK ISO. Note that the image is a test image from Microsoft and is in the public domain.
img_130.jpg|Uy8FRArCakzLW5GGiwMMSwIRAzIBGwBFoxEWdDc9N5oymEaJYScQ/0oSE/wZRSJ5XyFPFhBuSkYqZVhhWCp0MXoDnEUzGj8yRVcXUylHJDUqcVcWcmcwcFMCEGEGBxFKEGAnGDEzFWQlXj23W6BAc6UJETsOARIDCVAeCxKyG0NtViuDWVdTIqITTB0UAQoF
Let me know your thoughts...if any. Thanks again.
Guest Marco de Moulin Posted November 23, 2023
Hello @Michael Magness,
Perhaps you can try the following approach. Replace the api_key with your API key:

    import requests

    # Placeholder where you would set your PhotoDNA Cloud Service API key
    api_key = 'your_api_key_here'

    # The endpoint URL for submitting to the PhotoDNA MatchEdgeHash API
    url = 'https://api.microsoftmoderator.com/photodna/v1.0/MatchEdgeHash'

    # A dictionary associating image filenames with their corresponding PhotoDNA hash strings
    # Replace these example hashes with your actual hashes
    image_hashes = {
        'Windmill.jpg': "2,8,2,13,4,9,5,12,12,14,62,9,15,7,31,12,7,0,3,12,5,0,0,12,8,6,140,16,43,3,180,15,183,63,255,97,34,181,255,25,14,4,217,20,22,0,215,10,14,21,255,1,26,9,255,6,131,28,255,33,64,153,255,9,31,5,255,5,41,1,255,5,0,44,46,48,7,32,55,57,118,24,49,36,58,123,61,31,18,4,60,34,24,1,50,31,3,14,134,13,9,15,166,7,102,18,149,10,46,106,129,19,20,14,137,26,30,5,131,42,6,13,34,10,6,9,32,10,30,10,24,21,17,29,35,23,14,8,64,9,21,4,69,8",  # PhotoDNA hash string for Windmill.jpg
        #'img_130.jpg': "3,7,2,14,...",  # PhotoDNA hash string for img_130.jpg
        # Add more as needed, up to 5
    }

    # Ensure there are no more than 5 hashes as per the API's restrictions
    if len(image_hashes) > 5:
        raise ValueError("More than 5 hashes provided. The API only accepts up to 5 hashes.")

    # Construct multipart/form-data with the image hashes
    files = {}
    for filename, hash_string in image_hashes.items():
        hash_values = list(map(int, hash_string.split(',')))
        hash_binary_data = bytes(hash_values)
        # Use the filename as the key to keep track of which hash corresponds to which image
        files[filename] = (filename + '.bin', hash_binary_data, 'application/octet-stream')

    # Set up your headers, including the API key
    headers = {
        'Ocp-Apim-Subscription-Key': api_key,
    }

    # Make the POST request to upload the hash data
    response = requests.post(url, files=files, headers=headers)

    # Check the response to determine if it succeeded, and handle accordingly
    if response.status_code == 200:
        # Request succeeded, handle the response data
        print(response.json())
    else:
        # An error occurred, handle it
        print(f"Error {response.status_code}: {response.text}")

Again, I do not have access to the API, so I was not able to test it. This code does not encode the content in base64. Let me know your results.
Cheers, Marco
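The script above refuses more than five hashes per request. For a larger export, one option is to split the dictionary into batches and post each batch separately. This is a rough sketch only: `batched_items` and `MAX_HASHES_PER_REQUEST` are hypothetical names, and the limit of 5 is taken from the check in the script above, not verified against the API.

```python
from itertools import islice
from typing import Dict, Iterator

# Assumption: copied from the limit check in the script above, not verified
MAX_HASHES_PER_REQUEST = 5


def batched_items(
    hashes: Dict[str, str], size: int = MAX_HASHES_PER_REQUEST
) -> Iterator[Dict[str, str]]:
    """Yield successive dicts holding at most `size` entries each."""
    it = iter(hashes.items())
    while True:
        # islice pulls the next `size` (name, hash) pairs off the shared iterator
        chunk = dict(islice(it, size))
        if not chunk:
            return
        yield chunk


# Placeholder data: 12 fake entries, so the batches hold 5, 5 and 2 items
demo = {f"img_{i}.jpg": "1,2,3" for i in range(12)}
for batch in batched_items(demo):
    print(len(batch), next(iter(batch)))
```

Each yielded dictionary could then take the place of `image_hashes` in one POST request.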
Michael Magness Posted November 24, 2023
Thanks Marco.. I'll test and let you know. BTW.. Where did you get the hash? Because mine seems different using the PhotoDNA.dll..
UPDATE. It did work (thanks SO much), inasmuch as the script managed to send the file in the requisite format to the API and receive results back, BUT the results are still giving me an "Error occurred while processing request".
{'Status': {'Code': 3004, 'Description': 'Error occurred while processing request', 'Exception': None}, 'ContentId': None, 'IsMatch': False, 'MatchDetails': None, 'XPartnerCustomerId': None, 'TrackingId': 'WUS_7cff4dce2d364b00a1c681fafb9dd112_57c7457ae3a97812ecf8bde9_276fc8f715054d2f81d3f8a26e07734b', 'CMRequestId': None}
It's certainly pointing me in the right direction, so I'll play with it for a bit and see. Your code did miss out on one of the headers, which I added back in. Without it, it kept giving me an "invalid or missing parm" error.
Content-Type: multipart/form-data
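When several hashes are submitted, it may help to reduce each response to a single log line. The sketch below only reads the field names visible in the response pasted above; `summarize_response` is a hypothetical helper, and no meaning is assumed for any status code beyond what the thread shows for 3004.

```python
from typing import Any, Dict


def summarize_response(payload: Dict[str, Any]) -> str:
    """Return a one-line summary of a response dict shaped like the one above."""
    # 'Status' may be missing or None, so fall back to an empty dict
    status = payload.get("Status") or {}
    code = status.get("Code")
    description = status.get("Description")
    return (
        f"code={code} description={description!r} "
        f"is_match={payload.get('IsMatch')} "
        f"tracking_id={payload.get('TrackingId')}"
    )


# Values copied from the response in the post above (unused fields omitted)
sample = {
    "Status": {
        "Code": 3004,
        "Description": "Error occurred while processing request",
        "Exception": None,
    },
    "IsMatch": False,
    "TrackingId": "WUS_7cff4dce2d364b00a1c681fafb9dd112_57c7457ae3a97812ecf8bde9_276fc8f715054d2f81d3f8a26e07734b",
}
print(summarize_response(sample))
```

Keeping the `TrackingId` in the log line gives support something concrete to look up.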
Michael Magness Posted November 27, 2023
Hi Marco, Again, this may be a "beginner" question, but how did you get your hash? Mine seems different using the PhotoDNA.dll (or MD5 or SHA etc).. Mine's more like
"Uy8FRArCakzLW5GGiwMMSwIRAzIBGwBFoxEWdDc9N5oymEaJYScQ/0oSE/wZRSJ5XyFPFhBuSkYqZVhhWCp0MXoDnEUzGj8yRVcXUylHJDUqcVcWcmcwcFMCEGEGBxFKEGAnGDEzFWQlXj23W6BAc6UJETsOARIDCVAeCxKyG0NtViuDWVdTIqITTB0UAQoF"
VS yours being
"2,8,2,13,4,9,5,12,12,14,62,9,15,7,31,12,7,0,3,12,5,0,0,12,8,6,140,16,43,3,180,15,183,63,255,97,34,181,255,25,14,4,217,20,22,0,215,10,14,21,255,1,26,9,255,6,131,28,255,33,64,153,255,9,31,5,255,5,41,1,255,5,0,44,46,48,7,32,55,57,118,24,49,36,58,123,61,31,18,4,60,34,24,1,50,31,3,14,134,13,9,15,166,7,102,18,149,10,46,106,129,19,20,14,137,26,30,5,131,42,6,13,34,10,6,9,32,10,30,10,24,21,17,29,35,23,14,8,64,9,21,4,69,8"
Guest Marco de Moulin Posted November 27, 2023
Hello @Michael Magness,
I used the instructions from https://github.com/jankais3r/jPhotoDNA. This gives me the string starting with 2, 8, 2, etc.

    .\jPhotoDNA.exe .\PhotoDNAx64.dll .\Windmill.jpg
    .\Windmill.jpg|2,8,2,13,4,9,5,12,12,14,62,9,15,7,31,12,7,0,3,12,5,0,0,12,8,6,140,16,43,3,180,15,183,63,255,97,34,181,255,25,14,4,217,20,22,0,215,10,14,21,255,1,26,9,255,6,131,28,255,33,64,153,255,9,31,5,255,5,41,1,255,5,0,44,46,48,7,32,55,57,118,24,49,36,58,123,61,31,18,4,60,34,24,1,50,31,3,14,134,13,9,15,166,7,102,18,149,10,46,106,129,19,20,14,137,26,30,5,131,42,6,13,34,10,6,9,32,10,30,10,24,21,17,29,35,23,14,8,64,9,21,4,69,8

I think what you have is base64 encoded and a byte data type.

    Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import base64
    >>> encoded_string = "Uy8FRArCakzLW5GGiwMMSwIRAzIBGwBFoxEWdDc9N5oymEaJYScQ/0oSE/wZRSJ5XyFPFhBuSkYqZVhhWCp0MXoDnEUzGj8yRVcXUylHJDUqcVcWcmcwcFMCEGEGBxFKEGAnGDEzFWQlXj23W6BAc6UJETsOARIDCVAeCxKyG0NtViuDWVdTIqITTB0UAQoF"
    >>> decoded_bytes = base64.b64decode(encoded_string)
    >>> integer_values = []
    >>> for byte in decoded_bytes:
    ...     integer_value = int.from_bytes([byte], byteorder='big')
    ...     integer_values.append(integer_value)
    ...
    >>> print(integer_values)
    [83, 47, 5, 68, 10, 194, 106, 76, 203, 91, 145, 134, 139, 3, 12, 75, 2, 17, 3, 50, 1, 27, 0, 69, 163, 17, 22, 116, 55, 61, 55, 154, 50, 152, 70, 137, 97, 39, 16, 255, 74, 18, 19, 252, 25, 69, 34, 121, 95, 33, 79, 22, 16, 110, 74, 70, 42, 101, 88, 97, 88, 42, 116, 49, 122, 3, 156, 69, 51, 26, 63, 50, 69, 87, 23, 83, 41, 71, 36, 53, 42, 113, 87, 22, 114, 103, 48, 112, 83, 2, 16, 97, 6, 7, 17, 74, 16, 96, 39, 24, 49, 51, 21, 100, 37, 94, 61, 183, 91, 160, 64, 115, 165, 9, 17, 59, 14, 1, 18, 3, 9, 80, 30, 11, 18, 178, 27, 67, 109, 86, 43, 131, 89, 87, 83, 34, 162, 19, 76, 29, 20, 1, 10, 5]

Is the list above the same numeric list you see when you run the following?

    .\jPhotoDNA.exe .\PhotoDNAx64.dll .\img_130.jpg

Marco
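The interactive session above can be condensed into two small conversion helpers, since iterating over a `bytes` object in Python 3 already yields integers. This is a sketch with hypothetical function names (`b64_to_csv`, `csv_to_b64`); it only converts between the two text representations discussed in the thread and says nothing about which one the API expects.

```python
import base64


def b64_to_csv(encoded: str) -> str:
    """Decode a base64 hash string into comma-separated byte values."""
    # Iterating over bytes yields ints in the range 0..255
    return ",".join(str(b) for b in base64.b64decode(encoded))


def csv_to_b64(csv_values: str) -> str:
    """Encode comma-separated byte values as a base64 string."""
    raw = bytes(int(v) for v in csv_values.split(","))
    return base64.b64encode(raw).decode("ascii")


# "Uy8F" is the first four characters of the base64 hash in the thread
print(b64_to_csv("Uy8F"))     # 83,47,5
print(csv_to_b64("83,47,5"))  # Uy8F
```

The two functions are inverses of each other, so either representation can be regenerated from the other.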
Michael Magness Posted November 28, 2023
Thanks Marco. You've been a GREAT help!
Michael Magness Posted December 11, 2023
Hey Marco, Just an update on this. MS Support finally got back to me, and it turns out that even if you have access to the PhotoDNA API, access to the Edge Hash feature isn't turned on by default, and you have to sign another addendum before they provide you with the specific libraries and access to Edge Hash. So while you can send hashes to it, it will not work unless they specially enable it for your keys. I also wanted to say THANKS HEAPS for all your help with this. The code you've provided won't go to waste 🙂 Have a great week ahead.
Guest Marco de Moulin Posted December 12, 2023
Hi @Michael Magness, Once you have finalized the script, could you please inform us? We are eager to develop a crawler script that can be used by a broader range of users who have access to the API. I am looking forward to seeing the possibilities. Do you already have a plan for executing your own script? Will it be run on a subset that's been exported out of Intella?
Marco
Michael Magness Posted December 21, 2023
No problem. We just have to wait for our legal department to review the addendum, and I'll get back to you. Most likely, this'll be done with a subset of data exported from Intella. Thanks again for ALL your help. Watch this space in the new year :-)