Need to check if one of 10 images exists in a group of PDFs




I am looking to automate the following process:
Check if a signature exists within a group of PDFs and move them to a folder accordingly

We receive about 100 PDFs (scanned images) a day, each with one signature. The PDFs are stored in a specific folder. We also have a set of selected signatures (the signatures we are interested in) stored as images. We need the robot to go through the PDFs one at a time and check whether any signature of interest exists in the PDF. If a signature of interest exists, move the PDF to folder A; otherwise, move it to folder B.
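For illustration only, the routing described above can be sketched outside UiPath in a few lines of Python. This is a minimal sketch, not a working signature detector: `sort_pdfs` and the `has_signature` callback are hypothetical names, and the callback stands in for whatever check ends up deciding whether a signature of interest is present.

```python
import shutil
from pathlib import Path
from typing import Callable


def sort_pdfs(inbox: Path, folder_a: Path, folder_b: Path,
              has_signature: Callable[[Path], bool]) -> dict[str, str]:
    """Move each PDF in `inbox` to folder A or folder B.

    `has_signature` is a hypothetical callback: it receives the path of one
    PDF and returns True when a signature of interest was found in it.
    Returns a mapping of file name to the name of the folder it went to.
    """
    folder_a.mkdir(parents=True, exist_ok=True)
    folder_b.mkdir(parents=True, exist_ok=True)
    moved: dict[str, str] = {}
    # sorted() snapshots the listing, so moving files does not disturb the loop.
    # Note: the "*.pdf" pattern is case-sensitive on some platforms.
    for pdf in sorted(inbox.glob("*.pdf")):
        target = folder_a if has_signature(pdf) else folder_b
        shutil.move(str(pdf), str(target / pdf.name))
        moved[pdf.name] = target.name
    return moved
```

The detection step is deliberately left as a parameter, so the same loop works whether the check is done by a robot activity, OCR, or something else.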

I have tried using the Image Exists activity, but no luck yet. We are looking to run this robot as a back office process in unattended mode.

Question - Is it possible to run the Image Exists activity in the background, without having the PDF open?


How you look at the images depends on your PDF. Some PDFs expose the images as elements, so you can simply find the selector and then use OCR on the image. Others are entirely image based.

Yes, you can do this in back office, but it requires the correct screen resolution. To do this, you will probably need to kick off the process with a Launch Workflow Interactive; 1920x1080 with 32-bit color depth is typically used in the parameters.

No, you can’t look at an image with the application in the background unless it has a selector.

The steps that I have taken to look for the signature are as follows:
- Change the Zoom field in your PDF viewer to a view that works well for image recognition and OCR
- Use the arrow keys and Page Down to scroll the page
- Look for an image near the signature box, like “Please Sign Here”, and stop scrolling
- Set the Clipping Region, using the found image element, to fit around the signature box
- Highlight the new element’s clipping region for testing
- Run OCR on the element to convert it to characters and decide if it’s empty or not
EDIT: I’m also adding that I found Google OCR will throw an error with certain signature sizes, so I needed to make it more dynamic with a Try/Catch that decreases the Scale by 0.1 on each retry. I’m always interested in ways to improve this functionality.
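
The Try/Catch idea from the EDIT, together with the empty-or-not decision in the last step, can be sketched roughly as follows. This is Python rather than a UiPath workflow, and every name in it (`ocr_with_decreasing_scale`, `run_ocr`, `box_is_signed`, the 1.0 starting scale and 0.5 floor) is an assumption made for illustration, not part of any OCR engine's API.

```python
from typing import Callable, Optional


def ocr_with_decreasing_scale(run_ocr: Callable[[float], str],
                              start_scale: float = 1.0,
                              min_scale: float = 0.5,
                              step: float = 0.1) -> Optional[str]:
    """Call `run_ocr(scale)`, lowering the scale by `step` after each error.

    `run_ocr` is a hypothetical callback wrapping the real OCR call.
    Returns the first successful OCR text, or None if every scale failed.
    """
    # Count whole steps instead of subtracting repeatedly, to avoid float drift.
    attempts = int(round((start_scale - min_scale) / step)) + 1
    for i in range(attempts):
        scale = round(start_scale - i * step, 2)
        try:
            return run_ocr(scale)
        except Exception:
            # The OCR engine rejected this size; retry at a smaller scale.
            continue
    return None


def box_is_signed(ocr_text: str) -> bool:
    """Treat any non-whitespace OCR output as 'the signature box is not empty'."""
    return bool(ocr_text.strip())
```

A `None` result means OCR failed at every scale, which is not the same as an empty signature box, so it is probably worth routing those PDFs to manual review rather than treating them as unsigned.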

But, like I said, this is the process I was able to get working for image-only PDFs in my back office environment, and I must say it works well.

Your process might be different depending on how your PDFs and signature box are laid out.