Skip PDFs that do not follow the template

Hello everyone,

I have run into a problem that I can't figure out, and I'm hoping someone can point me toward a solution, as you've done before.

The idea of the process is as follows: I have a large number of PDF files from which I have to extract some information (contract date, amount and contract number) and also add the name of the file, for easy identification. I managed to do this part using regex and "Matches" to look up the information, building a data table that I write to an Excel file at the end.
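A rough Python sketch of the same extraction step, assuming the PDF text has already been read into a string. The field labels in the patterns ("Contract No.", "Date", "Amount") and the dd.mm.yyyy date format are hypothetical and would need to match your own template; CSV is used here only as a stand-in for the Excel output.

```python
import csv
import io
import re

# Hypothetical patterns: adjust the labels to how your template words them.
PATTERNS = {
    "contract_number": re.compile(r"Contract\s+No\.?\s*:\s*(\S+)"),
    "contract_date": re.compile(r"Date\s*:\s*(\d{2}\.\d{2}\.\d{4})"),
    "amount": re.compile(r"Amount\s*:\s*([\d.,]+)"),
}

FIELDS = ["file_name", "contract_number", "contract_date", "amount"]


def extract_fields(file_name, text):
    """Return one data-table row; fields whose pattern finds nothing become None."""
    row = {"file_name": file_name}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        row[field] = match.group(1) if match else None
    return row


def write_table(rows, stream):
    """Write the rows as CSV (stand-in for the final Excel write)."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    sample = "Contract No.: C-1042\nDate: 15.03.2021\nAmount: 12,500.00"
    rows = [extract_fields("contract_1042.pdf", sample)]
    out = io.StringIO()
    write_table(rows, out)
    print(out.getvalue())
```

Note that on a PDF from a different template the patterns simply return None, which is why extraction alone does not tell you to skip the file.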

My problem is this: how can I make the automation skip any PDF file that does not follow the template I need? I tried an If condition combined with "Is Match", which didn't work, and I also tried "Try Catch", which didn't work either. I'm not saying those aren't the solution; I may just have used them incorrectly :smiley:

Any suggestion would help.

Thank you,
Cristi

Hi @Cristian_Ionita

You can read the PDF text before applying the regex. Keep a list of unique keywords (email domain, website URL, etc.) that identify each template; you can store them in an Excel sheet. Then check whether the PDF text contains any of the listed keywords: if it does, proceed with the regex extraction; otherwise, ignore that file.
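A minimal Python sketch of the keyword check, assuming the PDF text has already been read (e.g. with Read PDF Text) and the keyword sheet has been exported to CSV with one keyword per row in the first column. The keywords and file names are made up for illustration, and the match is made case-insensitive here as a design choice.

```python
import csv
import io


def load_keywords(stream):
    """Read one keyword per row from the first column of a CSV export, skipping blanks."""
    return [row[0].strip() for row in csv.reader(stream) if row and row[0].strip()]


def matches_template(text, keywords):
    """True if the PDF text contains any of the keywords (case-insensitive)."""
    lowered = text.casefold()
    return any(keyword.casefold() in lowered for keyword in keywords)


if __name__ == "__main__":
    # Hypothetical keyword sheet, exported to CSV.
    sheet = io.StringIO("@examplecorp.com\nwww.examplecorp.com\n")
    keywords = load_keywords(sheet)

    # Hypothetical PDF texts keyed by file name.
    pdf_texts = {
        "contract_01.pdf": "Contact: legal@examplecorp.com ...",
        "invoice_77.pdf": "Invoice from another supplier ...",
    }
    for name, text in pdf_texts.items():
        action = "extract" if matches_template(text, keywords) else "skip"
        print(f"{name}: {action}")
```

Keeping the keywords in a sheet means a new template can be supported by adding a row rather than changing the workflow.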

Hope this works

Thanks
Anoop

It was so simple, it just slipped my mind :smiley: I added a check directly in the If condition that searches for a specific word found only in that type of document: `pdf.ToString.Contains("variable")`
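For comparison, a small Python sketch of the loop around that If condition: files whose text lacks the marker word are skipped with `continue`, and only the rest are extracted. The marker text and the contract-number pattern are hypothetical placeholders; the `in` check is case-sensitive, like .NET `String.Contains(string)`.

```python
import re

# Hypothetical marker word that only appears in the wanted template.
TEMPLATE_MARKER = "CONTRACT AGREEMENT"
# Hypothetical pattern for one of the fields to extract.
CONTRACT_NO = re.compile(r"Contract\s+No\.?\s*:\s*(\S+)")


def collect_rows(pdf_texts, marker=TEMPLATE_MARKER):
    """Build rows only for PDFs whose text contains the marker; list the others as skipped."""
    rows, skipped = [], []
    for file_name, text in pdf_texts.items():
        # Equivalent of the If condition (case-sensitive substring check).
        if marker not in text:
            skipped.append(file_name)
            continue  # different template: move on to the next file
        match = CONTRACT_NO.search(text)
        rows.append({
            "file_name": file_name,
            "contract_number": match.group(1) if match else None,
        })
    return rows, skipped


if __name__ == "__main__":
    texts = {
        "a.pdf": "CONTRACT AGREEMENT\nContract No.: C-7",
        "b.pdf": "Invoice 123",
    }
    rows, skipped = collect_rows(texts)
    print(rows)
    print("skipped:", skipped)
```

Collecting the skipped file names is optional, but it makes it easy to review afterwards which PDFs did not match the template.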

Thank you :smiley:
