Extract certain keywords from multiple PDFs

Hello hello,

I have a small problem I couldn't figure out yet, and I was hoping that someone could point me in the right direction.

So, it goes like this: I have some PDFs from which I need to extract some information. They are letters that contain certain variables, like date, account number and so on. I need to extract these variables and compare them with a data table (the comparing part I can manage :smiley: ).

Thing is, I have little to no experience with Regex (I only used it in another small process, but that was easy), and I was wondering if you know of any solution that would make selecting what I need easier, or have any suggestions on what to read or where to look in order to figure it out.

Thanks,
Cristi

Hey @Cristian_Ionita

This looks possible.

Kindly share your PDF or a screenshot of it so we can understand its format and suggest an approach.

Thanks
#nK

It contains sensitive information, so I can't really share the file, but it goes something like this, as an example:

“neachitarea la scadenta a debitului dumneavoastra provenit din contractul incheiat cu Nume si prenume, cu numarul Numar contract, din data de Data contract , pentru care dumneavoastra aveti calitatea .”

The words in bold are the variables that I need to extract. Any hint or suggestion is more than welcome, and I can figure out the rest :smiley:

Thanks

Hi!

May I know which OCR engine you're using to extract the data? Could you try the ML activities in Document Understanding to extract it?

Have a look at this video: UiPath Document Understanding # 7 | Extract and Validate using ML Extractor | ExpoHub | By Rakesh - YouTube

Regards,
NaNi

Hi

As @THIRU_NANI suggested, this can be accomplished more easily with Document Understanding.

For a demo, have a look at this

Cheers @Cristian_Ionita

Hi!

Try this out:

System.Text.RegularExpressions.Regex.Match(strVariable, "(Nume si prenume)|(Numar contract)|(Data contract)")

Reference: regex101: build, test, and debug regex

Note: Regex only works when the data to extract follows a fixed pattern or is an exact, known string.

Regards,
NaNi
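
When the values themselves change from letter to letter, one common regex approach is to anchor on the fixed words around each value and capture whatever sits between them. Below is a minimal Python sketch of that idea, assuming the letter text has already been read from the PDF into a string. The filled-in values and the group names `name`, `contract_no` and `contract_date` (standing for Nume si prenume, Numar contract and Data contract) are made up for illustration. In a .NET expression such as the one above, named groups are written `(?<name>...)` instead of `(?P<name>...)`.

```python
import re

# Hypothetical letter text: the sample sentence with made-up values filled in.
letter = (
    "neachitarea la scadenta a debitului dumneavoastra provenit din contractul "
    "incheiat cu Ion Popescu, cu numarul 12345, din data de 01.02.2021 , "
    "pentru care dumneavoastra aveti calitatea ."
)

# The fixed template words act as anchors. Each named group captures the
# variable text between two anchors; the lazy .+? stops at the next anchor.
pattern = re.compile(
    r"incheiat cu (?P<name>.+?), cu numarul (?P<contract_no>.+?), "
    r"din data de (?P<contract_date>.+?)\s*, pentru care"
)

match = pattern.search(letter)
# Empty dict when the letter does not follow the expected wording.
fields = match.groupdict() if match else {}
print(fields)
```

If the wording around a value differs between letter types, each letter type would need its own pattern.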

Hey @Cristian_Ionita

How do you decide manually which words are the bold ones?

Are they static or dynamic keywords, or do they follow some other rule?

Kindly confirm.

Thanks
#nK

They are dynamic, sorry if I forgot to mention it. Basically, I have a template for a letter that we send to our clients, and I want to double-check the final letter against a data table. In the whole letter, only a couple of variables change (the ones that are bold in my example). I don't know if I managed to explain it properly.
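
Since the letter comes from a fixed template where only a few values change, the template text itself could be turned into a regex: escape the static words and replace each variable with a capture group. The Python sketch below illustrates this under some assumptions: the `{placeholder}` syntax, the field names, the helper functions and the two letter texts are all hypothetical, and the text is assumed to come out of the PDF with the same spacing as the template.

```python
import re

# Hypothetical template: static words from the sample sentence, with the
# changing parts written as {placeholders}. The field names are made up.
TEMPLATE = (
    "incheiat cu {name}, cu numarul {contract_no}, "
    "din data de {contract_date} , pentru care"
)

def template_to_regex(template):
    """Escape the static text and turn each {placeholder} into a named group."""
    # re.split with a capture group alternates: static, name, static, name, ...
    parts = re.split(r"\{(\w+)\}", template)
    pieces = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pieces.append(re.escape(part))       # static template text
        else:
            pieces.append("(?P<%s>.+?)" % part)  # variable value
    return re.compile("".join(pieces))

def extract_fields(pattern, letter_text):
    """Return the captured fields as a dict, or None if the text does not match."""
    match = pattern.search(letter_text)
    return match.groupdict() if match else None

pattern = template_to_regex(TEMPLATE)

# Made-up letter texts standing in for the text read from each PDF.
letters = [
    "contractul incheiat cu Ion Popescu, cu numarul 12345, din data de 01.02.2021 , pentru care",
    "contractul incheiat cu Maria Ionescu, cu numarul 67890, din data de 15.03.2021 , pentru care",
]
results = [extract_fields(pattern, text) for text in letters]
```

Each dict in `results` can then be compared with the matching row of the data table. A result of `None` flags a letter whose wording does not follow the template.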

ML is still something that I cannot fully grasp at my level, but thank you both for the suggestion. I've put it on my learning list :smiley: