PDF extraction using Regex

Hi All,

We are trying to extract part of the data from a PDF using Regex. The problem is that the pattern gives an exact match
when we test it against the string in an online tester, but when we apply the same pattern in Studio
we do not get a match.

The text is →

International Money Trnsfr CR (208)
12,818.48
903704290192809
00000000000
0

Regex we are using is:

[a-zA-Z ]+ Money [a-zA-Z ]+ (\d{3})+(\n)+(\d+|\d{1,3}(,\d{3})*)(.\d+)( |\S|\n)+\d{15}( |\n)+\d{1,12}(\n\d)+

The string looks different in the PDF than it does after conversion to text.
In the PDF it looks like this:

International Money Trnsfr CR (208) 12,818.48 903704290192809 00000000000

But when the code runs and we write the data to a text file to check, it is in the format below:

International Money Trnsfr CR (208)
12,818.48
903704290192809
00000000000
0

How can we overcome this?

Hi @vaish.ayodhya
Did you check the Preserve Format option in the Read PDF activity?

Instead of \n, try either \s or \r\n.
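
One likely reason for the difference: text read from a PDF on Windows often uses \r\n line endings, while text pasted into an online tester is usually normalized to \n. Here is a minimal Python sketch of that effect. It assumes the extracted text really contains \r\n (the sample string is rebuilt from the post above), and Python's re module is only standing in for the .NET engine that Studio uses.

```python
import re

# Assumption: the PDF text comes back with Windows line endings (\r\n).
extracted = (
    "International Money Trnsfr CR (208)\r\n"
    "12,818.48\r\n"
    "903704290192809\r\n"
    "00000000000\r\n"
    "0"
)

# A bare \n right after the closing bracket fails, because \r sits in between.
strict = re.compile(r"\(\d{3}\)\n[\d,.]+")
# \r?\n accepts both \n and \r\n.
tolerant = re.compile(r"\(\d{3}\)\r?\n[\d,.]+")
# \s+ accepts any run of whitespace (spaces, \r, \n).
loose = re.compile(r"\(\d{3}\)\s+[\d,.]+")

strict_hit = strict.search(extracted)
tolerant_hit = tolerant.search(extracted)
loose_hit = loose.search(extracted)

print(strict_hit is None, tolerant_hit is not None, loose_hit is not None)
```

If the strict pattern fails while the other two succeed on your own extracted text, the line endings are the culprit.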

Hi Nived and Surya… I checked both options, still not working.
I tried the regex (^International).*([\n]?\d0{5,10}) against the string ‘International Money Trnsfr CR (208) 12,818.48 903704290192809 00000000000’. When I write the data to a text file I can see this string, and it matches on Regex 101. But when I run the same in Studio, it returns False. Is there any other way?
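
A possible explanation, sketched in Python under the assumption that the text Studio sees is still split across lines: by default `.` does not match a newline, so `.*` cannot travel from "International" down to the line of zeros. Python's re.DOTALL flag (the .NET counterpart is RegexOptions.Singleline) lifts that limit. The sample text is rebuilt from the first post.

```python
import re

# Assumption: the extracted text is multi-line with \r\n endings.
extracted = (
    "International Money Trnsfr CR (208)\r\n"
    "12,818.48\r\n"
    "903704290192809\r\n"
    "00000000000\r\n"
    "0"
)

pattern = r"(^International).*([\n]?\d0{5,10})"

# Default mode: "." stops at the first newline, so the zeros are never reached.
plain_hit = re.search(pattern, extracted)
# DOTALL mode: "." also matches newlines, so ".*" can span all the lines.
dotall_hit = re.search(pattern, extracted, flags=re.DOTALL)

print(plain_hit is None, dotall_hit is not None)
```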

Also have a look here:

Hi Peter. Thank you for your reply.

As you mentioned, the bracket won't match (208), so I used “(^International).*([\n]?\d0{5,10})” as a new regex just to match the entire line –> ‘International Money Trnsfr CR (208) 12,818.48 903704290192809 00000000000’. But this isn't working either.

For the bracket, we also used “Preauthorized ACH [a-zA-Z ]+ (\d{3}) (\d+|\d{1,3}(,\d{3}))(.\d+)( |\S)+(\ |)+\d{15} \d{1,12}” to match ‘Preauthorized ACH Credit (165) 2,350.00 902320008786818 00000000000’. This seems to match online, but when the same pattern is applied in code it returns False.
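
To illustrate the bracket point: as the pattern is written in this post, (\d{3}) is a capture group for three digits, not a literal "(165)". It is possible the forum swallowed the backslashes when the pattern was pasted, so treat this as a sketch of the escaping rule only. The simplified pattern below is hypothetical, not the original one, and Python's re is standing in for the .NET engine.

```python
import re

line = "Preauthorized ACH Credit (165) 2,350.00 902320008786818 00000000000"

# Unescaped: (\d{3}) expects a digit right after the space, but the text has "(".
unescaped = re.compile(r"Preauthorized ACH [a-zA-Z ]+ (\d{3}) ")
# Escaped: \( and \) match the bracket characters; the inner group captures the code.
escaped = re.compile(
    r"Preauthorized ACH [a-zA-Z ]+ \((\d{3})\) ([\d,]+\.\d+) (\d{15}) (\d{1,12})"
)

unescaped_hit = unescaped.search(line)
escaped_hit = escaped.search(line)

print(unescaped_hit is None)
print(escaped_hit.group(1), escaped_hit.group(2), escaped_hit.group(3))
```

The same applies to the dot: (.\d+) accepts any character before the digits, while \. only accepts a literal decimal point.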

I will check your feedback in more detail later, as I feel some of the suggestions have not been applied yet. In the meantime, have a look at this:

([a-zA-Z ]+ Money [a-zA-Z ]+) \((\d{3})\)(?:\r?\n| )?([\d\,\.]+)(?:\r?\n| )?(\d+)(?:\r?\n| )?(\d+)(?:\r?\n| )?(\d+)
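
A quick way to try the pattern above outside Studio is this Python sketch. It assumes the two text layouts described earlier in the thread (one line with spaces, or several lines with \r\n) and uses Python's re as a stand-in for the .NET engine, so behaviour in Studio may still differ in details.

```python
import re

pattern = re.compile(
    r"([a-zA-Z ]+ Money [a-zA-Z ]+) \((\d{3})\)(?:\r?\n| )?([\d\,\.]+)"
    r"(?:\r?\n| )?(\d+)(?:\r?\n| )?(\d+)(?:\r?\n| )?(\d+)"
)

# Layout written to the text file by the workflow (assumed \r\n endings).
multi_line = (
    "International Money Trnsfr CR (208)\r\n"
    "12,818.48\r\n"
    "903704290192809\r\n"
    "00000000000\r\n"
    "0"
)
# Layout as it appears visually in the PDF.
single_line = "International Money Trnsfr CR (208) 12,818.48 903704290192809 00000000000"

multi_hit = pattern.search(multi_line)
single_hit = pattern.search(single_line)

# Groups: description, code, amount, account number, then the trailing digit fields.
print(multi_hit.group(1), multi_hit.group(2), multi_hit.group(3), multi_hit.group(4))
print(single_hit is not None)
```

Because each separator is optional and accepts either a space or a line break, the same pattern covers both layouts. On the single-line layout the last two groups split the run of zeros between them, so check those two values before relying on them.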


@ppr, can you please help with this?