Regex Based Extraction

Hi team, I need to extract the value highlighted in red from the PDF.

Please find the PDF below.

Jun-23.pdf (137.3 KB)


Regards,

The value you have highlighted is not the correct one. The one I need, according to the PDF, is 53,25,000.00, which is just below the value you highlighted.

@Ishan_Shelke

Regex pattern: (?<=June-23\s+Al\s+)[0-9,.]+

Here is the XAML:
Main.xaml (8.9 KB)

My bad, here is the regex.

Regards

The month you have used as the anchor isn't reliable. The month keeps changing in my case.

Hi, the value you have used here isn't reliable either. The values keep changing too.

Hey!
Can you try this regex pattern:


(?<=(Total:\s+)[A-Za-z0-9\s.]+)\d{1,},\d{1,},\d{1,}\.\d{1,}

You can also try this one: (?<=(Total:\s+)[A-Za-z0-9\s.])\d{1,},[0-9.,]

I used "Total:" as the anchor, since you need the total value.
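To make the idea of anchoring on "Total:" concrete, here is a minimal sketch in Python rather than UiPath/.NET. Python's `re` module does not support variable-length lookbehind, so the sketch swaps the lookbehind for a capture group. The sample text, the `extract_total` name, and the currency label are all hypothetical, since the real extracted text is not shown in the thread.

```python
import re

# Hypothetical text; the actual "Read PDF Text" output is not shown in the thread.
sample_text = "Invoice No. 17\nTotal: INR 53,25,000.00\n"

# "Total:" followed by optional letters/spaces/dots, then an amount with
# comma-separated groups and a decimal part. The amount is captured in group 1
# instead of being isolated by a .NET-style lookbehind.
AMOUNT_AFTER_TOTAL = re.compile(r"Total:\s+[A-Za-z\s.]*?(\d+(?:,\d+)+\.\d+)")

def extract_total(text):
    """Return the first amount that follows 'Total:', or None if there is none."""
    match = AMOUNT_AFTER_TOTAL.search(text)
    return match.group(1) if match else None

print(extract_total(sample_text))
```

Note that the decimal point is escaped here, so it only matches a literal dot.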

Can you explain which activity you are using so that your text is extracted like this? And which package version are you using? When I use Read PDF Text, my text is extracted as in the attached file. Can you help me get the same value from this text file?

sample.txt (4.3 KB)

Hi @Ishan_Shelke

You can try this expression for the total:

str_InPut = "YourText"

System.Text.RegularExpressions.Regex.Split(str_InPut.Split({"Invoice Value"}, StringSplitOptions.RemoveEmptyEntries)(0).Trim.Split({vbLf}, StringSplitOptions.RemoveEmptyEntries).Last.ToString.Trim, "\s.*")(0)

For reference, you can see the output.
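For anyone reading the one-liner above, here is a step-by-step sketch of what it appears to do, written in Python for illustration only (the thread itself uses a VB.NET expression). The function name `last_line_first_token` and the sample text are hypothetical; the sketch assumes the wanted value is the first token on the last line before "Invoice Value".

```python
import re

def last_line_first_token(text, marker="Invoice Value"):
    """Mirror the split-based expression: text before the marker,
    last non-empty line of it, then the first whitespace-free token."""
    # Drop empty pieces, like StringSplitOptions.RemoveEmptyEntries.
    parts = [p for p in text.split(marker) if p]
    before_marker = parts[0].strip()
    lines = [ln for ln in before_marker.split("\n") if ln]
    last_line = lines[-1].strip()
    # Splitting on "whitespace and everything after it" keeps only the first token.
    return re.split(r"\s.*", last_line)[0]

# Hypothetical extracted text, not taken from the real PDF.
sample_text = (
    "Item A 10 1,000.00\n"
    "53,25,000.00 9,58,500.00\n"
    "Invoice Value 62,83,500.00\n"
)
print(last_line_first_token(sample_text))
```

Like the original expression, this raises an error if the text before the marker is empty, so it may be worth checking the input first.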


Okay, I have used your sample text as input.


Try this regex now: (?<=(Line Total:)[\s\S]+)(\d{1,},\d{1,},\d{1,}\.\d{1,})(?=\s+(\d{1,},\d{1,},\d{1,}\.\d{1,}))

Let me know if it works
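The pattern above looks for the first amount after "Line Total:" that is itself followed by another amount. A rough Python sketch of the same idea is below. Python's `re` has no variable-length lookbehind, so a capture group is used instead; the lookahead is kept as is. The sample text and the name `amount_after_line_total` are hypothetical and not taken from sample.txt.

```python
import re

# An amount with two comma-separated groups and a decimal part, e.g. 53,25,000.00.
AMOUNT = r"\d+,\d+,\d+\.\d+"

# After "Line Total:", lazily skip any text (including newlines) up to the first
# amount that is followed by whitespace and a second amount.
PATTERN = re.compile(
    r"Line Total:[\s\S]+?(" + AMOUNT + r")(?=\s+" + AMOUNT + r")"
)

def amount_after_line_total(text):
    """Return the captured amount, or None when nothing matches."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

# Hypothetical extracted text for illustration.
sample_text = "Line Total: 1,000.00\n53,25,000.00 9,58,500.00\n"
print(amount_after_line_total(sample_text))
```

Because this relies on two amounts appearing next to each other, it will return nothing if the layout of the extracted text changes.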
