Regex Expression to Extract Total From PDF

I have a PDF like this

image

I want to extract the value in the Total column. I have converted the PDF to text using the Read PDF Text activity and just need a regex to pull that value out.

Kindly share this table in text format.

Regards

Can you share the output text that you get from the Read PDF Text activity?


Hi @Ishan_Shelke ,

Could you please share the PDF’s output text? If your total value always comes after or before a fixed label such as ‘Total’ or ‘Line Total’, you can use a lookbehind assertion (value after the label) or a lookahead assertion (value before the label).

I can share the whole regex pattern once you share the PDF text.
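
To illustrate the two assertions, here is a minimal sketch in Python. UiPath itself uses the .NET regex engine, which accepts the same `(?<=...)` and `(?=...)` syntax. The sample line and the trailing `INR` marker are made up for illustration and are not taken from the actual PDF.

```python
import re

# Hypothetical one-line stand-in for the text returned by Read PDF Text.
text = "Qty 2 Rate 500.00 Total 1,000.00 INR"

# Lookbehind: match an amount only when "Total " comes right before it.
after_total = re.search(r"(?<=Total )[\d,.]+", text)

# Lookahead: match an amount only when " INR" comes right after it.
before_inr = re.search(r"[\d,.]+(?= INR)", text)

print(after_total.group())  # 1,000.00
print(before_inr.group())   # 1,000.00
```

Neither assertion consumes the label, so the match contains only the amount and the Rate value is skipped.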

sample.txt (1.6 KB)

Here is the sample, guys.

I need the value that is below the Total column; in this case it is:

image

Hi @Ishan_Shelke ,
Is this image the input?
You can use Read PDF Text to get the string,
then use a regex to extract the value from it.
Can you share the file or your string so we can test the regex?

regards,

Hey @Ishan_Shelke, can you try this regex pattern:
(?<=(Total)\s+)(₹\d+,\d+\.\d+)
Or you can also go with this pattern:
₹\d{2},\d{3}\.\d{2}
image
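
A quick way to try both patterns outside UiPath is the Python sketch below. It rests on a few assumptions: the sample text is invented (the real sample.txt may differ), and because Python's `re` only allows fixed-width lookbehind, the `Total\s+` prefix is matched normally and the amount is read from a capture group. In a regex, an unescaped `.` matches any character, so the sketch writes the decimal point as `\.`.

```python
import re

# Hypothetical stand-in for the extracted text; the real sample.txt may differ.
text = "Rate 6,250.00\nTotal ₹12,500.00"

# Capture-group form of the lookbehind pattern, with the decimal point escaped.
m = re.search(r"Total\s+(₹\d+,\d+\.\d+)", text)
total = m.group(1) if m else None

# Fixed-shape alternative: exactly 2 digits, comma, 3 digits, point, 2 digits.
fixed = re.search(r"₹\d{2},\d{3}\.\d{2}", text)

# An unescaped "." matches any character, so it also accepts a non-amount.
loose = re.fullmatch(r"₹\d{2},\d{3}.\d{2}", "₹12,500x00")
strict = re.fullmatch(r"₹\d{2},\d{3}\.\d{2}", "₹12,500x00")

print(total, fixed.group(), loose is not None, strict is not None)
```

The fixed-shape pattern only fits amounts from ₹10,000.00 to ₹99,999.99, so the first pattern is the safer choice if totals can be smaller or larger.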

Regards,

Hey @Ishan_Shelke ,

I have made a XAML workflow according to your requirements.
Below is the output screenshot:
image

Below is the file:
Ishan Forum.zip (3.3 KB)

Below is the regex that I used:
(?<=\sTotal\s*₹ )\d+(,|.)\d+(,|.)\d+

This regex locates the term “Total” followed by “₹ ” and captures the numeric value after it, even if that value includes commas (,) or periods (.). This is useful because the numeric format in the PDF content might vary, such as:

  • 686.33
  • 87886,99
  • 899,90.00

Thus, this regex pattern adequately handles diverse numeric representations.
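
To check the three formats above, here is a small Python sketch. It is a variation on the regex, not the one from the attached workflow. In `(,|.)` the `.` is unescaped and matches any character, so the sketch uses the character class `[,.]` and lets the separator-plus-digits group repeat. It also uses a capture group, since Python's `re` does not accept the variable-width `\s*` inside a lookbehind. The `extract_total` helper and the sample lines are hypothetical.

```python
import re

# Capture-group variant: "Total", optional whitespace, "₹ ", then digit runs
# joined by "," or "." separators.
PATTERN = re.compile(r"\sTotal\s*₹ (\d+(?:[,.]\d+)*)")


def extract_total(text):
    """Return the amount after 'Total ₹ ' as a string, or None if absent."""
    m = PATTERN.search(text)
    return m.group(1) if m else None


# Hypothetical lines built from the three formats listed above.
samples = ["Line Total ₹ 686.33", "Line Total ₹ 87886,99", "Line Total ₹ 899,90.00"]
results = [extract_total(s) for s in samples]
print(results)  # ['686.33', '87886,99', '899,90.00']
```

The helper returns the amount as text; converting it to a number would still require knowing which separator is the decimal mark in your PDFs.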
Hope it helps you out!

Happy automation


Hi,

Could you check the expression below against the text output of all your PDFs:

System.Text.RegularExpressions.Regex.Matches(txt, "(?<=\₹\s)[\d,.]+")(0).ToString

I am simply taking the first amount that is preceded by ₹ and a space.

The Rate amount is also there, but it is not preceded by ₹, so if all your extracted PDF text comes in the same format, the shared expression will work.
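
The same "first amount after ₹" idea can be sketched in Python as below. The sample text is invented to mimic a Rate value without the ₹ prefix, and the empty-result guard is an addition that the expression above does not have.

```python
import re

# Hypothetical stand-in for the Read PDF Text output: Rate has no "₹ " prefix.
txt = "Rate 343.16 Qty 2\nTotal ₹ 686.33\nGrand Total ₹ 686.33"

# Same pattern as above, written without the backslash before the rupee sign.
amounts = re.findall(r"(?<=₹\s)[\d,.]+", txt)

# Guard against an empty result; indexing an empty list would raise IndexError.
first_amount = amounts[0] if amounts else None

print(first_amount)  # 686.33
```

If a PDF ever places another ₹ amount before the total, taking the first match would return the wrong value, so anchoring on the "Total" label is the more robust option in that case.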
