Regex pattern needed to extract invoice numebrs after reading PDF text

Hi All

I need a regex pattern to grab the invoice number and Amount from below text
1.Sample text provided below
2.The expected output i am seeking is below. A table with two columns. Invoice number and Gross amount
|Invoice Number| Grouss Amount|
|9073223290|153.91|
|9073223289|4,963.75|
|9073223287|2,373.90|
|9073223286|27,827.00|
|9073223302|1,943.33|
|9073223307|6,476.80|
|9073223251|21,187.10|
|9073223250|1,943.33|
|9073223308|6,384.40|
|9073223267|25,088.40|
|9073223301|86,545.57|
|9073224814|22,410.71|
|9073224846|88,248.89|
|9073224816|20,829.60|
|9073224812|20,431.54|
|9073224798|1,943.33|
|9073224844|1,943.33|
|9073224792|46,460.52|
|9073224847|1,943.33|
|9073224848|1,943.33|
|9073224850|1,943.33|
|9073224811|20,829.60|
|9073224799|1,943.33|
|9073224849|1,943.33|
3.The pattern of the text: Invoice number always start with 9, and of 10digits. Gross Amount is always after "0.00 "

Any help would be appreciated.

Many thanks in advance

Sample text as below:

Invoice Number Cash Discount Gross Amount Net Amount

9073223290 0.00 153.91 153.91
9073223289 0.00 4,963.75 4,963.75
9073223287 0.00 2,373.90 2,373.90
9073223286 0.00 27,827.00 27,827.00
9073223302 0.00 1,943.33 1,943.33
9073223307 0.00 6,476.80 6,476.80
9073223251 0.00 21,187.10 21,187.10
9073223250 0.00 1,943.33 1,943.33
9073223308 0.00 6,384.40 6,384.40
9073223267 0.00 25,088.40 25,088.40
9073223301 0.00 86,545.57 86,545.57
9073224814 0.00 22,410.71 22,410.71
9073224846 0.00 88,248.89 88,248.89
9073224816 0.00 20,829.60 20,829.60
9073224812 0.00 20,431.54 20,431.54
9073224798 0.00 1,943.33 1,943.33
9073224844 0.00 1,943.33 1,943.33
9073224792 0.00 46,460.52 46,460.52
9073224847 0.00 1,943.33 1,943.33
9073224848 0.00 1,943.33 1,943.33
9073224850 0.00 1,943.33 1,943.33
9073224811 0.00 20,829.60 20,829.60
9073224799 0.00 1,943.33 1,943.33
9073224849 0.00 1,943.33 1,943.33

@diana.dai

welcome to the forum.

had you checked the option to replace space (blank) with e.g. | (Pipe) and parse the text as csv?

other wise have a look here working with ommiting groups (?:… Syntax)
(\d+) (?:\d+\.\d+) ([\d\,\.]+)
grafik

grafik

@diana.dai - Please check this Regex pattern.

Hi, Peter, Thanks for the reply. sorry i didn’t know how to insert a table as the output.

I was using activity Matches to get invoice numbers and Amount.

Could you please also advice how to write the output of matches in excel file when the items in in ienumerable is more than one? in my case. how to write these invoice Nos and Amount in an excel as below?

Thanks!

@diana.dai
Sure we will help you. May we ask you following:

  • what type of source is the input from (e.g. text.file…)?

Thanks for answer

Hi, Peter

The input is from a PDF file. and the Output should be an Excel file.

output.xlsx (10.3 KB)

@diana.dai

grafik

  • prepare an empty datatable within the target column structure (Invoice No, Gross Amount)
  • execute the regex
  • iterate over the matches and add it to the datatable
  • write the datatable to Excel with write range activity

find starter help here:
RegexMatch_ToDataTable.xaml (8.2 KB)

2 Likes

Thanks so much Peter!!

@diana.dai
So it looks that it is working. May we ask you to flag the solving post as solution after you have done your final testing. So others will benefit from it. Sure we are here for your remaining open questions.
Thanks

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.