I want to get a second opinion on a regex before I suggest an alternative to this process to the business team.

I have an input PDF file whose layout seems to vary depending on the vendor. This file is used to validate invoice entry before our bots move documents to the corporate retention system. I am having difficulty finding a pattern and think this may not be something we can do. Here are examples of some line items:

39 320566 00100 001 2/10/2022 11/18/2021 00100 2365408 AxLlone USA A 42,510.00
10/4/2021 00873953 37419 T

39 320567 00100 001 2/10/2022 3/17/2022 00100 6257763 Englewood Lab A 165,462.00
1/31/2022 00874764 9101810

39 320573 00100 001 2/10/2022 3/10/2022 00100 104430 SensLent CosmetLc Technol A 1,666.30
1/24/2022 9IR 00877413 1009468 9

Each entry begins with “39”, but not all entries span multiple lines and not all entries end with “T”. The need is to extract the invoice number, which is bolded in the samples above. Not all lines have the same number of characters. All entries, however, seem to have two “00100” values: one near the beginning and one after the second date.

I have tried a number of different regex statements but have come to the opinion that this is not something we can do. I have also tried replacing blank spaces with a delimiter using Replace, but to no avail.

I am looking for a second opinion as I mentioned. If you do not agree with me, what suggested regex would you utilize?

Chris

Hi,

How about the following?

System.Text.RegularExpressions.Regex.Matches(yourString,"^\d{1,2}/\d{1,2}/\d{4}(\s+\w+)?\s+(\w+)\s+(?<TARGET>\d+)(\s+\w)?$",System.Text.RegularExpressions.RegexOptions.Multiline)
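For anyone who wants to try the pattern outside of a workflow, here is a minimal sketch in Python that runs it against the three samples from the question. It is a port, not the original activity: Python writes the named group as `(?P<TARGET>...)` where .NET uses `(?<TARGET>...)`, and the names `PATTERN`, `SAMPLE` and `targets` are only for illustration. It assumes the text uses plain `\n` line breaks.

```python
import re

# Python port of the .NET pattern above. Only the named-group syntax
# changes: (?<TARGET>...) in .NET becomes (?P<TARGET>...) in Python.
PATTERN = re.compile(
    r"^\d{1,2}/\d{1,2}/\d{4}(\s+\w+)?\s+(\w+)\s+(?P<TARGET>\d+)(\s+\w)?$",
    re.MULTILINE,  # let ^ and $ match at every line, not just the whole text
)

# The three sample entries from the question, joined with "\n" (an assumption
# about how the PDF text is read).
SAMPLE = "\n".join([
    "39 320566 00100 001 2/10/2022 11/18/2021 00100 2365408 AxLlone USA A 42,510.00",
    "10/4/2021 00873953 37419 T",
    "",
    "39 320567 00100 001 2/10/2022 3/17/2022 00100 6257763 Englewood Lab A 165,462.00",
    "1/31/2022 00874764 9101810",
    "",
    "39 320573 00100 001 2/10/2022 3/10/2022 00100 104430 SensLent CosmetLc Technol A 1,666.30",
    "1/24/2022 9IR 00877413 1009468 9",
])

# Collect whatever the TARGET group captures on each matching line.
targets = [m.group("TARGET") for m in PATTERN.finditer(SAMPLE)]
print(targets)  # ['37419', '9101810', '1009468']
```

Only the lines that start with a date can match, so the "39" lines are skipped; the optional groups absorb the extra token ("9IR") and the trailing single character ("T" or "9").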

Sequence4.xaml (7.0 KB)

Regards,

Thanks, but unfortunately, while that works with the 3 examples I presented, I get no matches when the entire PDF text (which I cannot share) is read and your regex is applied, even though the examples came from the same PDF. I do appreciate the suggestion, however.
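One possible explanation, which cannot be confirmed without the file, is that the text read from the PDF uses Windows-style `\r\n` line endings. In .NET, `$` with `RegexOptions.Multiline` is satisfied before `\n` but not before `\r\n`, so a stray `\r` at the end of each line would block every match even though samples pasted into a forum post (with plain `\n`) work. The sketch below reproduces that behaviour in Python, where `$` behaves the same way in this respect; the sample lines, the variable names and the `\r?` variant are assumptions for illustration, not a confirmed fix.

```python
import re

# Two of the sample lines, this time joined with "\r\n" to imitate text
# read from a file with Windows line endings (an assumption about the PDF).
crlf_text = "\r\n".join([
    "39 320566 00100 001 2/10/2022 11/18/2021 00100 2365408 AxLlone USA A 42,510.00",
    "10/4/2021 00873953 37419 T",
    "",
    "39 320573 00100 001 2/10/2022 3/10/2022 00100 104430 SensLent CosmetLc Technol A 1,666.30",
    "1/24/2022 9IR 00877413 1009468 9",
    "",
])

body = r"^\d{1,2}/\d{1,2}/\d{4}(\s+\w+)?\s+(\w+)\s+(?P<TARGET>\d+)(\s+\w)?"

# Original pattern: $ must sit directly before "\n", so "\r" gets in the way.
strict = re.compile(body + r"$", re.MULTILINE)
# Variant: allow an optional "\r" before the end of the line.
tolerant = re.compile(body + r"\r?$", re.MULTILINE)

strict_hits = [m.group("TARGET") for m in strict.finditer(crlf_text)]
tolerant_hits = [m.group("TARGET") for m in tolerant.finditer(crlf_text)]

# Alternative: normalise the line endings first and keep the pattern as is.
normalised_hits = [
    m.group("TARGET") for m in strict.finditer(crlf_text.replace("\r\n", "\n"))
]

print(strict_hits)      # []
print(tolerant_hits)    # ['37419', '1009468']
print(normalised_hits)  # ['37419', '1009468']
```

In the .NET expression the equivalent change would be to end the pattern with `\r?$`, or to replace `vbCrLf` with `vbLf` in the string before matching. If that does not help, the next things to check would be whether the real second lines carry extra tokens the pattern does not allow for.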
