I have an input PDF file whose layout seems to vary by vendor. The file is used to validate invoice entry before our bots move documents to the corporate retention system. I am having difficulty finding a pattern and suspect this may not be something we can do. Here are some example line items:
39 320566 00100 001 2/10/2022 11/18/2021 00100 2365408 AxLlone USA A 42,510.00
10/4/2021 00873953 37419 T
39 320567 00100 001 2/10/2022 3/17/2022 00100 6257763 Englewood Lab A 165,462.00
1/31/2022 00874764 9101810
39 320573 00100 001 2/10/2022 3/10/2022 00100 104430 SensLent CosmetLc Technol A 1,666.30
1/24/2022 9IR 00877413 1009468 9
Each entry begins with “39”, but not all entries span multiple lines, and not all entries end with “T”. The need is to extract the invoice number, bolded in the samples above. Not all lines have the same number of characters. All entries, however, seem to have two “00100” values: one near the beginning and one after the second date.
I have tried a number of different regex patterns but have come to the conclusion that this is not something we can do. I have also tried replacing blank spaces with a delimiter using Replace, but to no avail.
I am looking for a second opinion. If you disagree with me, what regex would you suggest?
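As a starting point for discussion, the structure described above (an entry starting with “39”, two dates, and a second “00100” after the second date) can be sketched as a regex. This is only a sketch based on the three samples shown, not on the full file. It assumes the value wanted is the run of digits immediately after the second “00100”; that is a guess, since the bolding is not visible here. The names `HEADER` and `extract_numbers` are made up for illustration.

```python
import re

# Assumption: an entry's first line starts with "39", and the number wanted is
# the digit run right after the second "00100" (the one after the second date).
HEADER = re.compile(
    r"^39\s+\d+\s+00100\s+\d+\s+"     # "39", document id, first 00100, short code
    r"\d{1,2}/\d{1,2}/\d{4}\s+"       # first date
    r"\d{1,2}/\d{1,2}/\d{4}\s+"       # second date
    r"00100\s+(?P<number>\d+)\s+"     # second 00100, then the captured number
    r"(?P<vendor>.+?)\s+[A-Z]\s+"     # vendor name, then a single-letter flag
    r"(?P<amount>[\d,]+\.\d{2})\s*$"  # amount such as 42,510.00
)


def extract_numbers(text):
    """Return the captured number from every line that looks like an entry header."""
    numbers = []
    for line in text.splitlines():
        match = HEADER.match(line.strip())
        if match:  # continuation lines do not start with "39" and are skipped
            numbers.append(match.group("number"))
    return numbers


sample = """39 320566 00100 001 2/10/2022 11/18/2021 00100 2365408 AxLlone USA A 42,510.00
10/4/2021 00873953 37419 T
39 320567 00100 001 2/10/2022 3/17/2022 00100 6257763 Englewood Lab A 165,462.00
1/31/2022 00874764 9101810
39 320573 00100 001 2/10/2022 3/10/2022 00100 104430 SensLent CosmetLc Technol A 1,666.30
1/24/2022 9IR 00877413 1009468 9"""

print(extract_numbers(sample))  # ['2365408', '6257763', '104430']
```

Matching line by line and anchoring on the dates and the second “00100” avoids depending on line length or on whether a continuation line exists. If the value needed actually sits on the continuation line, the same anchoring idea would apply, but the header would have to be paired with the line that follows it.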