I've attached a PDF with the information I need extracted highlighted. The number of items at the bottom of the PDF will vary depending on what each person is ordering. The item number (0001 and 0002 in this case) will always be 4 digits, but the highlighted part number under Description may vary. The other information will change if a different person places the order.

Can you help me with this? Is regex the best solution?
PO Example 1_highlighted.pdf (991.4 KB)

Hi @kasey.betts,

In the PDF you shared, does the template always remain the same, with only the highlighted values changing?


The template remains the same. The filled-in values will change, while the titles stay the same. For example, the title "1. CONTRACT/PURCH ORDER/AGREEMENT NO." will remain the same, while the number beneath it (P0001 in this case) will change.
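Because each value sits on the line directly under a fixed title, one regex approach is to anchor on the escaped title text and capture the next non-empty line. Below is a minimal Python sketch of that idea. The sample text is hypothetical (not the real OCR output of the attached PDF) and `value_below` is a hypothetical helper name. UiPath's Matches activity uses .NET regex, so the same idea should carry over, but please verify the pattern there.

```python
import re
from typing import Optional

# Hypothetical OCR text: the box title on one line, its value on the next.
SAMPLE = "1. CONTRACT/PURCH ORDER/AGREEMENT NO.\nP0001\n"


def value_below(title: str, text: str) -> Optional[str]:
    """Return the text on the line directly after `title`, or None."""
    # re.escape() is needed because the title contains '.' characters,
    # which are regex metacharacters.
    pattern = re.escape(title.strip()) + r"[ \t]*\r?\n[ \t]*(\S[^\r\n]*)"
    match = re.search(pattern, text)
    return match.group(1).strip() if match else None


print(value_below("1. CONTRACT/PURCH ORDER/AGREEMENT NO.", SAMPLE))  # P0001
```

This only works while the OCR output keeps each title and its value on consecutive lines of their own.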

@kasey.betts,

If the template remains the same, have you tried using Document Understanding to extract the data?
Or do you need to extract it with regex only?


I need to use regex.




Can you list all the titles whose values need to be extracted?
In the document, boxes 1 and 5 are highlighted. Does that mean the values in boxes 2 to 4 are not required?

So, could you just mention the titles, or is what is highlighted in the document enough?


I only need the info that is highlighted.

  3. ISSUED BY (only customer name)
  4. SHIP TO

Then, at the bottom, I need the Description and the Quantity. However, I will only be using the highlighted part of the description, which is a part number. The format of that number (210-ANJK, for example) may change from PDF to PDF, but it will always be located in the same place.
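If the OCR output puts each item on a single line, one pattern can pick up the item number, the part number and the quantity, and `finditer` returns however many items the order contains. The part number is taken by position (the first token under Description), so its exact format does not matter. This is a Python sketch under an assumed row layout: the sample text, the two-letter unit code and the helper names are hypothetical. If the part number sits on its own line under the description, the pattern would need adjusting. Note that .NET writes named groups as `(?<name>...)` rather than Python's `(?P<name>...)`.

```python
import re
from typing import Dict, List

# Hypothetical OCR text for the item block; check the real OCR output first.
SAMPLE = (
    "ITEM NO. SUPPLIES/SERVICES QUANTITY UNIT\n"
    "0001 210-ANJK LAPTOP COMPUTER 2 EA\n"
    "0002 123-ABCD DOCKING STATION 10 EA\n"
)

# Assumed row layout (hypothetical): a 4-digit item number at the start of
# the line, the part number as the first token of the description, optional
# free text, then an integer quantity and a two-letter unit code at the end.
ROW_RE = re.compile(
    r"^(?P<item>\d{4})[ \t]+(?P<part>\S+)"  # item number, then part number
    r"(?:[ \t]+.*?)?"  # optional rest of the description
    r"[ \t]+(?P<qty>\d+)[ \t]+(?P<unit>[A-Z]{2})[ \t]*$",  # quantity and unit
    re.MULTILINE,  # ^ and $ match at every line, so each row is tried
)


def extract_rows(text: str) -> List[Dict[str, str]]:
    """Return one dict per item row; the list grows with the number of items."""
    return [m.groupdict() for m in ROW_RE.finditer(text)]


for row in extract_rows(SAMPLE):
    print(row["item"], row["part"], row["qty"])
# 0001 210-ANJK 2
# 0002 123-ABCD 10
```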


You asked to extract the information with regex, but there is a problem defining the patterns. I have attached the OCR output below.

As you can see in the screenshot above, the values of two columns appear side by side on the same lines, and those values are dynamic. That makes it hard to write a regex for this.
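To illustrate the problem: when OCR merges two boxes onto the same lines, anchoring on a title and taking the next line returns both columns' values mixed together. Splitting lines on wide gaps can sometimes recover the right column, but it depends entirely on the OCR spacing. The Python sketch below uses hypothetical sample text and hypothetical helper names.

```python
import re
from typing import Optional

# Hypothetical OCR output in which two boxes share the same lines.
SAMPLE = (
    "3. ISSUED BY                  4. SHIP TO\n"
    "CUSTOMER NAME                 SHIP-TO ADDRESS\n"
)


def naive_value_below(title: str, text: str) -> Optional[str]:
    """Take the whole line after the title (mixes both columns)."""
    match = re.search(re.escape(title) + r"[^\r\n]*\r?\n([^\r\n]*)", text)
    return match.group(1).strip() if match else None


def column_value_below(title: str, text: str) -> Optional[str]:
    """Split lines into cells on runs of 2+ spaces and pick the same column.

    Assumes OCR keeps a wide gap between columns and that no cell is empty;
    if either assumption breaks, the column index points at the wrong cell.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines[:-1]):
        cells = re.split(r"\s{2,}", line.strip())
        if title in cells:
            col = cells.index(title)
            below = re.split(r"\s{2,}", lines[i + 1].strip())
            return below[col] if col < len(below) else None
    return None


print(naive_value_below("3. ISSUED BY", SAMPLE))   # both columns mixed together
print(column_value_below("3. ISSUED BY", SAMPLE))  # CUSTOMER NAME
print(column_value_below("4. SHIP TO", SAMPLE))    # SHIP-TO ADDRESS
```

If a box is empty or the OCR collapses the gap between columns, `column_value_below` returns the wrong cell or `None`, which is the fragility described above.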

So, is it OK to use a different extractor?



Regex only works for data that follows a consistent pattern. In your case, we can't extract that with regex.

But we can extract it using Document Understanding with Machine Learning.

You have to install these packages to use DU with ML.


Have a look at this video.

