PDF extraction using Document Understanding

Hi all,

I have a case study with many different PDF samples, and I want to extract data from those PDFs based on keywords.
Suppose I want to extract the date field, but the documents use multiple keywords for it, such as Date, Payment Date, and Paid Date. The same goes for the invoice field, which appears under keywords such as Invoice #, Invoice Number, Invoice No, Reference, Document No., and Supplier Inv.
I have collected nearly 30 PDF samples and want to build one taxonomy that works for all 30 PDFs. Please guide me on how I can do this.


Document Understanding - Introduction

Refer to this for a basic understanding of DU.

Sign up on the Academy; there are courses that will guide you with samples.



You can use the prebuilt models if the documents are invoices. Explore the Document Understanding invoice models.


Thanks @Anil_G for sharing the documents.

Actually, I need to know how to configure the Date field in the taxonomy. Each PDF uses a different name for the date, for example:
Payment Date-----------Invoice2
Paid Date--------------Invoice3

I want all of the above to go under a single Date column. I want to build one taxonomy for all of these invoices.


That's the difference: in the taxonomy you define which columns you need in the output, not how or to which field in the documents they are linked.

That linking is done in AI Center and when you configure the extractor.
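
To make the idea concrete, here is a rough Python sketch of the separation described above: one fixed set of output columns, plus a separate alias table that maps whatever label a given PDF uses onto those columns. This is only an illustration and not the UiPath API; in Document Understanding the mapping is handled by the extractor or ML model rather than by code like this. The names FIELD_ALIASES and to_taxonomy_row, the "Invoice Number" column name, and the sample values are all hypothetical.

```python
import re

# Hypothetical alias table: output column -> label variants seen across the PDFs.
FIELD_ALIASES = {
    "Date": ["Date", "Payment Date", "Paid Date"],
    "Invoice Number": [
        "Invoice #", "Invoice Number", "Invoice No",
        "Reference", "Document No.", "Supplier Inv",
    ],
}


def _normalize(label):
    """Lowercase a label and collapse punctuation/whitespace so 'Paid Date:' matches 'Paid Date'."""
    return re.sub(r"[^a-z0-9#]+", " ", label.lower()).strip()


# Reverse lookup: normalized label variant -> output column.
ALIAS_TO_FIELD = {
    _normalize(alias): field
    for field, aliases in FIELD_ALIASES.items()
    for alias in aliases
}


def to_taxonomy_row(extracted):
    """Map raw label/value pairs from one document onto the fixed output columns.

    Unknown labels are ignored; columns with no matching label stay None.
    If several labels map to the same column, the first one seen wins.
    """
    row = {field: None for field in FIELD_ALIASES}
    for label, value in extracted.items():
        field = ALIAS_TO_FIELD.get(_normalize(label))
        if field is not None and row[field] is None:
            row[field] = value
    return row


# Made-up sample values, for illustration only.
print(to_taxonomy_row({"Payment Date": "2023-01-05", "Invoice No": "A-17"}))
print(to_taxonomy_row({"Paid Date:": "2023-02-11", "Supplier Inv.": "B-42"}))
```

Both sample documents end up with the same two columns, which is the point: the column list never changes, and only the alias table grows when a new label variant turns up.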
