I would like to raise a query regarding the Document Understanding process.

Is it possible to extract 100 to 120 data fields from a PDF file containing 6 to 7 pages using UiPath Document Understanding? If yes, what would be the most suitable approach or extractor to handle such a large number of fields effectively?

@ashokkarale, @Anil_G

@Maulik_Khunt1

If the documents come with the same structure every time, you can use the Form Extractor or the Intelligent Keyword Extractor.

If the documents have variations, go with the ML Extractor.

Thank You @Darshan_Sable

I am currently using the Community Edition of UiPath and have been working with both the Form Extractor and the Intelligent Keyword Extractor. However, I am encountering an issue while attempting to extract 100 to 120 fields from a document.

Despite multiple attempts, the process fails with the following error message:
“Document was not processed.”

@Maulik_Khunt1

The right extractor depends on whether the file is structured or unstructured, and scanned or digital. The Form Extractor and the keyword extractors suit more structured documents, where the keywords and field areas are fixed and do not change. To verify the approach, try a smaller set of fields first and adjust based on the result, but include the values you expect to be hardest to extract so you can tell whether the approach really works.

Cheers

Yes, it is possible.

For such a large and complex document, you should combine multiple extractors for best results:

Step 1: Use Document Classification to detect document type (if needed).

Step 2: Apply Form Extractor if the document is structured (e.g., tax forms, applications).

Build a form template for the document in the Form Extractor's Template Manager.

Carefully define all 100+ fields in the Taxonomy.

Step 3: Use ML Extractor for semi-structured documents where forms are not rigid.

UiPath offers pre-trained models, or you can train a custom model in AI Center.

Step 4: Add the Regex Based Extractor as a backup to pick up specific fields (like phone numbers, emails, dates).

Step 5: Use the Validation Station to review extraction accuracy (especially important for so many fields).
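
To make the idea behind Steps 4 and 5 concrete, here is a rough Python sketch of the logic, outside of UiPath. It is not the UiPath API: the result dictionaries, the field names, the regex patterns, the fallback confidence of 0.5 and the review threshold of 0.80 are all assumptions made for illustration. In a real workflow the extractors and Validation Station do this for you.

```python
import re

# Hypothetical fallback patterns. Real documents will need patterns tuned
# to the formats that actually appear in them.
FALLBACK_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "date": re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"),
}

FALLBACK_CONFIDENCE = 0.5  # assumed score given to regex matches
REVIEW_THRESHOLD = 0.80    # assumed cut-off for sending a field to review


def apply_regex_fallback(primary, text):
    """Return a copy of the primary results with missing fields filled by regex."""
    merged = dict(primary)
    for field, pattern in FALLBACK_PATTERNS.items():
        current = merged.get(field)
        # Only fall back when the primary extractor produced no value.
        if current is None or not current.get("value"):
            match = pattern.search(text)
            if match:
                merged[field] = {
                    "value": match.group(0),
                    "confidence": FALLBACK_CONFIDENCE,
                    "source": "regex",
                }
    return merged


def fields_needing_review(results, threshold=REVIEW_THRESHOLD):
    """List the fields whose confidence is below the review threshold."""
    return sorted(
        name for name, result in results.items()
        if result.get("confidence", 0.0) < threshold
    )


# Example input: one field extracted confidently, one left empty.
text = "Contact: jane.doe@example.com, phone +1 (555) 123-4567, dated 12/03/2024"
primary = {
    "name": {"value": "Jane Doe", "confidence": 0.95},
    "email": {"value": "", "confidence": 0.0},
}

merged = apply_regex_fallback(primary, text)
to_review = fields_needing_review(merged)
```

With 100+ fields, a split like this keeps manual review focused on the fields that were filled by the backup or came back with low confidence, instead of checking every field by hand.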