MultiPage Issue with Document Understanding

8SEVEN · October 25, 2023, 3:00am

Hello Everyone,

I receive PDFs daily; some have one page and some have two, and I need to extract the total amount from each. When a PDF has a long list of orders, the total amount falls on the second page; when the list is short, the total amount is on the first page.

Extraction works when there are many orders and the total amount is on the second page, but when I debugged with fewer orders, the total amount was not captured.

Please suggest how to tackle this issue.

Nguyen_Van_Luong1 · October 25, 2023, 3:51am

Hi @8SEVEN ,
I think we can read the full text of the PDF and then split it to get the ‘total amount’, e.g. with a regex.
For more detail, can you share a sample file?
Regards,
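
The regex idea above could look roughly like the sketch below. It is plain Python rather than a UiPath workflow, and it assumes the PDF text has already been read into one string, that the label is literally "Total Amount", and that amounts use commas as thousands separators. The names `TOTAL_PATTERN` and `extract_total_amount` are made up for illustration.

```python
import re
from decimal import Decimal
from typing import Optional

# Assumed label and number format; adjust both to match the real documents.
TOTAL_PATTERN = re.compile(
    r"total\s+amount\s*:?\s*[$€£]?\s*(\d[\d,]*(?:\.\d+)?)",
    re.IGNORECASE,
)


def extract_total_amount(full_text: str) -> Optional[Decimal]:
    """Return the last 'Total Amount' value found in the text, or None."""
    matches = TOTAL_PATTERN.findall(full_text)
    if not matches:
        return None
    # The total follows the order lines, so take the last match in the text.
    return Decimal(matches[-1].replace(",", ""))


if __name__ == "__main__":
    one_page = "Order A 10.00\nOrder B 5.50\nTotal Amount: 15.50"
    two_pages = "Order A 10.00\nOrder B 5.50\fOrder Z 3.00\nTotal Amount: $1,234.56"
    print(extract_total_amount(one_page))
    print(extract_total_amount(two_pages))
```

Because the search runs over the whole text, it does not matter whether the total lands on the first or the second page.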

Palaniyappan · October 25, 2023, 3:55am

Set up a Document Understanding project in UiPath with the “Form Extraction” or “Intelligent Form Extraction” scope, depending on the complexity of your documents.

You can achieve this with document classification, like this:

PDF Classification
– Classify Document (Classify as “First Page” or “Second Page”)

Data Extraction
– If classified as “First Page”:
   – Extract Total Amount from First Page
– If classified as “Second Page”:
   – Go to Second Page using “Next Page” activity
   – Extract Total Amount from Second Page

Define your document types and train the model to understand the structure and layout of your PDFs. Make sure the model recognizes key fields like “Total Amount.”
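
As a rough illustration of the first-page/second-page branching, here is a plain-Python stand-in. It is not the Document Understanding classifier or any UiPath activity. It assumes the text of each page is already available as a list of strings and that the label reads "Total Amount"; `TOTAL_RE` and `find_total_page` are hypothetical names.

```python
import re
from typing import List, Optional, Tuple

# Assumed label and number format; not taken from the actual documents.
TOTAL_RE = re.compile(
    r"total\s+amount\s*:?\s*[$€£]?\s*(\d[\d,]*(?:\.\d+)?)",
    re.IGNORECASE,
)


def find_total_page(pages: List[str]) -> Optional[Tuple[int, str]]:
    """Return (1-based page number, amount text) for the last page with a total."""
    # Walk backwards: with many orders the total is pushed to the final page.
    for page_number in range(len(pages), 0, -1):
        match = TOTAL_RE.search(pages[page_number - 1])
        if match:
            return page_number, match.group(1)
    return None


if __name__ == "__main__":
    short_doc = ["Order A 10.00\nTotal Amount: 10.00"]
    long_doc = ["Order A 10.00\nOrder B 5.00", "Order C 2.00\nTotal Amount: 17.00"]
    print(find_total_page(short_doc))
    print(find_total_page(long_doc))
```

Checking every page instead of hard-coding one keeps the same logic working for one-page and two-page files.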

Cheers @8SEVEN

8SEVEN · October 27, 2023, 1:19pm

Hello @Palaniyappan

Thank you for the help.

But now when I debug this, it chooses the first document type. I have followed the steps, and all of the documents in the image below are different document types with the same format; the only difference is that some have 1 entry and some have 2, 3, 4, and so on.

When I debug, it still chooses the first document type rather than checking all the templates I have selected for these document types.

In Data Extraction, if I enter a specific document type ID, it works, but when I delete that and pass the classification result as classifyoutput(0), it chooses the first document type. Please advise on how to handle this scenario.

Thanks

AMAN_GUPTA · October 28, 2023, 1:27pm

Hi @8SEVEN ,
If you are working with invoices, purchase orders, or receipts, you can directly use the public endpoints listed here in your DU project. Otherwise, you can train a custom model through AI Center with an adequate number of sample documents. This way, you do not need to worry about whether the total is on the first or the second page.
