Need help in PDF extraction using Document Understanding

Hi everyone, I have a PDF file that contains multiple invoices merged together. I want to extract the data from each invoice one by one using Document Understanding. I tried the Form Extractor, but it is not working. Can anyone guide me on how to extract data from these invoices?
I'm attaching my PDF below for reference.

Required Output:

Invoice 1:
Invoice number, Invoice date, Account number, Table, Sub total, Tax, Discount, Grand total

Invoice 2:
First name, Surname, Member no, DOB, Main member - Name, Liberty health provider no, Admission date, Discharge date, Table, Cost

Invoice 3:
Claim number, Personal health number, Name, DOB, Mail, City, Postal code, Date of accident, Table

111900492434ImagePDF-merged.pdf (2.9 MB)

Hi @Learner007

I have looked at your PDF file. I assume every invoice is a single page, so you need to split the merged PDF page by page. Then run each split PDF through Document Understanding one at a time and append the extracted values.
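In UiPath this split is done with the PDF activities, not with code. Purely to illustrate the logic, here is a minimal Python sketch of "one page per file". It assumes the third-party pypdf package is installed; the helper names `page_file_name` and `split_pdf_per_page` are made up for this example.

```python
from pathlib import Path


def page_file_name(src: str, index: int) -> str:
    """Build the output file name for page `index` (1-based) of `src`."""
    return f"{Path(src).stem}_page{index}.pdf"


def split_pdf_per_page(src: str, out_dir: str) -> list[str]:
    """Write every page of `src` to its own PDF and return the new paths."""
    # Assumption: pypdf is available (pip install pypdf).
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    paths = []
    for index, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)  # one page per output file
        target = out / page_file_name(src, index)
        with open(target, "wb") as handle:
            writer.write(handle)
        paths.append(str(target))
    return paths
```

Each returned path would then go through the extraction step in turn, with the results appended to one collection (for example a list of rows, or a DataTable in UiPath).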

Regards
Robin

Hi @Robinnavinraj_S, in the sample PDF every invoice is a single page, but in some cases I will get invoices that span multiple pages. Is there any option to identify those and split the invoices accordingly?

Hello,

You can split the PDF into individual invoices before you digitize and extract. This can be done by adding the PDF activities to your workflow. Here's a tutorial that covers both cases, splitting by single page and splitting by dynamic page ranges.

To split a PDF into dynamic ranges, you can follow this tutorial:
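Separately from the tutorial, the idea behind dynamic ranges can be sketched in a few lines of Python. This is only an illustration, under the assumption that the first page of every invoice contains a recognizable phrase and that the text of each page has already been read (for example after digitization). The marker phrases and the function names are hypothetical, and the "1-2" string format is an assumption about what a page-range input expects.

```python
import re

# Hypothetical markers: phrases assumed to appear only on the first page
# of an invoice. Adjust these to the documents at hand.
START_MARKERS = re.compile(r"invoice number|claim number|member no", re.IGNORECASE)


def invoice_page_ranges(page_texts: list[str]) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first_page, last_page) pairs, one per invoice."""
    if not page_texts:
        return []
    # Pages on which a new invoice begins.
    starts = [
        number
        for number, text in enumerate(page_texts, start=1)
        if START_MARKERS.search(text)
    ]
    # Pages before the first marker (or a file with no markers) form one range.
    if not starts or starts[0] != 1:
        starts.insert(0, 1)
    # Each invoice ends on the page before the next one starts.
    ends = [start - 1 for start in starts[1:]] + [len(page_texts)]
    return list(zip(starts, ends))


def to_range_string(page_range: tuple[int, int]) -> str:
    """Format a range as "3" or "1-2" (assumed page-range input format)."""
    first, last = page_range
    return str(first) if first == last else f"{first}-{last}"
```

For a five-page file where pages 1, 3 and 4 carry a marker, this yields the ranges 1-2, 3 and 4-5, so a two-page invoice stays together instead of being cut at every page.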