PDF Invoice Data Extraction Problem

Hi everyone,
I'm facing an issue extracting a table from PDF invoices. The data is structured, but when I try to extract the table data it comes out wrong, because the table format is different every time. Can anyone help me solve this?
I'm attaching my input files and the expected output. Please help me solve this.

Books invoice (1).pdf (108.2 KB)
Sandals invoice.pdf (55.0 KB)
Bag invoice main (1).pdf (169.7 KB)

Output Excel:
Amazon Template.xlsx (9.8 KB)

@prasath17, please have a look and help me solve this.


Can you mark the fields in the PDF that need to be extracted?

Hi @kumar.varun2, I need the whole table from that PDF.

@kumar.varun2 I need all the table details, plus the shipping and billing addresses.

@ranaprathap928 - If the table data is on a single line (no wrapping), we can try Regex, but extracting multiline data is very difficult, if not impossible…
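For rows that stay on a single line, a pattern along these lines could pull out every column with named groups. This is a minimal Python sketch, not the workflow shared in this thread: the column order, the sample row, and the names `ROW_PATTERN` and `parse_row` are all hypothetical, so the pattern would need adjusting to the real text your PDF extraction produces.

```python
import re

# Hypothetical single-line row; the column order (Sl. No, Description,
# Unit Price, Qty, Net Amount, Tax Rate, Tax Type, Tax Amount, Total) is an assumption.
SAMPLE_ROW = "1 Leather Sandals (B0EXAMPLE1) 422.88 1 422.88 18% IGST 76.12 499.00"

# Amount with optional thousands separators, e.g. 1,299.00.
AMOUNT = r"[\d,]+\.\d{2}"

ROW_PATTERN = re.compile(
    rf"^(?P<sl_no>\d+)\s+"
    rf"(?P<description>.+?)\s+"  # lazy: grows only until the rest of the row matches
    rf"(?P<unit_price>{AMOUNT})\s+"
    rf"(?P<qty>\d+)\s+"
    rf"(?P<net_amount>{AMOUNT})\s+"
    rf"(?P<tax_rate>\d+(?:\.\d+)?%)\s+"
    rf"(?P<tax_type>[A-Z]+)\s+"
    rf"(?P<tax_amount>{AMOUNT})\s+"
    rf"(?P<total>{AMOUNT})$"
)


def parse_row(line):
    """Return the row's fields as a dict, or None if the line doesn't fit the layout."""
    match = ROW_PATTERN.match(line.strip())
    return match.groupdict() if match else None


print(parse_row(SAMPLE_ROW))
```

The anchors (`^` and `$`) are what make this fragile in a useful way: a wrapped description splits the row across lines, so `parse_row` returns None instead of silently returning shifted columns.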

@prasath17 I mostly don't get invoices with multiple items. Mostly I get single-item invoices, like the Bag and Sandals invoices I attached. Is it possible to build a flow for those 2 invoices?

@ranaprathap928 - So you don't need the second row in the table, the one with 9% SGST and 117.38?

Also, the Description field is another problem, because it spans multiple lines…

@prasath17 I mostly don't get 2 items; usually I get only 1 item, like the Bag and Sandals invoices. Skip this file and check with the other 2 files, if that's possible.

@prasath17 Yes, a single line is enough.

@prasath17 In the Description I only need the code in brackets; the first line is also OK.
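Once the Description text for a row is available, taking the bracketed code and falling back to the first line is a small step on its own. A minimal Python sketch, assuming the code is the text inside the first pair of round brackets; the function name `description_key` and the sample descriptions are hypothetical.

```python
import re

# Text inside the first pair of round brackets, with surrounding spaces trimmed.
BRACKET_CODE = re.compile(r"\(\s*([^()]+?)\s*\)")


def description_key(description):
    """Return the bracketed code if present, else the first non-empty line."""
    match = BRACKET_CODE.search(description)
    if match:
        return match.group(1)
    for line in description.splitlines():
        if line.strip():
            return line.strip()
    return ""


print(description_key("Leather Sandals\n( B0EXAMPLE1 )\nHSN:6403"))
print(description_key("Canvas Bag\nBlack, 20 L"))
```

Because the lookup works on the whole description string, it doesn't matter which line the bracket lands on after wrapping, as long as the full description was captured.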

@ranaprathap928 - I will give it a try today, and if I get it working I will share the XAML with you.

Sure, @prasath17, thanks.

@ranaprathap928 - Please check this…
Regex_RR.zip (232.3 KB)

It's very complex to get the other fields, like name and address, because other content is printed on the same line…
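When two blocks (for example billing and shipping address) share the same physical lines, one partial workaround is to split each line on a wide run of spaces. This is a minimal Python sketch under two assumptions: the PDF-to-text step preserves layout spacing, and a gap of 3 or more spaces separates the columns. The sample text and the name `split_columns` are hypothetical. It does not help when unrelated content sits on the line with no clear gap, which is the harder case described above.

```python
import re

# Hypothetical layout-preserving text: billing and shipping blocks side by side.
SAMPLE = """Billing Address :            Shipping Address :
R. Kumar                     R. Kumar
12 Example Street            45 Sample Road
Chennai, TAMIL NADU          Madurai, TAMIL NADU"""


def split_columns(text, gap=3):
    """Split each line at the first run of `gap`+ spaces into left/right columns."""
    left, right = [], []
    for line in text.splitlines():
        parts = re.split(r"\s{%d,}" % gap, line.strip(), maxsplit=1)
        left.append(parts[0])
        # A line with no wide gap is treated as left-column only.
        right.append(parts[1] if len(parts) > 1 else "")
    return left, right


billing, shipping = split_columns(SAMPLE)
print("\n".join(shipping))
```

Single spaces inside names and street addresses survive because only runs of 3+ spaces count as a column break; a line that has text only in the right column would be misread as left-column text.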

Hi @prasath17, this code works and I'm getting the expected output, but for a few files it throws an error. If you have time, please check.

@prasath17 Is it possible to get the Description too? Just the first line of the Description is OK.

@ranaprathap928 - The files need to be consistent for the Regex to work… if they aren't, building a Regex that covers all the variations is time-consuming, if not impossible…

For example, in the PDF below, the Discount column is missing from the table… that could be the reason for the error…
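One way to let a single pattern tolerate a column that is only sometimes present, like the Discount column mentioned above, is an optional non-capturing group. This is a minimal Python sketch, not the regex in the attached workflow; the column order, the sample rows, and the names `ROW_PATTERN` and `parse_row` are hypothetical.

```python
import re

# Amount with optional thousands separators, e.g. 1,299.00.
AMOUNT = r"[\d,]+\.\d{2}"

# Hypothetical layout: Sl. No, Description, Unit Price, [Discount], Qty,
# Net Amount, Tax Rate, Tax Type, Tax Amount, Total.
ROW_PATTERN = re.compile(
    rf"^(?P<sl_no>\d+)\s+(?P<description>.+?)\s+"
    rf"(?P<unit_price>{AMOUNT})"
    rf"(?:\s+(?P<discount>{AMOUNT}))?\s+"  # optional Discount column
    rf"(?P<qty>\d+)\s+(?P<net_amount>{AMOUNT})\s+"
    rf"(?P<tax_rate>\d+(?:\.\d+)?%)\s+(?P<tax_type>[A-Z]+)\s+"
    rf"(?P<tax_amount>{AMOUNT})\s+(?P<total>{AMOUNT})$"
)


def parse_row(line):
    """Parse one table row; a missing Discount column is reported as 0.00."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    row["discount"] = row["discount"] or "0.00"
    return row


WITH_DISCOUNT = "1 Canvas Bag (B0EXAMPLE2) 472.88 50.00 1 422.88 18% IGST 76.12 499.00"
WITHOUT_DISCOUNT = "1 Canvas Bag (B0EXAMPLE2) 422.88 1 422.88 18% IGST 76.12 499.00"
print(parse_row(WITH_DISCOUNT))
print(parse_row(WITHOUT_DISCOUNT))
```

The optional group only works because the two candidate columns look different: Discount is assumed to be a decimal amount while Qty is a plain integer, so the engine can tell whether the Discount column is present. If both were formatted the same way, checking the table header first would be the safer route.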

OK @prasath17, thanks for your help; it is working for the other files.


@prasath17 Would Document Understanding be useful for these types of problems?

Hi @kumar.varun2 - yes, definitely, though it depends on the case. But it comes at a price, i.e. you need to pay for a license to use DU.


@ranaprathap928 - Here you go
Regex_RR_Updated.zip (422.8 KB)

I have updated the code so that it works for PDFs both with and without the Discount column… I have already run the code against 6 PDFs with no errors…

Please review and let me know…