We are working on Document Understanding. Our input is multiple invoices in PDF format, all with the same structure. Each PDF has a transaction table whose data we need to extract, but the number of line items varies: some tables have 5 line items, others have 10. We created a common template with the maximum number of line items, but extraction does not work properly when a PDF's table contains only 5 line items.
Do we need to create an individual template for each PDF? Kindly suggest a solution. We are using the Form Extractor to extract the data.
The structure of the table stays the same; the only difference is the number of line items. If I create a template with 5 line items and then receive a PDF with 10 line items, the bot is not able to find the table.
Yes. If the line items do not follow the same structure in every PDF, the bot will extract unwanted fields.
If you make a template with 10 line items but the PDF being processed has only 5, the bot will extract unwanted fields from the PDF.
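One way around the fixed row count is to treat the table as "everything between the header row and the footer row" and read as many rows as happen to be there. The snippet below is only a minimal sketch of that idea in plain Python. It is not the Form Extractor and not a UiPath activity. It assumes the PDF text has already been extracted line by line, that the header row starts with "Description", that the table ends at a line starting with "Total", and that each row ends with a quantity, a unit price and an amount. The function name `extract_line_items` and the row pattern are hypothetical and would need adjusting to the real invoices.

```python
import re

# Assumed row layout (hypothetical): free-text description, integer quantity,
# unit price and amount with two decimals, separated by whitespace.
ROW = re.compile(
    r"^(?P<description>.+?)\s+(?P<qty>\d+)\s+"
    r"(?P<unit_price>\d+\.\d{2})\s+(?P<amount>\d+\.\d{2})$"
)


def extract_line_items(text, header_start="Description", footer_start="Total"):
    """Return every table row found between the header and footer markers.

    The number of rows is not fixed in advance, so the same logic handles
    a table with 5 line items and a table with 10.
    """
    items = []
    in_table = False
    for raw in text.splitlines():
        line = raw.strip()
        if not in_table:
            # Skip everything until the (assumed) header row is seen.
            in_table = line.startswith(header_start)
            continue
        if line.startswith(footer_start):
            # The (assumed) footer row marks the end of the table.
            break
        match = ROW.match(line)
        if match:
            items.append({
                "description": match["description"],
                "qty": int(match["qty"]),
                "unit_price": float(match["unit_price"]),
                "amount": float(match["amount"]),
            })
    return items


if __name__ == "__main__":
    sample = """Invoice 001
Description Qty Unit Price Amount
Widget A 2 10.00 20.00
Widget B 1 5.50 5.50
Total 25.50
"""
    for item in extract_line_items(sample):
        print(item)
```

Because the loop stops at the footer marker instead of counting rows, no unwanted fields are picked up when an invoice has fewer line items than expected.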
In this video, I cover 17 use cases for extracting tables from PDFs and writing the data to Excel:
2:00 GitHub free code for all the files
2:20 Logic of general workflow
4:40 File 1 simple PDF
9:50 File 2 PDF with a column with multiple lines
20:10 File 3 PDF with a column with multiple words in the last column
27:00 File 5 PDF with a column with multiple words in an inside column (2 columns)
31:40 File 6 PDF with a column with multiple lines
39:10 File 8 simple PDF
42:15 File 9 PDF with multiple spaces that need to be corrected
45:50 File 10 PDF with multiple columns that have multiple lines + multiple pages
55:50 File 11 simple PDF with protection empty Cells
58:35 File 12 Big PDF with an empty line and Empty columns and partial total
1:02:25 File 13 PDF with multiple columns that have multiple words and hard to define a rule
1:10:15 File 15 PDF with multiple columns that have multiple lines
1:12:50 File 17 simple PDF, remove spaces from headers and from data
1:16:05 File 18 simple PDF
1:17:10 File 19 PDF with multiple pages and columns with multiple lines
1:22:10 File 20 PDF with multiple columns that have multiple lines
1:25:00 File 21 PDF with empty columns and subtotal