Please see the snippet below and the attached PDF for reference; it has two tables side by side. Line items are not extracted properly from PDFs in this format: the first row of both tables is treated as a single line, and the same happens for rows 2, 3, 4, 5, and so on.
Template-19.pdf (152.3 KB)
If both tables are being read into the same table and all of the data is captured, you can copy the datatable into a second one and use Remove Data Column to remove the columns you don't need from each copy. That should give you the two tables separately.
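As a rough illustration of that split logic, here is a small sketch in plain Python (not UiPath code). The column names `Item_L`, `Qty_L`, `Item_R`, and `Qty_R` and the helper `split_side_by_side` are made up for the example; it also assumes each merged row holds the left table's values and the right table's values under distinct column names.

```python
# Sketch: split rows that merged two side-by-side tables into two tables.
# Column names below are hypothetical; use whatever your extraction returns.

def split_side_by_side(rows, left_columns, right_columns):
    """Return (left_table, right_table) built from the merged rows."""
    def project(columns):
        table = []
        for row in rows:
            picked = {col: row.get(col, "") for col in columns}
            # Skip rows where this side is empty (the tables may differ in length).
            if any(str(value).strip() for value in picked.values()):
                table.append(picked)
        return table

    return project(left_columns), project(right_columns)


merged = [
    {"Item_L": "Bolt", "Qty_L": "10", "Item_R": "Nut", "Qty_R": "20"},
    {"Item_L": "Screw", "Qty_L": "5", "Item_R": "", "Qty_R": ""},
]
left, right = split_side_by_side(merged, ["Item_L", "Qty_L"], ["Item_R", "Qty_R"])
print(left)
print(right)
```

In a workflow the same idea would be two copies of the datatable, each with the other table's columns removed, plus a filter to drop rows that are blank on that side.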
Hope this helps
Cheers
Hi @charantej ,
Could you let us know whether you have performed data labelling for these documents and then trained the model?
During data labelling, you should be able to select/label each row of both tables separately, marking them as separate rows if that is how you want to fetch the data.
Let us know what steps you have taken so far. Are you using the DU model or a pre-trained model?
I’m using a custom DU model. I have labelled the data, trained the model, and am using it. I have labelled around 35 PDFs in various formats; these are the only files I have right now. Only 2 of the PDFs have adjacent tables, and in those I labelled the rows of the adjacent tables separately.