Greetings community, I would like your input on a project I am working on.
I have some PDF invoices that span multiple pages.
All of them have a summary on the first page: some information and a small table (2 or 3 rows) with the expense categories and their amounts, followed by a total amount after the table.
The remaining pages show those categories in detail: 2 or 3 lines of text at the top, then a table listing the expenses one by one.
The problem is with the detail pages, because the tables are not clearly laid out: there are no headers and no lines separating the rows, just spacing, and the ML extractor does not seem to be able to identify them correctly.
In the template setup, the IDs that come up for matching with the taxonomy fields are very wrong.
I cannot use a form extractor because the placement of the table depends on its size…
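To make the layout concrete, here is a minimal Python sketch of how rows separated only by spacing could be split into columns. The sample lines and the helper name `split_row` are made up for illustration and are not taken from the actual invoices. It also assumes that columns are separated by at least two spaces while the text inside a cell uses single spaces, which may not hold for every document.

```python
import re


def split_row(line, min_gap=2):
    """Split one text line into cells wherever a run of at least
    `min_gap` whitespace characters occurs (hypothetical helper)."""
    pattern = r"\s{%d,}" % min_gap
    return [cell for cell in re.split(pattern, line.strip()) if cell]


# Hypothetical detail-page lines, not real invoice data.
sample_lines = [
    "01/03/2023   Taxi to airport      45.00",
    "02/03/2023   Hotel, 2 nights     180.00",
]

rows = [split_row(line) for line in sample_lines]
for row in rows:
    print(row)
```

Something like this only works on the plain text of the page, so it would sit alongside the extractor rather than replace it.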
The PDF is split into its pages, and I made 2 taxonomy types: one for the first page and one for the rest.
Do I just need more invoices to train the ML extractor, or have I approached this wrong?
Any input is welcome, thanks!