Hi all, I need to extract a table from a PDF. I can’t use Document Understanding because the PDF is very large, and regex alone does not make it easy either.
Here is the table:

As you can see, the data contains spaces and does not always include all the fields (Unidades, valor base, devengado, a deducir). Some entries also span two lines, as with “(04-08-21 0:45:00)”, which belongs to the previous line.

With Read PDF I have extracted those lines from the table, but since they contain spaces, are sometimes missing numbers, and can span two lines, I am a bit lost.

Can you help me?

Thank you very much


Did you try a custom component that converts the PDF to an Excel file?

Cheers @AdryGL.96


I just tried it. It does convert the document to Excel, but the data is still laid out as in the PDF: everything ends up in a single column.
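
For anyone facing the same single-column output, here is a minimal sketch of one way to post-process the extracted lines. It is written in Python only to show the logic, which could be ported to whatever the workflow uses. Everything in it is an assumption, not taken from the real PDF: the sample lines are made up, continuation lines are assumed to start with "(", numbers are assumed to use Spanish formatting (1.234,56), and the function names are hypothetical. It merges continuation lines into the previous row and peels up to four numeric values off the end of each line, so the description can keep its spaces.

```python
import re

# Assumption: a numeric cell looks like "12", "12,50" or "1.234,56".
NUMBER = re.compile(r"-?(?:\d{1,3}(?:\.\d{3})+|\d+)(?:,\d+)?")

# Assumption: at most four numeric columns
# (Unidades, valor base, devengado, a deducir).
MAX_NUMBERS = 4


def merge_continuations(lines):
    """Attach lines starting with '(' to the previous row as a note."""
    rows = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("(") and rows:
            rows[-1]["note"] = line
        else:
            rows.append({"text": line, "note": ""})
    return rows


def split_row(text):
    """Split a line into (description, trailing numeric tokens)."""
    tokens = text.split()
    numbers = []
    # Walk from the right while the last token looks like a number.
    while tokens and len(numbers) < MAX_NUMBERS and NUMBER.fullmatch(tokens[-1]):
        numbers.insert(0, tokens.pop())
    return " ".join(tokens), numbers


def parse_table(lines):
    """Return one dict per logical table row."""
    parsed = []
    for row in merge_continuations(lines):
        description, numbers = split_row(row["text"])
        parsed.append(
            {"description": description, "numbers": numbers, "note": row["note"]}
        )
    return parsed


# Made-up lines that only imitate the shape described in the question.
sample = [
    "Salario base 30,00 40,00 1.200,00",
    "Horas extra 2,00 15,00 30,00",
    "(04-08-21 0:45:00)",
    "Retencion IRPF 150,25",
]

for row in parse_table(sample):
    print(row)
```

Note that this only separates the description from the numbers. When a row has fewer than four numbers, the text alone does not say which columns they belong to, so mapping each value to Unidades, valor base, devengado or a deducir would probably need the horizontal position of each value in the original layout. A description that itself ends in a number would also be split incorrectly by this approach.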