Other ways to get a table from a PDF besides reading the text

Hi! I have a problem with the PDFs I'm reading. Each PDF file contains a table like this:

Code Date Amount1 Currency Amount2
BD
06/05/2023 500 456.20
2,850 6,987.23
AL 03/02/2023 782 USD

As you can see, not every row has a value in every column.
I tried converting the PDF to text and iterating over it line by line. This is the result:

BD
06/05/2023 500 456.20
2,850 6,987.23
AL 03/02/2023 782 USD

Then I tried extracting the values on each line by splitting on spaces (output: AL, 03/02/2023, 782, USD).
The problem is that I can't load the values into their corresponding columns, because the arrays produced by splitting on spaces don't all have the same length.
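One common workaround, sketched here under assumptions, is to assign each value by its horizontal position on the line instead of by its index in the split array. This only works if the text was extracted with its layout preserved (for example poppler's `pdftotext -layout`), so that each value sits roughly under its column header. It also assumes every header name is a single word. The function names and sample lines below are hypothetical, built to mimic the table above.

```python
import re

def column_centers(header: str) -> list[tuple[str, float]]:
    """Return (column name, horizontal center) for each word in the header line."""
    return [(m.group(), m.start() + len(m.group()) / 2) for m in re.finditer(r"\S+", header)]

def parse_by_position(header: str, lines: list[str]) -> list[dict[str, str]]:
    """Assign each token to the column whose header center is nearest to the token's center."""
    cols = column_centers(header)
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row = {name: "" for name, _ in cols}  # blank cells stay as empty strings
        for m in re.finditer(r"\S+", line):
            center = m.start() + len(m.group()) / 2
            name = min(cols, key=lambda c: abs(c[1] - center))[0]
            # If two tokens land in the same column, keep both, separated by a space.
            row[name] = (row[name] + " " + m.group()).strip()
        rows.append(row)
    return rows

# Hypothetical layout-preserved lines: fixed widths keep values under their headers.
fmt = "{:<6}{:<12}{:<9}{:<10}{}"
header = fmt.format("Code", "Date", "Amount1", "Currency", "Amount2")
lines = [
    fmt.format("BD", "06/05/2023", "500", "", "456.20"),
    fmt.format("", "", "2,850", "", "6,987.23"),
    fmt.format("AL", "03/02/2023", "782", "USD", ""),
]

for row in parse_by_position(header, lines):
    print(row)
```

Each row comes back as a dict keyed by the header names, with empty strings for blank cells, so it maps directly onto DataTable columns. The nearest-center rule is a heuristic; whether a line such as `2,850 6,987.23` continues the record above it is a separate decision this sketch does not make.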

Is there another way to extract a table from a PDF into a DataTable? My end goal is to put this table into an Excel file correctly.
I've tried Data Scraping on the PDF file, but it can't recognize the table because it sees it as an image.
I've also tried saving the PDF as a Word document and copying the table from Word to Excel via the clipboard, but it always comes through as an image, and not all rows were read.

Hi @Shinjid, you can use UiPath Document Understanding to read tables from PDFs or images. Attaching the Academy link to the Document Understanding course below:


Hi @Shinjid,

Document Understanding is one way to read the PDF data.

Another way is to use Python. The tabula package (tabula-py) exports the tables in a PDF to CSV, which you can then read into a DataTable.
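A minimal sketch of that approach, assuming tabula-py is installed (`pip install tabula-py`, which also needs a Java runtime) and that the PDF has a real text layer. Tabula reads text, not pixels, so if the Data Scraping failure above means the PDF is a scanned image, this will not find the table and OCR (for example Document Understanding) is needed instead. The function and file names are hypothetical placeholders.

```python
import sys

def pdf_tables_to_csv(pdf_path: str, csv_path: str) -> None:
    """Write every table tabula detects in pdf_path to a single CSV file."""
    # Imported here so this module can be loaded even where tabula-py is not installed.
    import tabula  # pip install tabula-py; requires a Java runtime

    # stream=True targets tables whose columns are separated by whitespace rather than
    # ruling lines, as in the question; lattice=True is the alternative for ruled tables.
    # Which mode works for these PDFs is an assumption to verify.
    tabula.convert_into(pdf_path, csv_path, output_format="csv", pages="all", stream=True)

if __name__ == "__main__":
    # Usage: python pdf_tables_to_csv.py input.pdf output.csv
    pdf_tables_to_csv(sys.argv[1], sys.argv[2])
```

The resulting CSV can then be loaded with a Read CSV activity and written to Excel. If blank cells still shift values between columns, tabula-py also accepts a `columns` option (x coordinates of the column boundaries) that pins the columns explicitly.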
