Extracting Table Data from a PDF to Excel

Hi all,

We are trying to develop a PoC for a bot that validates a balance sheet. The balance sheet is a PDF (either a standard, text-based PDF or a scanned one) of around 50 pages.

The PDF contains both narrative text and tables of financial values. We need to ignore the text and validate only the information in the tables.
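One rough way to separate table rows from narrative text is a line filter like the Python sketch below. It assumes the PDF text has been extracted with its layout preserved, so that table columns are separated by runs of two or more spaces while sentences use single spaces. The pattern and the name `keep_table_rows` are hypothetical, and scanned pages would first need OCR before any text is available.

```python
import re

# Hypothetical amount pattern: optional parentheses or minus sign, digits with
# optional thousands separators, optional decimals; a lone "-" means nil.
AMOUNT = r"(?:\(?-?\d[\d,]*(?:\.\d+)?\)?|-)"

# Assumed table row: label text followed by one or more amount columns, each
# separated from the previous cell by two or more spaces.
TABLE_ROW = re.compile(r"^\s*\S.*?(?:\s{2,}" + AMOUNT + r")+\s*$")


def keep_table_rows(lines):
    """Return only the lines that look like table rows, dropping narrative text."""
    return [line for line in lines if TABLE_ROW.match(line)]


lines = [
    "Inventories are valued at the lower of cost.",
    "Raw materials        1,200.00     950.00",
    "Finished goods       (300)        -",
    "The company was founded in 1998",
]
print(keep_table_rows(lines))
```

The two-space rule is what keeps a sentence such as "founded in 1998" from being mistaken for a row; it is a heuristic and will need tuning against the real extracted text.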

For example, Inventories has a consolidated value X on page 3, with the details spread across later pages (e.g. page 10 or 12). We need to verify that the consolidated value equals the total we get when we add up those details.
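The check itself is simple once the numbers are extracted. A minimal sketch in Python, assuming the consolidated value and the detail values have already been parsed into `Decimal`s; the function name `reconcile` and the default tolerance are hypothetical and should match the rounding used in the statements:

```python
from decimal import Decimal


def reconcile(consolidated, details, tolerance=Decimal("0.01")):
    """Compare a consolidated figure against the sum of its detail lines.

    Returns (is_match, difference), where difference = consolidated - sum(details).
    The tolerance (an assumption) absorbs rounding in the published figures.
    """
    total = sum(details, Decimal("0"))
    difference = consolidated - total
    return abs(difference) <= tolerance, difference


# Inventories on page 3 vs. its breakdown on the detail pages (illustrative numbers).
print(reconcile(Decimal("5000.00"), [Decimal("3200.00"), Decimal("1800.00")]))
print(reconcile(Decimal("5000"), [Decimal("3200"), Decimal("1700")]))
```

Using `Decimal` instead of floating point avoids spurious mismatches (such as `0.1 + 0.2 != 0.3`) when summing many line items, and returning the difference makes it easy to report by how much a figure is off.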

What we have tried so far:

  • We tried to read the whole PDF using the ReadFullPdf activity, but had difficulty separating out the individual values.
  • We tried converting the PDF to Excel with the website smallpdf.com, but the extracted values were not accurate. We also tried a library available for UiPath (SautinSoft), but it returned only the table and nothing else.
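For the problem of separating values out of the full-text output, one option is to split each table row into a label and its amounts. A minimal Python sketch, assuming each table row comes out as one text line with cells separated by two or more spaces, negatives shown in parentheses and nil shown as "-"; the names `parse_amount` and `split_row` are hypothetical:

```python
import re
from decimal import Decimal

# Assumed amount formats: "1,234.50", "(200)" for negatives, "-" for nil.
AMOUNT_TOKEN = re.compile(r"^\(?-?\d[\d,]*(?:\.\d+)?\)?$|^-$")


def parse_amount(token):
    """Convert one amount token to Decimal; parentheses mean negative, '-' means zero."""
    token = token.strip()
    if token == "-":
        return Decimal("0")
    negative = token.startswith("(") and token.endswith(")")
    value = Decimal(token.strip("()").replace(",", ""))
    return -value if negative else value


def split_row(line):
    """Split a text row into (label, [amounts]) by peeling amount cells off the right.

    Heuristic, not a general table parser: cells are assumed to be separated
    by two or more spaces, as in layout-preserving text extraction.
    """
    cells = re.split(r"\s{2,}", line.strip())
    amounts = []
    while cells and AMOUNT_TOKEN.match(cells[-1]):
        amounts.insert(0, parse_amount(cells.pop()))
    return " ".join(cells), amounts


print(split_row("Raw materials        1,200.00     (950.00)"))
print(split_row("Finished goods   -"))
```

One caveat: a numeric note-reference column (e.g. "Note 10") sitting between the label and the amounts would also be picked up as an amount, so it may need to be dropped by column position.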

We would like to know whether there is a way to build this bot. Are there any reusable components, APIs, or activities available for this, or any other suggestions?

A sample image of the PDF data we need to extract is shown below.