Extract Financial Statements from a Folder of PDFs into Excel

Hi,

I would like to extract the income statement from each PDF in a folder and output it to Excel as a table (not an image).

However, the financial statement format is slightly different from one PDF to another. The Read PDF Text activity works, but the result is messy when inserted into Excel.

Which activities can I use for this? Could anyone share a screenshot of the workflow? Thanks.

Is it possible to upload the PDF so that we can test it and give you a solution?

Otherwise, I would recommend using table extraction if the data is laid out as a table in the PDF,

or you will have to build a data table and add the required data to it in a loop.
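As a rough illustration of the "build a table in a loop" idea, here is a minimal Python sketch that turns already-extracted statement text into rows of label plus amounts. It is not a UiPath workflow, only the parsing logic; the function names, the two-column assumption, and the sample figures in the usage note are all hypothetical and would need adjusting to the real statement layout.

```python
import re

# One amount token: optional parentheses (negative), thousands separators,
# optional decimals, or a lone hyphen meaning "nil".
AMOUNT = re.compile(r"\(?\d[\d,]*(?:\.\d+)?\)?|-")


def parse_amount(token):
    """Convert '1,234', '(567)' or '-' to a float; parentheses mean negative."""
    if token == "-":
        return 0.0
    negative = token.startswith("(") and token.endswith(")")
    value = float(token.strip("()").replace(",", ""))
    return -value if negative else value


def parse_statement_lines(text, n_columns=2):
    """Build table rows by looping over lines of extracted text.

    Assumes each data line ends with exactly n_columns amounts (for example
    current year and prior year). Lines with fewer trailing amounts, such as
    headings, are skipped.
    """
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        amounts = []
        # Peel amounts off the right-hand end of the line.
        while tokens and len(amounts) < n_columns and AMOUNT.fullmatch(tokens[-1]):
            amounts.insert(0, parse_amount(tokens.pop()))
        if len(amounts) == n_columns and tokens:
            rows.append([" ".join(tokens)] + amounts)
    return rows
```

Called on made-up text such as `"Revenue 1,000 900\nCost of sales (400) (350)"`, this yields one row per line item. A note-reference number sitting between the label and the amounts would stay in the label, so statements with a notes column need extra handling.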

Horizon Robotics - W.pdf (8.3 MB)

CONSOLIDATED STATEMENTS OF PROFIT OR LOSS on Page 506. Thanks.

Hi @cclemon, if the document structure is the same for all the PDFs, you can use the Document Understanding approach. It is a more reliable and faster way to extract data from PDFs.
Cheers, happy automation.

If you can't use the Read PDF Text activity and loop through the text, maybe you can try the following:

  1. Use “Extract PDF Page Range” to save the relevant pages as a single PDF file
  2. Use “Use Application” to open that PDF file
  3. Use the “Get Text” activity to read one row at a time by targeting specific elements, using * as a wildcard for the numbers, as shown below
  4. Repeat step 3 until you have all the text

Below is a screenshot for your reference

Hi @cclemon

Can you share an example of the output data, in the Excel format you want it filled in?

Which data do you want from page 506?

Regards,
Gowtham K

Hi, I think it would be the whole table, extracted to Excel.
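For getting a whole table into Excel as cells rather than an image, one simple route (an assumption here, not something suggested in the thread) is to write the rows out as CSV, which Excel opens as a table. A minimal Python sketch follows; the function name, the column headers, and the figures in the usage note are illustrative only.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Serialize a header and table rows to CSV text that Excel can open.

    The csv module quotes any label that contains a comma, so line items
    such as 'Selling, general and administrative' stay in one cell.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `rows_to_csv(["Item", "Current year", "Prior year"], [["Revenue", 1000, 900]])` returns the CSV text, which can then be saved to a file with a `.csv` extension and opened in Excel.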

@cclemon can you share an example of how you want the output to look?

Sample.xlsx (11.6 KB)
thanks
