Simulate Excel > Data tab > Get Data > From PDF

Richard_Kraus · July 10, 2024, 3:52pm

Hello,

I’m trying to extract tables from a PDF file. The file can contain multiple tables per page. All of the text in the PDF is machine-readable.

My experiments:

I tried extracting text from the PDF, but that produces ambiguous cases: fields are separated by spaces, yet some numbers also contain a space, so as far as I know it is impossible to decide confidently how to split the text into columns.
I tried the Python library pdfplumber, which gave better results than extracting tables from plain text, but I still ran into some issues that I have yet to solve.
I tried the approach from "Extract table data from multiple pages of pdf", which seems to work only when the table is always located in the same place.
Finally, what works is Excel’s Get Data functionality (Data tab > Get Data > From PDF), shown in the picture below. However, I don’t know how to integrate this with UiPath.
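
For context, here is a minimal sketch of what the pdfplumber attempt can look like. It is not the exact code I used: the function names `normalize_table` and `extract_tables` and the file name `report.pdf` are made up for illustration, and it relies on pdfplumber's default table detection, which may need tuning for a given document.

```python
from typing import List, Optional, Tuple

# A table as pdfplumber returns it: rows of cells, where an empty cell is None.
Table = List[List[Optional[str]]]


def normalize_table(table: Table) -> List[List[str]]:
    """Replace empty cells with "" and collapse runs of whitespace in each cell.

    Spaces inside a cell (e.g. "1 234") are kept as a single space, so a
    number with a space in it stays in one column.
    """
    return [
        [" ".join(cell.split()) if cell is not None else "" for cell in row]
        for row in table
    ]


def extract_tables(pdf_path: str) -> List[Tuple[int, List[List[str]]]]:
    """Return (page_number, table) pairs for every table found in the PDF."""
    import pdfplumber  # third-party: pip install pdfplumber

    results = []
    with pdfplumber.open(pdf_path) as pdf:
        for page_number, page in enumerate(pdf.pages, start=1):
            # extract_tables() returns all tables detected on the page,
            # so multiple tables per page are handled.
            for table in page.extract_tables():
                results.append((page_number, normalize_table(table)))
    return results


# Hypothetical usage:
# for page_number, table in extract_tables("report.pdf"):
#     print(page_number, table)
```

Because each cell comes back as its own string, this avoids the space-splitting ambiguity of the plain-text approach, though how well the tables are detected depends on the layout of the PDF.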

Is there any UiPath activity for this?

(I know that I could screen record me doing this by hand and let UiPath reproduce these steps but I’d like a more elegant solution)

Thank you!

sandyarpa767 · July 10, 2024, 5:47pm

Can you share a sample PDF, either for trial and error or just to see how things are laid out in the PDF?

Anil_G · July 11, 2024, 1:55am

Ideally this is a case for Document Understanding. If that license is not available, go for option 2:
Instead of recording from UiPath, use the macro recorder in Excel. It is under the Developer tab; if the tab is not available, go to More Commands and enable it. The recorder generates a macro that can then be run from UiPath with Execute Macro to load any new file into Excel.

Cheers
