Simulate Excel > Data tab > Get Data > From PDF

Hello,

I’m trying to extract tables from a PDF file. The file can contain multiple tables per page. All of the text in the PDF is machine-readable.

My experiments:

  • I tried extracting the plain text from the PDF, but that produces ambiguous cases: fields are separated by a space, yet some numbers also contain a space, so as far as I know it’s impossible to decide confidently where one field ends and the next begins.
  • I tried the Python library pdfplumber, which gave better results than parsing tables out of plain text, but I still ran into some issues that I have yet to solve.
  • I tried “Extract table data from multiple pages of pdf”, which seems to work only when the table is always located in the same place.
  • Finally, what does work is the Get Data functionality in Excel, shown in the picture below. However, I don’t know how to integrate this with UiPath.
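
To illustrate the ambiguity from the first bullet, here is a minimal Python sketch. It assumes the number of columns is known and that a field can only be formed by joining adjacent whitespace-separated tokens; the function name candidate_splits and the sample line are made up for illustration and do not come from the actual PDF.

```python
def candidate_splits(line, n_columns):
    """Return every way to group the whitespace-separated tokens of `line`
    into exactly `n_columns` fields, joining only adjacent tokens."""
    tokens = line.split()
    results = []

    def build(start, fields):
        remaining = n_columns - len(fields)
        if remaining == 0:
            # Accept the grouping only if every token has been used.
            if start == len(tokens):
                results.append(fields)
            return
        # Leave at least one token for each field that still has to be filled.
        for end in range(start + 1, len(tokens) - remaining + 2):
            build(end, fields + [" ".join(tokens[start:end])])

    build(0, [])
    return results


# Hypothetical row with two numeric columns whose values may contain spaces.
for fields in candidate_splits("1 250 3 000", 2):
    print(fields)
# ['1', '250 3 000']
# ['1 250', '3 000']
# ['1 250 3', '000']
```

All three groupings are consistent with the extracted text, so the plain text alone cannot tell which one the table actually contains.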

Is there any UiPath activity for this?

(I know that I could record myself doing this by hand and let UiPath replay those steps, but I’d prefer a more elegant solution.)

Thank you!

Hi @Richard_Kraus ,

Could you share a sample PDF, so we can try things out by trial and error, or at least see how the tables are laid out in the PDF?

@Richard_Kraus

  1. Ideally this is a case for Document Understanding. If that license is not available, go for option 2.
  2. Instead of recording from UiPath, use the macro recorder in Excel. It is under the Developer tab; if that is not available, go to More Commands and enable it. The recorder generates a macro that can then be run from UiPath with Execute Macro to load any new file into Excel.

Cheers