Hello,
I’m trying to extract tables from a PDF file. The file can contain multiple tables per page. All of the text in the PDF is machine-readable.
My experiments:
- I tried extracting the text from the PDF, but that produces ambiguous cases: fields are separated by spaces, yet some numbers also contain a space, so as far as I know it’s impossible to confidently decide where one field ends and the next begins.
- I tried the Python library pdfplumber, which gave better results than extracting tables from plain text, but I still ran into some issues that I have yet to solve.
- I tried "Extract table data from multiple pages of pdf", which seems to work only when the table is always located in the same place.
- Finally, what does work is Excel’s Get Data functionality, shown in the picture below. However, I don’t know how to integrate this with UiPath.
Is there any UiPath activity for this?
(I know that I could record myself doing this by hand and let UiPath replay those steps, but I’d like a more elegant solution.)
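To make the plain-text ambiguity from my first experiment concrete, here is a minimal Python sketch. The sample line, the three-column layout, and the helper name `candidate_groupings` are made up for illustration and are not taken from my actual file; it only assumes that fields are separated by single spaces and that a number may itself contain a space.

```python
from itertools import combinations

# Hypothetical extracted line with three fields: name, quantity, amount.
# The amount uses a space as the thousands separator.
line = "Widget 12 1 234,50"

# Splitting on whitespace yields more tokens than there are fields.
tokens = line.split()


def candidate_groupings(tokens, n_fields):
    """Yield every way to merge adjacent tokens into exactly n_fields fields."""
    gaps = range(1, len(tokens))
    # Pick n_fields - 1 cut positions among the gaps between tokens.
    for cuts in combinations(gaps, n_fields - 1):
        bounds = (0, *cuts, len(tokens))
        yield [" ".join(tokens[a:b]) for a, b in zip(bounds, bounds[1:])]


groupings = list(candidate_groupings(tokens, 3))
for fields in groupings:
    print(fields)
```

Each printed row is a valid three-field reading of the same line, and nothing in the text alone says which one is right. That is why I am looking for an approach that uses the table layout rather than the extracted text.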
Thank you!