I have a data table in a PDF, which I’m able to fetch with regex and then convert to a string. I would like to keep the formatting, with a solution that is as simple as possible.
Is there any package that can read a whole table from a PDF so I can add it to a data table?
What would be the best way to solve this? I have searched for other solutions, but it seems tricky because the PDF activity stores everything as a single text string.
Please find the attached file as an example of the PDF.
Thanks in advance!
In this video, I extract tables from PDFs and write the data to Excel:
0:25 Install PDF Activities
1:10 Read PDF Text, Get PDF Page Count, Extract PDF
5:40 Read PDF with OCR
6:55 Join PDF and Manage PDF passwords
9:30 Extract Images From PDF and Export PDF as Image
12:00 Extract table from PDF, use case 1: replace some spaces with | (one column has multiple words)
24:00 Run the robot to see the result
25:40 Extract table from another PDF, use case 2: the delimiter is two spaces ("  "), so a simple split works
31:50 Extract table from a complex PDF, use case 3: unstructured data, where the logic is based on IsUpper and IsLower
40:25 Extract the price value from PDF
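
For anyone who wants to see the idea behind use case 2 outside the video, here is a minimal sketch in Python (not UiPath activities, and not the exact workflow from the video). It assumes the PDF text has already been read into a string and that columns are separated by two or more spaces, while words inside a cell are separated by a single space. The sample text and the `parse_table` name are made up for illustration.

```python
import re

def parse_table(text):
    """Split each non-empty line into cells on runs of two or more spaces."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between rows
        # A single space stays inside a cell; two or more spaces end the cell.
        rows.append(re.split(r" {2,}", line))
    return rows

# Hypothetical text, standing in for what a PDF text reader might return.
sample = (
    "Item name      Qty  Price\n"
    "Blue pen       2    1.50\n"
    "Large notebook  1    4.00\n"
)

rows = parse_table(sample)
header, data = rows[0], rows[1:]
for row in data:
    print(dict(zip(header, row)))
```

The same rule should map onto a split on a double space per line inside the robot, with each resulting array added as a row to the data table. It only works when no cell contains two consecutive spaces; otherwise something like use case 1 or 3 is needed.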