How to extract a table from an untagged PDF file

I have an untagged PDF containing a fairly standard business form with a table (a row of headers followed by rows of data). However, because it is untagged, “Indicate element” can only see the whole page. What is the best approach for extracting the table? I have looked at converting the PDF to a tagged one, but that isn’t possible. Anchor Base doesn’t have an obvious anchor because the data is just a set of rows under headings. Do I need to go down the Document Understanding route, or is there a simpler way?

All help gratefully received

You could try reading the PDF as a string and using string manipulation to convert it back into a DataTable.

It seems a bit tedious, but it is a potential solution.
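
As a rough illustration of the string-manipulation idea, here is a minimal Python sketch (not a UiPath workflow). It assumes the extracted text keeps one table row per line and separates columns with runs of two or more spaces, which may not hold for a given PDF. The sample text and the function name `parse_table` are hypothetical.

```python
import re

def parse_table(text, header_first_col):
    """Turn extracted PDF text into a list of row dicts.

    Assumes (hypothetically) one table row per line, columns separated
    by two or more spaces, and a header line that starts with
    `header_first_col`.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Find the header line; everything after it is treated as data.
    start = next(i for i, ln in enumerate(lines) if ln.startswith(header_first_col))
    headers = re.split(r"\s{2,}", lines[start])
    rows = []
    for ln in lines[start + 1:]:
        cells = re.split(r"\s{2,}", ln)
        if len(cells) != len(headers):
            break  # stop at the first line that no longer looks like a row
        rows.append(dict(zip(headers, cells)))
    return rows

# Made-up sample standing in for the text read from the PDF.
sample = """Invoice 1001
Item        Qty   Unit Price
Widget      2     4.50
Gadget Pro  1     19.99
Total due: 28.99
"""
table = parse_table(sample, "Item")
```

Splitting on two or more spaces lets single spaces survive inside a cell ("Gadget Pro", "Unit Price"), but it breaks as soon as a cell is empty or the extractor collapses the spacing.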

The lack of reliable delimiters makes the string manipulation unworkable. The easiest solution at the moment looks like dropping out of UiPath, using Adobe Pro to do the conversion, and re-ingesting the result as an Excel sheet. Not elegant, but it will do the job. It may also be a stimulus to try out the Document Understanding framework: a sledgehammer to crack a nut, but a useful learning opportunity.

Thanks for the suggestion.
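
For readers in the same position: when delimiters are unreliable but the extracted text still preserves column alignment, slicing each row at the header's column offsets is one possible fallback. The Python sketch below assumes that alignment survives text extraction, which is not guaranteed and was not confirmed for this form. The function name `slice_by_header` and the sample rows are hypothetical.

```python
def slice_by_header(header, data_lines, names):
    """Cut each data line at the character offsets where the header names start."""
    starts = [header.index(n) for n in names]
    ends = starts[1:] + [None]  # last column runs to the end of the line
    rows = []
    for ln in data_lines:
        rows.append({n: ln[s:e].strip() for n, s, e in zip(names, starts, ends)})
    return rows

# Made-up, column-aligned sample lines (widths 16 and 5, then free text).
header = f"{'Description':<16}{'Qty':<5}Price"
data = [
    f"{'Blue widget':<16}{'2':<5}4.50",
    f"{'Red gadget pro':<16}{'':<5}19.99",  # empty Qty cell
]
rows = slice_by_header(header, data, ["Description", "Qty", "Price"])
```

Unlike splitting on whitespace, this copes with empty cells, but it fails if a value is wider than its column or if the extractor does not keep the layout.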

Hi @ChrisC

I tried using string manipulation to extract the table data from the PDF with the Read PDF activity.
It’s hard to list all the steps I took to achieve that, but you can take a look at this video

Hope this helps!
Regards,
Aditya

Thanks, I will have a look.

Very good. A string manipulation method to create a table out of a PDF image.

I am probably going to use the Adobe method or possibly the Document Understanding method as my first choice, but if for some reason these don’t work out, at least there is a method!

Many thanks.

Hi @ChrisC

Thank you!
Happy Automation!
Regards,
Aditya