I have an untagged PDF containing a fairly standard business form with a table (a row of headers followed by rows of data). However, because it is untagged, “Indicate element” can only see the whole page. What is the best approach for extracting the table? I have looked at converting the PDF to a tagged one, but that isn’t possible. Anchor Base doesn’t have an obvious anchor, because the data is just a set of rows under headings. Do I need to go down the Document Understanding route, or is there a simpler way?
The lack of reliable delimiters makes string manipulation unworkable for my document. The easiest solution at the moment looks like dropping out of UiPath, using Adobe Pro to do the conversion, and re-ingesting the result as an Excel sheet. Not elegant, but it will do the job. It may also be a stimulus to try out the Document Understanding framework: a sledgehammer to crack a nut, but a useful learning opportunity.
I tried using string manipulation to extract the table data from the PDF, working on the text returned by the Read PDF activity.
It’s hard to list all the steps I took to achieve that, but you can take a look at this video.
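For anyone who can't watch the video, here is a rough sketch of the general idea behind a string-manipulation approach. It is not the workflow from the video, and it is written in Python rather than as a UiPath workflow purely for illustration. It assumes the extracted text keeps one table row per line and that columns are separated by two or more spaces; the function name `parse_table` and the sample text are made up for this example. If the spacing in your PDF text is not consistent, this will not work.

```python
import re

# Hypothetical helper: turns plain text (e.g. what a PDF text reader returns)
# into a list of row dictionaries. Assumes columns are separated by 2+ spaces.
COLUMN_GAP = re.compile(r"\s{2,}")

def parse_table(text, first_header):
    # Drop blank lines and surrounding whitespace.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # Locate the header row by the name of its first column.
    start = next(
        (i for i, ln in enumerate(lines) if ln.startswith(first_header)),
        None,
    )
    if start is None:
        return []

    headers = COLUMN_GAP.split(lines[start])
    rows = []
    for ln in lines[start + 1:]:
        cells = COLUMN_GAP.split(ln)
        # Stop at the first line that no longer looks like a table row.
        if len(cells) != len(headers):
            break
        rows.append(dict(zip(headers, cells)))
    return rows

# Made-up sample of text extracted from a form.
sample = (
    "Invoice 123\n"
    "\n"
    "Item    Qty   Price\n"
    "Widget  2     9.99\n"
    "Gadget  10    1.50\n"
    "Total due: 34.98\n"
)

table = parse_table(sample, "Item")
```

Inside a workflow, the same split-on-whitespace logic could be expressed with .NET's `Regex.Split`, with each row then added to a DataTable. Stopping at the first line whose column count differs is a simplistic way to find the end of the table and may need adjusting for cells that are empty or contain wide gaps.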
Very good. A string manipulation method to create a table out of a PDF image.
I am probably going to use the Adobe method, or possibly the Document Understanding method, as my first choice, but if for some reason these don’t work out, at least there is a fallback!