How to extract form values or editable text from PDF files?

I want to extract editable text, such as the values of certain fields, from a PDF file, but I have been unable to do so using:
Read PDF Text
Read PDF Text with OCR
Get Text
Get Text with OCR
Anchor Base
Screen Scraping
and others
Even though I am getting the editable values from the “Read PDF with OCR” activity, the data is not in a structured format. Moreover, the format of the fetched data varies between PDFs that originally share the same format. Please let me know if any other information is needed.
Thanks in advance

Are you extracting all of the data from the PDF, or only some values?

Hi, I am facing the same issue. Any suggestions?

Hi @deepesh.maskara19 @sangasangasanga
Extracting data from PDFs often fails because different PDFs have different formats. The PDF that UiPath used in their demo is well structured, meaning all the selectors are easily identified with screen scraping, and OCR works on it too. Real-world PDFs rarely behave like that. To handle such cases, you can use the free OCR API at https://ocr.space/ . First, try your PDF on that site and check how the output looks. If it looks good, register and use their API. To see how to call the API, use the link below, where I have shared a workflow:
Unable to capture PDF Invoice information using OCR - #29 by Tom1989
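
For anyone who wants to try the OCR.space API outside UiPath first, here is a rough Python sketch using only the standard library. The endpoint (https://api.ocr.space/parse/image), the form fields (apikey, url, language, filetype) and the response keys (ParsedResults, ParsedText, IsErroredOnProcessing, ErrorMessage) are my assumptions based on their public documentation, so please check them against the current docs. The function names are just illustrative, and the API key and PDF URL are placeholders.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the OCR.space documentation.
OCR_ENDPOINT = "https://api.ocr.space/parse/image"


def build_ocr_request(api_key, pdf_url, language="eng"):
    """Build a POST request asking OCR.space to parse a PDF hosted at pdf_url."""
    form = {
        "apikey": api_key,      # key received after registering
        "url": pdf_url,         # publicly reachable PDF
        "language": language,
        "filetype": "PDF",      # assumed optional hint for the file type
    }
    data = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(OCR_ENDPOINT, data=data, method="POST")


def parsed_text(response):
    """Return the recognised text of each page from a decoded JSON response."""
    if response.get("IsErroredOnProcessing"):
        raise RuntimeError(str(response.get("ErrorMessage")))
    return [page.get("ParsedText", "") for page in response.get("ParsedResults") or []]


def ocr_pdf(api_key, pdf_url):
    """Send the request and return a list with one text string per page."""
    request = build_ocr_request(api_key, pdf_url)
    with urllib.request.urlopen(request, timeout=60) as resp:
        return parsed_text(json.load(resp))
```

Calling ocr_pdf("YOUR_API_KEY", "https://example.com/sample.pdf") should then give the page texts, which you can compare with what you see on the website before building the UiPath workflow.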

If you still do not get the correct output, you can use Python scripts to extract the data from the PDF.
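
As one possible starting point for the Python route, here is a minimal sketch that reads form field values with the third-party pypdf library (pip install pypdf). It assumes the PDF contains real fillable (AcroForm) fields; a scanned or flattened PDF has no such fields and would still need OCR. The helper names field_values and read_form_fields are hypothetical, not part of pypdf.

```python
def field_values(fields):
    """Map each form field name to its current value ("/V"), or None if it is blank."""
    result = {}
    for name, field in (fields or {}).items():
        value = field.get("/V")
        result[name] = None if value is None else str(value)
    return result


def read_form_fields(pdf_path):
    """Read all form fields from a PDF and return a {name: value} dictionary."""
    from pypdf import PdfReader  # third-party dependency, imported only when needed

    reader = PdfReader(pdf_path)
    # get_fields() returns None when the document has no form fields
    return field_values(reader.get_fields())
```

Because the result is keyed by field name, the output stays in the same structure for every PDF built from the same form template, which is what the OCR output was not giving. From UiPath you could run such a script and read the dictionary back, for example as JSON.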
