How to extract form values or editable text from PDF files?

deepesh.maskara19 · October 31, 2018, 3:55pm

I want to extract editable text, such as the values of certain fields, from a PDF file, but I have been unable to do so using:
Read PDF Text
Read PDF Text with OCR
Get Text
Get Text with OCR
Anchor Base
Screen Scraping
and others
I do get the editable values from the “Read PDF with OCR” activity, but the data is not in a structured format. Moreover, the format of the fetched data varies between PDFs that originally share the same layout. Please let me know if any other information is needed.
Thanks in advance

Sat · October 31, 2018, 4:15pm

Are you extracting all of the data from the PDF, or only some values?

sangasangasanga · November 21, 2018, 8:43am

Hi, I am facing the same issue. Any suggestions?

Rishi1 · November 21, 2018, 9:21am

Hi @deepesh.maskara19 @sangasangasanga
Extracting data from PDFs often doesn’t work properly because different PDFs have different formats. The PDF that UiPath used in their demo is properly structured, meaning all the selectors are easily identified using screen scraping, and OCR works on it as well. In real scenarios it doesn’t work like that. To handle these cases, you can use the free OCR API at https://ocr.space/ . First, go to that link and check how the output looks for your PDF. If it looks good, you can register and use their API. If you want to know how to use their API, see the link below, where I have shared a workflow:
Unable to capture PDF Invoice information using OCR - #29 by Tom1989
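
For anyone who prefers to call the OCR API from a script rather than from a workflow, here is a minimal sketch of what that could look like. It is not the workflow from the linked post. The endpoint URL, the form field names (apikey, base64Image, isTable) and the response keys (ParsedResults, ParsedText, IsErroredOnProcessing) are written from my reading of the public OCR.space documentation, so treat them as assumptions and check them against the current docs. The function names are made up for illustration.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumption: endpoint and field names follow the public OCR.space docs;
# verify them against the current documentation before relying on this.
OCR_SPACE_URL = "https://api.ocr.space/parse/image"


def build_payload(pdf_bytes, api_key, is_table=True):
    """Build the form fields for an OCR request from raw PDF bytes."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "apikey": api_key,
        "base64Image": "data:application/pdf;base64," + encoded,
        # isTable asks the service to keep table-like text line by line.
        "isTable": "true" if is_table else "false",
    }


def extract_text(response):
    """Join the text of every parsed page, or raise if the API reported an error."""
    if response.get("IsErroredOnProcessing"):
        raise RuntimeError(str(response.get("ErrorMessage")))
    pages = response.get("ParsedResults") or []
    return "\n".join(page.get("ParsedText", "") for page in pages)


def ocr_pdf(path, api_key):
    """Send one PDF to the OCR service and return the recognised text."""
    with open(path, "rb") as handle:
        payload = build_payload(handle.read(), api_key)
    data = urllib.parse.urlencode(payload).encode("ascii")
    with urllib.request.urlopen(OCR_SPACE_URL, data=data, timeout=60) as reply:
        return extract_text(json.load(reply))
```

Calling ocr_pdf("invoice.pdf", "your-api-key") (both values are placeholders) would return plain text that still has to be parsed into fields, so the output for each PDF layout is worth checking on the website first, as described above.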

If you are still not getting correct output, you can use Python scripts to get the data from the PDF.
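
As a rough sketch of the Python route: if the PDF is a fillable AcroForm, the field values are stored in the file itself, so they can be read by name without any OCR, which gives structured output. The example below assumes the third-party pypdf library is installed (pip install pypdf) and uses its PdfReader.get_fields() call. The helper names field_values and read_form are my own, and this will not help with scanned PDFs or forms that are not AcroForms.

```python
def field_values(fields):
    """Map each form field name to its stored value ("/V"), or None if empty."""
    values = {}
    for name, field in (fields or {}).items():
        value = field.get("/V")
        values[name] = None if value is None else str(value)
    return values


def read_form(path):
    """Read the form fields of one PDF. Assumes pypdf is installed."""
    from pypdf import PdfReader  # third-party; imported here so the helper above works without it

    reader = PdfReader(path)
    # get_fields() returns None when the PDF has no interactive form.
    return field_values(reader.get_fields())
```

read_form("form.pdf") (a placeholder path) would return a dictionary of field name to value. Because the keys are the field names defined in the PDF, the result should stay consistent across PDFs built from the same template.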
