Unable to read PDF Files

pdf
ocr

#1

Hi all,

I have a workflow with the following structure:

  1. open an Editable PDF filled with an individual’s information
  2. read the information filled by the individual in the Editable PDF
  3. output the information read into different variables for later processing

It should be easy to do, but I’m having trouble getting UiPath to read the PDF. I’ve tried screen scraping, Get Text, and the Anchor Base functionality with no luck.

This is the pdf file in question if it helps:

Any help will be most welcome.
Thanks


#2

Hi,

Thanks for providing the PDF. I tried this with both Read PDF Text and Read PDF with OCR, and both work pretty well compared with the usual success rate of OCR. Please see the attached file and let me know if you have any questions. Note that the file was not open when I ran this.

ReadPDF.xaml (6.2 KB)

Richard


#3

Thank you for your help :)


#4

Hi @richarddenton - Is there a way to extract only single fields, not entire OCR, without opening the file?

The PDFs I’m using, like the user’s examples, have editable text fields and aren’t images. I would like to use get text, but it doesn’t work without opening the files.


#5

I know this is a super late answer, but I will answer it anyway in case any future users have the same problem:
As instructed in UiPath Academy, the PDF activities (Read PDF Text, Read PDF with OCR) let you read the file without opening it. However, the output will be the text of the whole file.

If you only want to scrape a single field, that is, just one part of the file, there is no other way but to open the PDF file.
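
That said, if the value you need does show up in the whole-file text next to a stable label, one possible workaround is to slice it out of that text afterwards. Here is a minimal Python sketch of the idea. The function name `extract_field`, the sample text, and the "Label: value" layout are all assumptions made up for illustration; the real output of your PDF may look different, and it may not contain the editable field values at all.

```python
import re
from typing import Optional


def extract_field(full_text: str, label: str) -> Optional[str]:
    """Return the text that follows `label` on the same line, or None."""
    # Label at the start of a line, an optional colon, then capture the
    # rest of that line without trailing spaces or tabs.
    pattern = re.compile(
        rf"^\s*{re.escape(label)}[ \t]*:?[ \t]*(.+?)[ \t]*$",
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(full_text)
    return match.group(1) if match else None


# Hypothetical sample standing in for the string a whole-file read returns.
sample_text = """Application Form
First Name: Maria
Last Name: Santos
"""

first_name = extract_field(sample_text, "First Name")
last_name = extract_field(sample_text, "Last Name")
```

The function returns `None` when the label is not found, so you can check for that before using the value. The same kind of pattern should carry over to other regex engines with little change.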


#6

Hi @richarddenton, I am having the same issue as @gonmartins. I downloaded and ran your solution, but in my case I still have the same problems:
The Read PDF Text activity scrapes all text in the PDF, except the information in the editable fields.
The Read PDF with OCR activity does the same, but the information in the editable fields comes out garbled.
I’ve tried opening the file and using Anchor Base to extract field by field. It works for certain fields but not for others, and when I run it again, the fields that previously worked no longer work and vice versa.
Any ideas? I’m pretty frustrated. I’m using the latest Studio release, 2018.2.2.
Thanks


#7

The Read PDF Text activity scrapes all text in the PDF, except the information in the editable fields.
The Read PDF with OCR activity does the same, but the information in the editable fields comes out garbled.
I’ve tried opening the file and using Anchor Base to extract field by field. It works for certain fields but not for others, and when I run it again, the fields that previously worked no longer work and vice versa.
Any ideas? I’m pretty frustrated. I’m using the latest Studio release, 2018.2.2.
Thanks

Did you ever resolve this? Running into this issue currently.


#8

Hi @LeHung, how do I open the PDF file?

I want to extract only First Name and Last Name. Can you tell me whether these are the correct steps?

  1. Read PDF
  2. Data Scraping
  3. Write Text

Sorry, I’m a newbie and still confused.


#9

You should sign up for the Academy to learn the basics first, @Zati. Here’s a brief tutorial / overview of the process


#10