Screen Resolution issues for PDF data extraction


#1

I know how to extract PDF content into a text file or a variable using the activities available in UiPath (Read PDF Text / Read PDF With OCR). However, when I use the "Read PDF With OCR" activity on a simple readable PDF (the content is not stored as image blocks), I get the error “Scrape returned empty text”, or sometimes nothing is extracted at all if the selected text doesn’t fit on the screen or is hidden behind something or further down the page. How do I solve this issue?


#2

Is this happening with a specific PDF file or with any? Can you try with a simple PDF created by you?
Are you up to date with the PDF activities pack? And with the Studio version as well?


#3

@Dominic, I am facing the same kind of issue when using the ‘Get OCR Text’ activity to scrape particular data from a PDF file while that data is not visible on screen. It also sometimes scrapes the wrong data after the screen resolution is changed, which means the pixel position in the PDF does not adjust to the new screen resolution. Kindly let me know how to solve this issue.


#4

Hi Changeponder_Tester,

Earlier, I wasn’t aware of other workarounds such as using Read PDF Text and then applying string manipulation. In the end I got it to work with the help of activities like Anchor Base and Find Relative Element. Do note that this approach assumes a fixed screen resolution.


#5

Hi Dominic, a fixed resolution works fine, but I am looking for a permanent solution that works with any resolution. I have used the Get OCR Text activity with the ‘Google OCR engine’. It scrapes accurate data when the corresponding data is visible and the screen resolution has not changed; otherwise it throws an error like ‘Scrape returned empty text’ or returns the wrong value. Is there a better way to scrape from a PDF? Also, is there an OCR engine that can scrape data from the PDF when the corresponding data is not visible?


#6

Are you scraping from PDF directly or using PDF activities?


#7

@Gabriel_Tatu, I am scraping data from the PDF directly, using Anchor Base with the Get OCR Text activity. My PDF is in scanned format, so I can’t use Find Element and similar activities; I can only use OCR and image-based activities to get the particular data. The problem occurs when that data is not visible: it is on the current page, but I have to scroll to bring it into view. Is there a solution for this?


#8

Yes, use the activities :)


#9

I have solved this using the Read PDF Text and Read PDF With OCR activities. I scraped all of the data and then applied string manipulation techniques to get the data I needed.
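
A minimal sketch of the string-manipulation step described above, written in Python rather than as a UiPath workflow. It assumes the whole document has already been read into one string and that each value sits on its own line after a "Label:" prefix. The sample text, the field labels, and the helper names `extract_field` and `parse_amount` are all hypothetical, chosen only for illustration. Because the lookup runs on the extracted text and not on screen coordinates, it does not depend on the screen resolution or on what is currently scrolled into view.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical text, standing in for the output of a full-document read
# (for example, what a "read all text" step might return for an invoice).
SAMPLE_TEXT = """Invoice Number: INV-1042
Invoice Date: 12/03/2018
Invoice Amount: $1,250.75
"""


def extract_field(text: str, label: str) -> Optional[str]:
    """Return the value that follows 'label:' on the same line, or None."""
    pattern = rf"^{re.escape(label)}\s*:\s*(.+?)\s*$"
    match = re.search(pattern, text, flags=re.MULTILINE | re.IGNORECASE)
    return match.group(1) if match else None


def parse_amount(raw: str) -> Decimal:
    """Drop currency symbols and thousands separators, then parse the number."""
    cleaned = re.sub(r"[^\d.\-]", "", raw)
    return Decimal(cleaned)


# Look up fields by their label instead of by their position on screen.
invoice_number = extract_field(SAMPLE_TEXT, "Invoice Number")
amount = parse_amount(extract_field(SAMPLE_TEXT, "Invoice Amount"))
```

The same idea carries over to a workflow: read all of the text once, then locate each value by the label next to it. Real documents may need looser patterns, for example when OCR output splits a label and its value across lines.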


#10

Hello guys,

I am having issues extracting accurate data from a PDF using ‘Get Text’.

My PDFs are not downloaded to my system. They pop up when I click View, and I extract the data from there. Sometimes I get accurate data and sometimes I don’t; I just get the wrong invoice amount.

I can’t use Read PDF because the PDFs are not downloaded, and the PDF is not an image-based document, so OCR is not needed. I read about the string manipulation technique in a couple of posts. Can you give me an example of how this works with ‘Get Text’? I am a beginner at UiPath.

Thanks,
Sri