Scanned PDF files

Ramalingaiah · May 8, 2019, 9:28pm

Dear Team,
How can I read a scanned PDF? I have a scanned PDF from which I need to extract some data such as Company Name, PO Number, Invoice Number and the table in the PDF. Google OCR is not capturing it well.
Microsoft OCR is showing "Capture Error".
Is this possible with UiPath tools?

Thank you
Ram

Palaniyappan · May 8, 2019, 11:03pm

Buddy @Ramalingaiah

You can extract the data from a scanned PDF, but I wonder why OCR didn't work, because it should work for sure, buddy.
No worries, buddy.
For your case you can try the Scrape Relative activity, keeping the company name, PO number and other fields as the relative clipping region, and get the text with OCR scraping. You can get this activity by pressing Alt+Ctrl+D, which opens the desktop recording wizard; there you will find Scrape Relative under the Text option.
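To make the idea of a "relative clipping region" concrete, here is a small Python sketch of anchor-relative extraction: find a label such as "Number:" among OCR-recognised words, then keep only the words that fall inside a box placed to its right on the same line. This is not how UiPath implements Scrape Relative; the OcrWord format, the scrape_relative function, the pixel offsets and the sample PO number are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    """One OCR-recognised word with its bounding box (hypothetical format)."""
    text: str
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int  # width in pixels
    h: int  # height in pixels


def scrape_relative(words, anchor, offset_x, width, tolerance=5):
    """Return the text inside a region placed relative to an anchor word.

    The region starts offset_x pixels after the anchor's right edge, is
    `width` pixels wide, and is vertically aligned with the anchor
    (top edges within +/- tolerance pixels). Returns None if the anchor
    label is not found.
    """
    anchor_word = next((w for w in words if w.text == anchor), None)
    if anchor_word is None:
        return None  # anchor label not present in the OCR output
    left = anchor_word.x + anchor_word.w + offset_x
    right = left + width
    hits = [
        w for w in words
        if abs(w.y - anchor_word.y) <= tolerance   # same text line as anchor
        and w.x >= left and w.x + w.w <= right     # fully inside the region
    ]
    hits.sort(key=lambda w: w.x)  # left-to-right reading order
    return " ".join(w.text for w in hits)


# Hypothetical OCR output for one line of an invoice.
words = [
    OcrWord("PO", 10, 100, 25, 12),
    OcrWord("Number:", 40, 100, 60, 12),
    OcrWord("4500012345", 110, 101, 90, 12),
    OcrWord("Date:", 300, 100, 40, 12),
]
print(scrape_relative(words, "Number:", offset_x=5, width=150))  # 4500012345
```

Anchoring on the label rather than on fixed screen coordinates is what lets the same region work when the value shifts position from one document to the next.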

That would work buddy.
Cheers

anil5 · May 9, 2019, 4:06am

Hi @Ramalingaiah,

Use the Computer Vision activities to extract values from scanned PDFs, as OCR will not always give exact output.

Refer to the post below on how to install the Computer Vision activities.

Ramalingaiah · May 9, 2019, 4:38am

Dear Palaniyappan/anil5,

Thank you, Palaniyappan and anil5, for replying.

Regards
Ram

Ramalingaiah · May 13, 2019, 11:19am

Hi Palaniyappan/ Anil5,

I need to read the content of a PNG file. Is that possible in UiPath?
I have attached a screenshot.

Please help me out.

Thank you
Ram

Palaniyappan · May 13, 2019, 11:27am

@Ramalingaiah
Yes buddy, you can read that with the Read PDF With OCR activity, and the output will be text.

Ramalingaiah · May 13, 2019, 11:40am

Hi Palaniyappan,
Thank you for replying.
I need to read the content of a PNG file. Is that possible in UiPath?
I have attached a screenshot.
I need to read the telephone number, name and invoice number, as in the examples below:
Tel:614-489-8316,
Name: Josepb j Gatto,
Invoice no: OH43017,
Please give me a suggestion or any link.
Thank you

Palaniyappan · May 13, 2019, 11:57am

Buddy @Ramalingaiah

If it's an image, then you can do the following:

Use Start Process to open the image, passing the file path as input.
Once it is open, use Screen Scraping with OCR; you will find that option in the Design menu under Screen Scraping.
Scrape the portion that you want and increase the Google OCR scale to 5 or 6 or more, until the text is obtained correctly. You can do that by increasing the Scale in the scrape wizard and clicking Refresh.
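As we understand it, the Scale setting enlarges the captured image before it is passed to the OCR engine, which is why small or low-resolution text often reads better at 5 or 6. As a rough illustration only, the Python sketch below enlarges a grid of pixel values by an integer factor using nearest-neighbour repetition. The upscale_nearest function is hypothetical, and the resampling method UiPath actually uses is not stated in this thread.

```python
def upscale_nearest(pixels, scale):
    """Enlarge a 2-D grid of pixel values by an integer factor.

    Nearest-neighbour: each source pixel becomes a scale x scale block,
    so thin strokes in small text are covered by more pixels.
    """
    if not isinstance(scale, int) or scale < 1:
        raise ValueError("scale must be a positive integer")
    out = []
    for row in pixels:
        wide_row = [p for p in row for _ in range(scale)]  # repeat each pixel horizontally
        for _ in range(scale):
            out.append(list(wide_row))  # repeat the widened row vertically (as copies)
    return out


glyph = [[0, 1],
         [1, 0]]
big = upscale_nearest(glyph, 3)
print(len(big), len(big[0]))  # 6 6
```

Real OCR pipelines usually use smoother interpolation than this; the sketch only shows why a larger scale gives the engine more pixels per character.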

[screenshot]

After clicking this and scraping the region you want,
you will get a result like this:

[screenshot]
Once you get the output from scraping in an output variable named out_text,
use the Split method to turn each line into an array element, so that you can get each term you want with Assign activities:
out_text_split_array = out_text.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries), where out_text_split_array is of type String[].
Then, to get the telephone, name and invoice values, use Assign activities like these (this assumes the scraped lines come in that order):
out_tel_value = Split(out_text_split_array(0), ":")(1).Trim
out_Name_value = Split(out_text_split_array(1), ":")(1).Trim
out_invoice_value = Split(out_text_split_array(2), ":")(1).Trim
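The Assign expressions above are VB.NET inside UiPath. The same "split into lines, then take what follows the colon" logic can be sketched in Python and checked against the sample values from the question. The parse_fields function is hypothetical. Unlike the index-based approach, it keys each value by its label, so it does not depend on the order of the lines. Stripping the trailing comma is an assumption based on the sample text.

```python
def parse_fields(out_text):
    """Turn scraped 'Label: value' lines into a dict keyed by label."""
    # Split on any line break and drop empty lines (like RemoveEmptyEntries).
    lines = [ln for ln in out_text.splitlines() if ln.strip()]
    fields = {}
    for line in lines:
        # Split on the FIRST colon only, so values that contain ':' stay intact.
        label, sep, value = line.partition(":")
        if sep:  # ignore lines that have no 'Label:' part
            fields[label.strip()] = value.strip().rstrip(",").rstrip()
    return fields


out_text = "Tel:614-489-8316,\n\nName: Josepb j Gatto,\nInvoice no: OH43017,"
fields = parse_fields(out_text)
print(fields["Tel"], fields["Name"], fields["Invoice no"])
# 614-489-8316 Josepb j Gatto OH43017
```

If OCR misreads a label (for example "Invaice" instead of "Invoice"), the key changes too, so in practice the label text may need normalising before lookup.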

That's all buddy, you are all done.
Kindly let me know whether this works or not @Ramalingaiah

Cheers

Palaniyappan · May 13, 2019, 12:11pm

Did that work, buddy @Ramalingaiah?
