How to speed up the OCR scanning process?

MichaelC · May 17, 2017, 9:23am

Hi, I am using the Read PDF with OCR activity. Even after reducing the scale from “1” to “0.75”, the processing time is still too long: about 3 minutes for 4 PDF pages.

Is there any way to speed up the scanning process? For example, can I read only the first 2 pages, or select a specific area of the PDF (say, the bottom part), so that the whole PDF does not have to be scanned?

thanks~

vvaidya · May 17, 2017, 12:44pm

Input
Range

The range of pages that you want to read. If the range isn’t specified, the whole file is read. You can specify a single page (e.g. “7”) or a range of pages (e.g. “2-9”) to be read. Only string variables and strings are supported. The default value is “All”.
Note: Strings must be placed between quotation marks.
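
For the "first 2 pages" case in the question, the Range value might look like the sketch below. `pageRange` is a hypothetical variable name, not something the activity defines; the value formats are the ones quoted above.

```vb
' pageRange is a hypothetical String variable passed to the Range property
' of Read PDF with OCR. The property only accepts strings.
Dim pageRange As String = "1-2"     ' read pages 1 and 2 only
' Dim pageRange As String = "4"     ' or a single page
```

Fewer pages means fewer OCR passes, so this should cut the run time roughly in proportion to the pages skipped.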

MichaelC · May 18, 2017, 12:59am

thanks @vvaidya~

If I would like to read only pages 1 and 4, how should I express it? “1” + “4”?

Further, if I would like to read a value from the OCR text of the PDF, say:
Date of Issue: “22 Nov 2016”, should I use “Set Focus” to highlight “Date of Issue”, and how can I read “22 Nov 2016” in UiPath?

Also, how can I identify a PDF that has more than 5 pages? My PDFs usually contain fewer than 4 pages, so how can I express in UiPath that a PDF with more than 5 pages is a special PDF?

much appreciated and thanks~

vvaidya · May 18, 2017, 6:47pm

@MichaelC You can probably do this with a loop.
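
A rough sketch of the loop idea, assuming each non-adjacent page is read in its own pass with a single-page Range. `ReadPdfWithOcrStub`, `pagesToRead` and `combinedText` are hypothetical names; in a real workflow the stub is the Read PDF with OCR activity placed inside a For Each activity.

```vb
Imports System
Imports System.Text

Module ReadSelectedPages
    ' Hypothetical stand-in for the Read PDF with OCR activity:
    ' takes a Range string and returns the text of that page.
    Function ReadPdfWithOcrStub(pageRange As String) As String
        Return "[text of page " & pageRange & "]"
    End Function

    Sub Main()
        ' Each entry is a valid single-page Range value.
        Dim pagesToRead As String() = {"1", "4"}
        Dim combinedText As New StringBuilder()

        ' One OCR pass per requested page; the results are appended in order.
        For Each pageRange As String In pagesToRead
            combinedText.AppendLine(ReadPdfWithOcrStub(pageRange))
        Next

        Console.WriteLine(combinedText.ToString())
    End Sub
End Module
```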

vvaidya · May 18, 2017, 6:59pm

I’m not sure whether the PDF activities have this functionality, but the attached workflow should work:
pdfPages.xaml (6.3 KB)

vvaidya · May 18, 2017, 7:03pm

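Regarding the date of issue: once you have OCR’ed the PDF, the regex below will return the date.

Issue:\s*(.{11}) -> use it with the Matches activity (the \s* skips any space after the colon, so the group captures the 11-character date)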
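
A minimal sketch of that pattern in plain VB.NET, assuming the OCR text contains `Date of Issue: 22 Nov 2016`. The sample string is made up, and `\s*` is included here so that a space after the colon does not take up one of the 11 captured characters.

```vb
Imports System
Imports System.Text.RegularExpressions

Module DateOfIssueDemo
    Sub Main()
        ' Assumed sample of the OCR output; real text will differ.
        Dim ocrText As String = "Date of Issue: 22 Nov 2016 Amount: 100"

        ' Group 1 captures the 11 characters that follow "Issue:" and any whitespace.
        Dim m As Match = Regex.Match(ocrText, "Issue:\s*(.{11})")

        If m.Success Then
            Console.WriteLine(m.Groups(1).Value)   ' prints 22 Nov 2016
        End If
    End Sub
End Module
```

In a workflow, the same pattern goes into the Matches activity, and the date is read from the first match's `Groups(1).Value`. A fixed length of 11 only fits dates shaped like `22 Nov 2016`; a single-digit day would need a different pattern.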

MichaelC · May 19, 2017, 1:35am

thank you~

MichaelC · May 22, 2017, 3:12am

Hi, what expression should I use to extract 20 Oct 2016 from
“Dateofissue: 2 0 OCT 20 1 6”, where only “Dateofissue” is fixed across all PDFs?
I have tried this expression, but it cannot extract 20 Oct 2016:
“Advice.Substring(advice.IndexOf(“Dateofissue:”),advice.IndexOf(“Dateofissue:”)-Advice.IndexOf(“Dateofissue”)+12)”

much appreciated and thanks~

vvaidya · May 22, 2017, 4:20pm

I feel this is easier:

strDate = Split(Advice, ":")(1)

strDate = Regex.Replace(strDate, "(?<=\d)\p{Zs}(?=\d)", "") ' removes the spaces between digits
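
Put together as a small runnable sketch, with the sample string taken from the question. The `Trim()` call is an addition, because the split leaves a leading space. For comparison, the earlier `Substring` attempt starts at the label and takes 12 characters, so it returns `Dateofissue:` itself rather than the date.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module CleanOcrDate
    Sub Main()
        ' Sample OCR output from the question.
        Dim advice As String = "Dateofissue: 2 0 OCT 20 1 6"

        ' Everything after the first colon: " 2 0 OCT 20 1 6"
        Dim strDate As String = Split(advice, ":")(1)

        ' Remove only the spaces that sit between two digits,
        ' then trim the leading space left over from the split.
        strDate = Regex.Replace(strDate, "(?<=\d)\p{Zs}(?=\d)", "").Trim()

        Console.WriteLine(strDate)   ' prints 20 OCT 2016
    End Sub
End Module
```

The month keeps whatever case the OCR produced (`OCT` here). `Split(...)(1)` also assumes the text holds exactly the one label and its value; with more colons in the text, only the part up to the next colon is returned.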
