How to speed up the OCR scanning process?


Hi, I am using the Read PDF with OCR activity. Even after reducing the scale from “1” to “0.75”, the processing time is still too long: about 3 minutes for 4 PDF pages.

Is there any way to speed up the scanning process? For example, can I read only the first 2 pages, or select a specific area of the PDF (say, the bottom part) so that the whole PDF does not have to be scanned?




The range of pages that you want to read. If the range isn’t specified, the whole file is read. You can specify a single page (e.g. “7”) or a range of pages (e.g. “2-9”) to be read. Only string variables and strings are supported. The default value is “All”.
Note: Strings must be placed between quotation marks.


thanks @vvaidya~

If I would like to specify pages 1 and 4, how should I express it? “1” + “4”?

Further, if I would like to read a value with OCR in a PDF, say:
Date of Issue: “22 Nov 2016”, shall I use “set focus” to highlight “Date of Issue”, and how can I read “22 Nov 2016” in UiPath?

Also, how can I identify a PDF that has more than 5 pages? My PDFs usually contain fewer than 4 pages, so how can I express in UiPath that a PDF with more than 5 pages is a special PDF?

much appreciated and thanks~


@MichaelC You can probably do this using a loop.
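
As a rough sketch of the loop idea, here is the logic in Python (in UiPath this would be a For Each over the page numbers). `read_page` is a hypothetical stand-in for one Read PDF with OCR call whose Range is set to a single page string such as "1"; it is not a real UiPath API, and `fake_read_page` exists only so the sketch runs without a PDF.

```python
from typing import Callable, Dict, Iterable


def read_selected_pages(
    pages: Iterable[int], read_page: Callable[[str], str]
) -> Dict[int, str]:
    """Read only the requested pages, one single-page Range per call."""
    results = {}
    for page in pages:
        # Range is documented as a string, so the page number is converted
        results[page] = read_page(str(page))
    return results


# Hypothetical stand-in for the OCR call, so the sketch is runnable
def fake_read_page(page_range: str) -> str:
    return "text of page " + page_range


texts = read_selected_pages([1, 4], fake_read_page)
print(texts[4])
```

Only pages 1 and 4 are read, so the OCR time should scale with the number of pages requested rather than the size of the file.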


I’m not sure if the PDF activities have this functionality, but the code below should work:
pdfPages.xaml (6.3 KB)

How to get pdf file page count
Substring after line containing specific text

Regarding the date of issue: once you have OCR’ed the PDF, the regex below will return the date.

Issue:(.{11}) -> use the Matches activity
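
To see what the pattern does outside UiPath, here is a minimal Python sketch of the same idea. The sample string is the one from the question. The `\s*` is an addition, on the assumption that the OCR text may have a space after the colon, and `extract_issue_date` is just an illustrative name.

```python
import re
from typing import Optional

# Same idea as the pattern above: take the 11 characters after "Issue:".
# \s* is an added assumption that skips any whitespace after the colon.
ISSUE_DATE = re.compile(r"Issue:\s*(.{11})")


def extract_issue_date(ocr_text: str) -> Optional[str]:
    """Return the 11-character date after "Issue:", or None if absent."""
    match = ISSUE_DATE.search(ocr_text)
    return match.group(1) if match else None


print(extract_issue_date("Date of Issue: 22 Nov 2016"))
```

The fixed length of 11 only fits dates written like "22 Nov 2016"; a single-digit day or extra OCR spaces would need a different pattern.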


thank you~


Hi, what is the expression if I want to extract 20 Oct 2016 from
"Dateofissue: 2 0 OCT 20 1 6", where only Dateofissue is fixed across all PDFs?
I have tried this expression, but it cannot extract 20 Oct 2016.

much appreciated and thanks~


I feel this is easier:

strDate = Split(Advice, ":")(1)

strDate = Regex.Replace(strDate, "(?<=\d)\p{Zs}(?=\d)", "") -> removes the spaces between digits
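
For anyone who wants to check the two steps outside UiPath, here is a small Python sketch of the same logic. It assumes the text after the first colon is the date. `\s` stands in for `\p{Zs}`, which Python's `re` module does not support, and the final `strip()` and `title()` calls (to trim the leading space and get `Oct` from `OCT`) are extra steps not in the expressions above.

```python
import re


def clean_issue_date(advice: str) -> str:
    """Rebuild a date whose digits were split apart by OCR."""
    # Step 1: keep everything after the first colon
    raw = advice.split(":", 1)[1]
    # Step 2: drop whitespace that sits between two digits ("2 0" -> "20")
    joined = re.sub(r"(?<=\d)\s(?=\d)", "", raw)
    # Extra step: trim and normalise the month's capitalisation
    return joined.strip().title()


print(clean_issue_date("Dateofissue: 2 0 OCT 20 1 6"))
```

For the sample line this prints 20 Oct 2016. The lookbehind and lookahead only match a space with a digit on both sides, so the spaces around the month name are kept.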