How to speed up the OCR scanning process?


#1

Dear all, I am using the Read PDF With OCR activity. However, even after reducing the scale from “1” to “0.75”, the processing time is still too long: about 3 minutes for 4 PDF pages.

Is there any way to speed up the scanning process? For example, can I read only the first 2 pages, or select a specific area of the PDF (say, the bottom part) so that the whole PDF does not have to be scanned?

thanks~


#2

Input
Range

The range of pages that you want to read. If the range isn’t specified, the whole file is read. You can specify a single page (e.g. “7”) or a range of pages (e.g. “2-9”) to be read. Only string variables and strings are supported. The default value is “All”.
Note: Strings must be placed between quotation marks.
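To make the documented Range formats concrete, here is a small Python sketch that expands a Range value into the page numbers it covers. The helper `pages_in_range` is hypothetical: it models only the formats quoted above (“7”, “2-9”, “All”), it is not UiPath's own parser, and it does no bounds checking. Reading only the first 2 pages would correspond to a Range of “1-2”.

```python
from typing import List


def pages_in_range(range_str: str, page_count: int) -> List[int]:
    """Hypothetical helper: expand a Range value such as "7", "2-9" or "All"
    into 1-based page numbers. Models the documented formats only; this is
    not UiPath's parser and it does not check bounds."""
    value = range_str.strip()
    if value == "" or value.lower() == "all":
        # No range (or "All") means the whole file is read.
        return list(range(1, page_count + 1))
    if "-" in value:
        # "2-9" means pages 2 through 9, inclusive.
        start, end = (int(part) for part in value.split("-", 1))
        return list(range(start, end + 1))
    # A single page such as "7".
    return [int(value)]


print(pages_in_range("1-2", 4))  # [1, 2]: only the first two pages
```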


#3

thanks @vvaidya~

If I want to read pages 1 and 4 only, how should I express that? “1” + “4”?

Further, if I want to read a value from the OCR'ed PDF, for example:
Date of Issue: “22 Nov 2016”, should I use “Set Focus” to highlight “Date of Issue”? And how can I read “22 Nov 2016” in UiPath?

Also, how can I identify a PDF by its page count? My PDFs usually have fewer than 4 pages, so how can I express in UiPath that a PDF with more than 5 pages is a special PDF?

much appreciated and thanks~


#4

@MichaelC You can probably do this with a loop.
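A minimal sketch of the loop idea for reading pages 1 and 4 only: run one read per page, each with a single-page Range, and join the results. It is written in Python for illustration; `read_page` is a hypothetical stand-in for one Read PDF With OCR call whose Range input is set to a single page such as “4”. In a UiPath workflow the equivalent would be a For Each over the page list with the read activity inside it.

```python
from typing import Callable, Iterable


def read_selected_pages(read_page: Callable[[str], str], pages: Iterable[int]) -> str:
    """Read only the listed pages, one call per page, and join the text.

    read_page is hypothetical: it stands in for a single Read PDF With OCR
    call whose Range input is set to one page, e.g. "1" or "4".
    """
    parts = []
    for page in pages:
        # Range only accepts strings, so pass the page number as "1", "4", ...
        parts.append(read_page(str(page)))
    return "\n".join(parts)


# Usage with a fake reader in place of the OCR activity.
fake_pdf = {"1": "page one text", "2": "p2", "3": "p3", "4": "page four text"}
text = read_selected_pages(lambda r: fake_pdf[r], [1, 4])
print(text)
```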


#5

I’m not sure whether the PDF activities have this functionality, but the workflow below should work:
pdfPages.xaml (6.3 KB)
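Once the page count is known (for example from the attached workflow, whose contents are not reproduced here), the “special PDF” check from the question is a single comparison. In UiPath this would be an If activity with a condition such as `pageCount > 5`, where `pageCount` is a hypothetical variable name. The same rule as a Python sketch, with a hypothetical function name:

```python
def classify_pdf(page_count: int, special_threshold: int = 5) -> str:
    """Label a PDF as "special" when it has more than special_threshold pages."""
    if page_count < 1:
        raise ValueError("page_count must be a positive number of pages")
    # The question treats "more than 5 pages" as special, so use a strict >.
    return "special" if page_count > special_threshold else "normal"


print(classify_pdf(3))  # normal
print(classify_pdf(6))  # special
```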


#6

To read the date of issue once you have OCR’ed the PDF, the regex below will return the date in its first capture group.

Issue:\s*(.{11}) (use it with the Matches activity)
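For illustration, the same extraction in Python's `re` module. The `\s*` skips any space after the colon, so the 11 characters captured are the date itself (“22 Nov 2016” is exactly 11 characters). The names `ISSUE_DATE` and `find_issue_date` are hypothetical; in UiPath the value would come from the first group of a match returned by the Matches activity.

```python
import re
from typing import Optional

# "Issue:" then optional whitespace, then capture the next 11 characters.
ISSUE_DATE = re.compile(r"Issue:\s*(.{11})")


def find_issue_date(ocr_text: str) -> Optional[str]:
    """Return the 11-character date after "Issue:", or None if absent."""
    match = ISSUE_DATE.search(ocr_text)
    return match.group(1) if match else None


print(find_issue_date("Date of Issue: 22 Nov 2016"))  # 22 Nov 2016
```

Because `.{11}` counts characters rather than matching a date shape, OCR output with extra spaces (such as “2 0 OCT 20 1 6”) will not be captured correctly by this pattern.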


#7

thank you~


#8

Dear all, what expression should I use to extract 20 Oct 2016 from
"Dateofissue: 2 0 OCT 20 1 6"? Only the "Dateofissue" label is the same in every PDF.
I have tried the expression below, but it does not extract 20 Oct 2016:
Advice.Substring(Advice.IndexOf("Dateofissue:"), Advice.IndexOf("Dateofissue:") - Advice.IndexOf("Dateofissue") + 12)

much appreciated and thanks~
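Why the attempt above returns the wrong text: both `IndexOf` calls normally find the same position (the label “Dateofissue:” starts with “Dateofissue”), so the length argument works out to 0 + 12 = 12, and `Substring` starts at the label itself. The result is the 12-character string “Dateofissue:” rather than the date. A Python sketch of the corrected idea, starting after the label and taking the rest of that line instead of a fixed length; `text_after_label` is a hypothetical name:

```python
def text_after_label(text: str, label: str) -> str:
    """Return the rest of the line that follows `label`, trimmed."""
    index = text.find(label)  # same role as Advice.IndexOf("Dateofissue:")
    if index == -1:
        raise ValueError(f"label {label!r} not found")
    start = index + len(label)  # start AFTER the label, not at it
    rest = text[start:]
    # Stop at the end of the line rather than guessing a fixed length,
    # because OCR spacing ("2 0 OCT 20 1 6") changes how long the value is.
    return rest.split("\n", 1)[0].strip()


sample = "Dateofissue: 2 0 OCT 20 1 6"
print(text_after_label(sample, "Dateofissue:"))  # 2 0 OCT 20 1 6
```

The spaces inside the digits still need to be removed afterwards.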


#9

This is easier, I feel:

strDate = Split(Advice, ":")(1).Trim() ' text after the first colon, without surrounding spaces

strDate = Regex.Replace(strDate, "(?<=\d)\p{Zs}(?=\d)", "") ' removes the spaces between digits
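The two steps above, traced in Python for illustration. Assumptions: the colon after “Dateofissue” is the first colon in the text (the `Split(...)(1)` step relies on this too), and, because Python's `re` has no `\p{Zs}`, a plain space or no-break space stands in for it. The final `.title()` call is an extra step, not part of the post, that turns “OCT” into “Oct” to match the requested “20 Oct 2016”. `clean_issue_date` is a hypothetical name.

```python
import re

# Python's re has no \p{Zs}; approximate it with a space or no-break space
# that sits between two digits.
SPACE_BETWEEN_DIGITS = re.compile(r"(?<=\d)[ \u00a0](?=\d)")


def clean_issue_date(advice: str) -> str:
    """Mirror Split(...)(1) and Regex.Replace(...), then title-case the month."""
    # Split(Advice, ":")(1): text after the first colon, trimmed.
    after_colon = advice.split(":")[1].strip()
    # Regex.Replace: "2 0 OCT 20 1 6" -> "20 OCT 2016"
    joined = SPACE_BETWEEN_DIGITS.sub("", after_colon)
    # Extra step (assumption): "OCT" -> "Oct".
    return joined.title()


print(clean_issue_date("Dateofissue: 2 0 OCT 20 1 6"))  # 20 Oct 2016
```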