Find the PDF page number if the page contains specific text

uiautomation
activities
studio

#1

How can I get the page number of a PDF page that contains specific text? For example, I need to store the numbers of all pages that contain the phrase “Total Ending Balance”.


#2

Hi Jayendran,

You could read the PDF one page at a time using the Range property of the Get PDF Text activity, then check whether each page's text contains the phrase.
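
To illustrate the page-by-page idea outside of Studio, here is a minimal Python sketch. It assumes the text of each page has already been extracted into a list, one string per page (in UiPath that text would come from reading one page at a time via the Range property). The names `pages_containing` and `sample_pages` are made up for this example, and the sample text is invented.

```python
from typing import Iterable, List


def pages_containing(page_texts: Iterable[str], phrase: str) -> List[int]:
    """Return the 1-based numbers of pages whose text contains `phrase`.

    The comparison ignores case and collapses runs of whitespace, so a
    phrase that is wrapped across lines in the extracted text still matches.
    """
    needle = " ".join(phrase.split()).casefold()
    hits = []
    for page_number, text in enumerate(page_texts, start=1):
        haystack = " ".join(text.split()).casefold()
        if needle in haystack:
            hits.append(page_number)
    return hits


# Hypothetical page texts standing in for the per-page extractor output.
sample_pages = [
    "Account summary\nOpening balance 100.00",
    "Transactions\nTotal Ending\nBalance 250.00",
    "Notes",
    "Total ending balance 250.00",
]
matching_pages = pages_containing(sample_pages, "Total Ending Balance")
print(matching_pages)
```

Collapsing whitespace is a design choice, not a requirement: extracted PDF text often breaks lines in unexpected places, so a plain substring check on the raw text can miss a phrase that is visibly there.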


#3

Tried something similar in the past. See if this helps.

pdfPageNumber.xaml (14.8 KB)

Minor change: the Item in “Add To Collection” should be “PagesToRead.ToString”, not “startPage.ToString”.


#4

I ran pdfPageNumber.xaml and got the error below. Is there a solution?

SEHException: External component has thrown an exception.
at MS.Win32.UnsafeNativeMethods.ITfDocumentMgr.Push(ITfContext context)
at System.Windows.Documents.TextServicesHost._RegisterTextStore(TextStore textstore)

SEHException


#5
  1. What is your studio version?
  2. Are you unable to open the file or run the xaml?
  3. If unable to run, any idea at which part it is failing?
  4. Try disabling the part shown below, set matches = 3 (the page count), and see if it works.

image


#6

@vvaidya thank you for your followups :slight_smile:

1.What is your studio version?

UiPath Studio 2018.1.2 Community Edition

2.Are you unable to open the file or run the xaml?

I can open and edit the file, and I changed the values of some variables, such as “path” and “keyword”, to fit my environment.
The error occurred after the workflow had been running for a while.
I ran it in debug mode (F7), and variables such as “startPage” kept changing inside the
while loop, so it seems to run properly up to that point.

3.If unable to run, any idea at which part it is failing?

I can run the file, so this question does not apply.

4.Try disabling below and set matches = 3(page count) and see if it works.

The type of the variable “matches” is ‘System.Text.RegularExpressions.MatchCollection’, so if I
set it to 3 as you suggest, I get a compiler error for the expression “3”,
a type conversion error:
“Value of type ‘Integer’ cannot be converted to ‘System.Text.RegularExpressions.MatchCollection’.”


#7

I found that when every variable value uses only roman characters, it runs perfectly,
but when the values contain multibyte characters (Japanese in my case), the regex matches nothing.

Does “System.Text.RegularExpressions.Regex” support multibyte characters?
If not, that would explain why nothing matches.
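
As far as I know, .NET regular expressions operate on Unicode strings, so Japanese text by itself should not prevent a match. One possible cause, and this is only an assumption about this case, is that the text extracted from the PDF uses a different character form than the keyword (for example half-width versus full-width katakana). The Python sketch below shows that kind of mismatch and how Unicode NFKC normalization removes it. `count_matches` and the sample strings are hypothetical.

```python
import re
import unicodedata


def count_matches(keyword: str, text: str) -> int:
    """Count literal occurrences of `keyword` in `text` after NFKC normalization."""
    norm_keyword = unicodedata.normalize("NFKC", keyword)
    norm_text = unicodedata.normalize("NFKC", text)
    # re.escape makes the keyword match literally instead of as a pattern.
    return len(re.findall(re.escape(norm_keyword), norm_text))


# Japanese text matches fine when both sides use the same characters.
plain_hits = len(re.findall("残高", "期末残高 合計"))

# Hypothetical mismatch: the page text uses half-width katakana,
# while the keyword is typed in full-width katakana.
page_text = "ｺﾞｳｹｲ 1,000"
keyword = "ゴウケイ"
raw_hits = len(re.findall(re.escape(keyword), page_text))
normalized_hits = count_matches(keyword, page_text)

print(plain_hits, raw_hits, normalized_hits)
```

If the raw count is zero but the normalized count is not, the problem is the character form rather than the regex engine. Printing the extracted page text and comparing it with the keyword character by character is a cheap way to check this.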


#8

Hi @syn,

Before using the values, please check the match count in an If condition:
Matches.Count > 0
True -> use the values
False -> there are no matches

Regards.
Arivu
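
The same guard can be sketched in Python. This is only an illustration of the check, not the contents of the attached workflow; `first_match_or_none` is a hypothetical helper name.

```python
import re
from typing import Optional


def first_match_or_none(pattern: str, text: str) -> Optional[str]:
    """Return the first match of `pattern` in `text`, or None if there is none."""
    matches = [m.group(0) for m in re.finditer(pattern, text)]
    # Same idea as checking Matches.Count > 0 before reading a match.
    if len(matches) > 0:
        return matches[0]
    return None


found = first_match_or_none(r"\d+", "page 12 of 30")
missing = first_match_or_none(r"\d+", "no digits here")
print(found, missing)
```

Returning None for the empty case lets the caller decide what "no match" means, instead of failing when the first element is read from an empty collection.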


#9

@arivu96 Thank you for your advice.
I will fix the “Matches.Count” routine as you advised.


#10

Hi @vvaidya,

This seems to run perfectly if the PDF is not password protected, but with a password-protected PDF file Matches.Count returns 0. Do you have any idea how to get the number of pages for PDF files that are password protected? Thanks!


#11

Hello vvaidya,

I used the workflow above to get the total number of pages in a single PDF file,
but unfortunately it does not work. It always gives me a total page count of 0.

Can you please help me find where I am making a mistake?

My actual goal is to read tabular data from each PDF page.

Thanks in advance :slight_smile:

Ankur