Find the PDF page numbers that contain specific text

How can I get the page numbers of a PDF that contain specific text? For example, I need to store the numbers of the pages that contain the phrase “Total Ending Balance”.

Hi Jayendran,

You could read one page at a time from the PDF file using the Range property of the Read PDF Text activity, then check whether that page contains the phrase.

Tried something similar in the past. See if this helps.

pdfPageNumber.xaml (14.8 KB)

Minor change: the item in “Add To Collection” should be “PagesToRead.ToString”, not “startPage.ToString”.
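
For readers who cannot open the attachment, the page-by-page idea can be sketched outside UiPath as follows. This is a minimal Python sketch, not the content of pdfPageNumber.xaml: it assumes the text of each page has already been extracted into a list (in UiPath, one read per page using the Range property), and the function name and sample pages are hypothetical.

```python
def pages_containing(page_texts, phrase):
    """Return the 1-based numbers of the pages whose text contains the phrase.

    page_texts is assumed to hold one already-extracted string per page.
    The comparison ignores case.
    """
    needle = phrase.casefold()
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if needle in text.casefold()
    ]


# Hypothetical page texts standing in for per-page PDF extraction.
sample_pages = [
    "Account summary",
    "Transactions ... Total Ending Balance 1,250.00",
    "Notes",
    "total ending balance carried forward",
]

print(pages_containing(sample_pages, "Total Ending Balance"))  # [2, 4]
```

Collecting the page numbers in a list corresponds to the “Add To Collection” step mentioned above.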

I ran pdfPageNumber.xaml and got the following error. Is there a solution?

SEHException:
External component has thrown an exception.
at MS.Win32.UnsafeNativeMethods.ITfDocumentMgr.Push(ITfContext context)
at System.Windows.Documents.TextServicesHost._RegisterTextStore(TextStore textstore)

  1. What is your Studio version?
  2. Are you unable to open the file or run the xaml?
  3. If you are unable to run it, do you know at which part it is failing?
  4. Try disabling below and setting matches = 3 (the page count) to see if it works.

@vvaidya Thank you for following up.

1. What is your studio version?

UiPath Studio 2018.1.2 Community Edition

2. Are you unable to open the file or run the xaml?

I can open and edit the file, and I changed the values of some variables, such as “path” and “keyword”, to fit my environment.
The error occurred after the workflow had been running for a while.
I ran it in debug mode (F7), and variables such as “startPage” kept changing inside the while loop, so the loop itself seems to run properly.

3. If unable to run, any idea at which part it is failing?

I can run the file, so this question does not apply.

4. Try disabling below and set matches = 3 (page count) and see if it works.

The type of the variable “matches” is ‘System.Text.RegularExpressions.MatchCollection’, so when I follow this suggestion I get a type conversion error: Compiler error(s) encountered processing expression “3”.
“Value of type ‘Integer’ cannot be converted to ‘System.Text.RegularExpressions.MatchCollection’.”

I found that when the variables contain only Roman characters, it runs perfectly,
but when they contain multibyte characters (Japanese in my case), the regular expression matches nothing.

Does “System.Text.RegularExpressions.Regex” support multibyte characters?
If not, that would explain why nothing matches.
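
As far as I know, .NET's Regex works on Unicode strings, so Japanese text by itself should not prevent a match. One possible cause (an assumption, not something confirmed in this thread) is that the extracted text and the keyword use different character variants, such as full-width versus half-width letters. The Python sketch below illustrates normalizing both sides before matching; the function name and sample strings are hypothetical.

```python
import re
import unicodedata


def contains_keyword(page_text, keyword):
    """Return True if keyword occurs in page_text after NFKC normalization.

    NFKC folds compatibility variants (for example full-width Latin letters)
    into their canonical forms, so both sides are compared in the same form.
    """
    normalized_text = unicodedata.normalize("NFKC", page_text)
    normalized_keyword = unicodedata.normalize("NFKC", keyword)
    # re.escape keeps the keyword literal even if it contains regex metacharacters.
    return re.search(re.escape(normalized_keyword), normalized_text) is not None


# Hypothetical sample text; real text would come from the PDF extraction step.
print(contains_keyword("期末残高合計 12,000円", "残高合計"))  # plain Japanese keyword
print(contains_keyword("ＴＯＴＡＬ 12,000円", "TOTAL"))       # full-width vs. half-width
```

Both calls print True. If the keyword still does not match after normalization, it may be worth writing the extracted page text to a log to see what the PDF activity actually returned.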

Hi @syn,

Before using the values, check the match count in an If condition:
Matches.Count > 0
True -> use the values
False -> there are no matches
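
The same guard, sketched in Python for illustration (in UiPath this would be an If activity on Matches.Count; the function name here is hypothetical):

```python
import re


def first_match_or_none(text, pattern):
    """Return the first match of pattern in text, or None when nothing matches."""
    matches = list(re.finditer(pattern, text))
    if len(matches) > 0:
        # True branch: it is safe to use the values.
        return matches[0].group(0)
    # False branch: there are no matches, so nothing is read from the collection.
    return None


print(first_match_or_none("Page 1 of 3", r"\d+"))     # 1
print(first_match_or_none("no digits here", r"\d+"))  # None
```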

Regards,
Arivu

@arivu96 Thank you for your advice.
I will add the “Matches.Count” check as you suggested.

Hi @vvaidya,

This seems to run perfectly if the PDF is not password protected, but with a password-protected PDF file, Matches.Count returns 0. Do you have any idea how to get the number of pages for PDF files that are password protected? Thanks!

Hello vvaidya,

I used the workflow above to get the total number of pages in a single PDF file,
but unfortunately it does not work: it always gives me a total page count of 0.

Can you please help me find where I am making a mistake?

My actual goal is to read tabular data from each PDF page.

Thanks in advance.

Ankur

How can I find the total number of pages on a particular site in UiPath?

That depends on the site. Can you give me an example of a site where you would like to get the number of pages?

For example, the Amazon home page has many links. The requirement is that only 20 links should be present on a page. How can I check the number of links on a page?

Also, on Amazon, in the Apple iPhone category, how can I find the total number of Apple iPhone links across all pages?

You can use the Data Scraping wizard to extract all the links with the same structure. The output is a variable of type DataTable, and you can use something like ExtractedDataTable.Rows.Count to get the number of returned results. You can also limit the maximum number of results in the wizard (the default is 100), or later in the Extract data activity using the MaxNumberOfResults property.

But for your Amazon example, you can use the Get Text activity on the entire line that shows the number of results, and then use string methods to extract what is between “of” and “results”.
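
The extraction step might look like the Python sketch below. The sample lines are made up, and the exact wording of the results line (including an optional “over”) is an assumption, so the pattern will need adjusting to whatever Get Text actually returns.

```python
import re


def total_results(line):
    """Return the number between "of" and "results" as an int, or None.

    Assumes a line shaped like "1-16 of over 2,000 results for ...";
    the optional "over" and the thousands separators are assumptions.
    """
    match = re.search(r"\bof\s+(?:over\s+)?(\d[\d,]*)\s+results\b", line)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


# Hypothetical result lines.
print(total_results('1-16 of over 2,000 results for "apple iphone"'))  # 2000
print(total_results("1-24 of 312 results"))                            # 312
print(total_results("No results found"))                               # None
```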