Find the PDF page number, if it contains Specific text

Jayendran · February 27, 2018, 1:11pm

How can I get the PDF page number of each page that contains specific text? For example, I need to store the numbers of all pages that contain the phrase “Total Ending Balance”.

Silviu · February 27, 2018, 2:02pm

Hi Jayendran,

You could read one page at a time from the PDF file using the Range property of the Get PDF Text activity, then check whether that page's text contains the phrase.
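The page-by-page idea can be sketched outside of UiPath as follows. This is only an illustration in Python, not the workflow itself: it assumes the text of each page has already been extracted into a list of strings (one entry per page), and the function name `pages_containing` and the sample data are made up for the example.

```python
def pages_containing(page_texts, phrase):
    """Return the 1-based numbers of the pages whose text contains the phrase."""
    # Collapse whitespace and ignore case so that a phrase broken across
    # lines by the PDF text extraction is still found.
    needle = " ".join(phrase.split()).casefold()
    hits = []
    for page_number, text in enumerate(page_texts, start=1):
        normalized = " ".join(text.split()).casefold()
        if needle in normalized:
            hits.append(page_number)
    return hits


# Hypothetical page texts standing in for the output of a per-page read.
pages = [
    "Account summary\nOpening Balance 100.00",
    "Transactions\nTotal Ending\nBalance 250.00",
    "Notes and disclosures",
    "Total Ending Balance 250.00",
]

print(pages_containing(pages, "Total Ending Balance"))  # [2, 4]
```

In a workflow, the loop body would correspond to reading a single page via the Range property and testing the returned string.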

vvaidya · February 27, 2018, 3:00pm

Tried something similar in the past. See if this helps.

pdfPageNumber.xaml (14.8 KB)

Minor change: the item variable in “Add To Collection” should be “PagesToRead.ToString”, not “startPage.ToString”.

syn · March 9, 2018, 2:29am

I ran pdfPageNumber.xaml and got the error below. Is there a solution?

SEHException:
External component has thrown an exception.
at MS.Win32.UnsafeNativeMethods.ITfDocumentMgr.Push(ITfContext context)
at System.Windows.Documents.TextServicesHost._RegisterTextStore(TextStore textstore)

vvaidya · March 9, 2018, 5:21pm

What is your Studio version?
Are you unable to open the file, or unable to run the xaml?
If you are unable to run it, do you know at which part it fails?
Try disabling the part below and setting matches = 3 (the page count) to see if it works.

syn · March 12, 2018, 2:10am

@vvaidya Thank you for following up.

1. What is your studio version?

UiPath Studio 2018.1.2 Community Edition

2. Are you unable to open the file or run the xaml?

I can open and edit the file, and I changed the values of some variables, such as “path” and “keyword”, to fit my environment.
The error occurred after I started the file and it had been running for a while.
I ran it in debug mode (F7), and variables such as “startPage” kept changing inside the
while loop, so the loop itself seems to run properly.

3. If unable to run, any idea at which part it is failing?

I can run the file, so this question does not apply.

4. Try disabling below and set matches = 3(page count) and see if it works.

The type of the variable “matches” is ‘System.Text.RegularExpressions.MatchCollection’, so if I
follow your suggestion, I get a type conversion error: Compiler error(s) encountered processing expression “3”.
“Value of type ‘Integer’ cannot be converted to ‘System.Text.RegularExpressions.MatchCollection’.”

syn · March 12, 2018, 2:21am

I found that when every variable value uses only Roman characters, the workflow runs perfectly,
but when the values contain multibyte characters (Japanese in my case), the regexp matches nothing.

Does “System.Text.RegularExpressions.Regex” support multibyte characters?
If not, that would explain why there are no matches.
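For what it is worth, .NET strings are Unicode and `Regex` operates on them, so Japanese text is not unsupported as such. One possible cause, and this is only an assumption since the thread does not confirm it, is that the text extracted from the PDF uses a different character form than the keyword (for example full-width versus half-width characters). The Python sketch below illustrates that effect and a normalization step that works around it; `count_matches` and the sample strings are invented for the example.

```python
import re
import unicodedata


def count_matches(text, keyword):
    """Count literal occurrences of keyword in text after NFKC normalization."""
    # NFKC folds compatibility forms (e.g. full-width "ＡＢＣ１２３")
    # into their ordinary equivalents ("ABC123") on both sides.
    norm_text = unicodedata.normalize("NFKC", text)
    norm_keyword = unicodedata.normalize("NFKC", keyword)
    # re.escape treats the keyword as literal text, not as a pattern.
    return len(re.findall(re.escape(norm_keyword), norm_text))


# Japanese text matches without any special handling.
print(count_matches("期末の合計残高は1,000円です", "合計残高"))  # 1

# Full-width characters do not match their half-width form directly...
print(len(re.findall("ABC123", "口座 ＡＢＣ１２３")))  # 0

# ...but they do once both sides are normalized.
print(count_matches("口座 ＡＢＣ１２３", "ABC123"))  # 1
```

In .NET the comparable normalization call is `String.Normalize(NormalizationForm.FormKC)`; whether it resolves this particular case would need to be checked against the actual extracted text.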

arivu96 · March 12, 2018, 2:52am

Hi @syn,

Before using the value, check it in an If condition:
Matches.Count > 0
True -> use the values
False -> there are no matches

Regards,
Arivu
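The same guard, sketched in Python rather than in a workflow: check that the match collection is non-empty before reading from it. The helper name `first_match_or_none` and the sample pattern are hypothetical.

```python
import re


def first_match_or_none(pattern, text):
    """Return the first regex match in text, or None when nothing matches."""
    matches = re.findall(pattern, text)
    if len(matches) > 0:
        # True branch: it is safe to use the values.
        return matches[0]
    # False branch: no matches, so there is nothing to read.
    return None


print(first_match_or_none(r"\d+", "page 12 of 30"))  # 12
print(first_match_or_none(r"\d+", "no digits here"))  # None
```

Returning `None` instead of indexing into an empty collection keeps the "nothing matched" case explicit for the caller.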

syn · March 12, 2018, 4:32am

@arivu96 Thank you for your advice.
I will add the “Matches.Count” check as you suggested.

JPOkawa · April 12, 2018, 5:49am

Hi @vvaidya,

This seems to run perfectly if the PDF is not password protected, but with a password-protected PDF file Matches.Count returns 0. Do you have any idea how to get the number of pages for PDF files that are password protected? Thanks!

ankur · June 18, 2018, 2:14pm

Hello vvaidya,

I used the code above to get the total number of pages in a single PDF file,
but unfortunately it does not work. It always gives me a total page count of 0.

Can you please help me find where I am making a mistake?

My actual goal is to read tabular data from each PDF page.

Thanks in advance,

Ankur

Bharti_sanghmitra · August 30, 2018, 1:51pm

How can I find the total number of pages on a particular site in UiPath?

Silviu · August 31, 2018, 5:29am

That varies from site to site. Can you give me an example of a site where you would like to get the number of pages?

Bharti_sanghmitra · August 31, 2018, 7:43am

For example, the Amazon home page has many links. The requirement is that only 20 links should be present on a page. How can I check the number of links on a page?

Also, in the Apple iPhone category on Amazon, how can I find the total number of Apple iPhone links across all pages?

Silviu · August 31, 2018, 10:23am

You can use the Data Scraping wizard to extract all the links with the same structure. The output will be a variable of type DataTable, and you can use something like ExtractedDataTable.Rows.Count to get the number of returned results. It is also possible to limit the maximum number of results in the wizard (the default is 100), or later in the Extract data activity using the MaxNumberOfResults property.

For your Amazon example, however, you can use the Get Text activity on the entire line that shows the number of results, and then use string methods to extract what is between “of” and “results”.
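As a rough illustration of the string-method step, here is a Python sketch. The sample line is an assumed format for the results line, not text captured from the site, and the helpers `text_between` and `result_count` are hypothetical names.

```python
def text_between(line, start_marker, end_marker):
    """Return the text between two markers, or None if either is missing."""
    start = line.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = line.find(end_marker, start)
    if end == -1:
        return None
    return line[start:end].strip()


def result_count(line):
    """Extract the number between ' of ' and ' results' as an integer."""
    between = text_between(line, " of ", " results")
    if between is None:
        return None
    # Keep digits only, dropping words like "over" and thousands separators.
    digits = "".join(ch for ch in between if ch.isdigit())
    return int(digits) if digits else None


# Assumed example of what Get Text might return for the results line.
sample = '1-16 of over 2,000 results for "apple iphone"'
print(text_between(sample, " of ", " results"))  # over 2,000
print(result_count(sample))  # 2000
```

In a workflow, the equivalent would be built from string methods such as IndexOf and Substring on the Get Text output.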
