Take a specific piece of text and the page it is on, from a PDF file

Hello everybody,

I am working on a project where the robot has to read PDF files downloaded from a web page. The files contain various bookmarks, and I only need two of them. So, for a specific bookmark, the robot has to find the page number where that bookmark's text is located and then save just that piece of text.
Sorry, but I’m new and I don’t know how to do it.
Could you kindly point me in the right direction?

Thanks so much,
Araceli.

Hi @Araceli91
Welcome to the UiPath community.

It is possible to extract the data directly using regex or string manipulation.

If you can share details about the data you need to extract, I can help you with the regex pattern.

Hello @NIVED_NAMBIAR

Thank you so much for your answer!
So, there are two possibilities, because the robot could find two kinds of PDF file.
When it downloads the PDF file and opens it (I use Acrobat Reader), it must look in the bookmarks for the one called “Soci”. If that bookmark exists, the robot should save the page number where it found it and all of its text.
Otherwise, if that bookmark does not exist, it must look for the bookmarks “titolari di cariche e qualifiche” and “informazioni sullo statuto” and do the same thing, that is, save the page number they are on and all the text.

I can’t add attachments because I’m a newbie :frowning:

Thanks again!!

You mean save all the text on that particular page, @Araceli91?

@Araceli91 - please take a look at this post

Yes, in the sense that I should take all the text of that specific page.

Thanks, I’ll take a look at the post. :slightly_smiling_face:

Hi @Araceli91

Then you can try this idea as well.

I think there is an activity to get the PDF page count.

Get the PDF page count and store it in a variable.

Then loop through the page numbers and read the PDF page by page. After reading one page, check whether the word appears in that string. If it does, you can exit the loop and store the data.

You can build the other conditions in the same way.
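
The loop described above can be sketched outside UiPath as follows. This is only an illustration in Python, assuming the text of each page has already been read into a list of strings (one entry per page). The names `find_page`, `find_sections` and `FALLBACKS` are hypothetical, and the word-boundary regex is an assumption of this sketch, not something a UiPath activity does for you.

```python
import re
from typing import Dict, List, Optional, Tuple

# Fallback bookmark names taken from the thread (used only if "Soci" is absent).
FALLBACKS = ["titolari di cariche e qualifiche", "informazioni sullo statuto"]

def find_page(pages: List[str], keyword: str) -> Optional[Tuple[int, str]]:
    """Return (1-based page number, page text) of the first page containing keyword."""
    # \b keeps "Soci" from matching inside longer words such as "Società".
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for number, text in enumerate(pages, start=1):
        if pattern.search(text):
            return number, text  # stop at the first hit, like exiting the loop
    return None

def find_sections(pages: List[str]) -> Dict[str, Tuple[int, str]]:
    """Look for "Soci" first; if it is missing, look for the fallback names."""
    primary = find_page(pages, "Soci")
    if primary is not None:
        return {"Soci": primary}
    results = {}
    for name in FALLBACKS:
        hit = find_page(pages, name)
        if hit is not None:
            results[name] = hit
    return results
```

For example, with `pages = ["Intro", "Soci\nMario Rossi"]`, `find_sections(pages)` returns the entry for page 2. The word boundary matters here because a plain contains check for “Soci” would also succeed on a page that only mentions “Società”.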

Regards

Nived N :robot:

Happy Automation

I finally tried your suggestion, but it doesn’t work for me. I’m trying another way, and if I can make it all work I’ll be happy to share it with you. :grinning:

I just realized that I can share images, so this would be my PDF file.
From it I have to extract only the “Soci” bookmark, for example.
I tried to read page by page with the robot and then, for each page, check whether that text contains the keyword “Soci”, but it didn’t work.
I also tried the “Is Match” activity, and also “myText.Contains”, but neither worked. I’ll try yet another way.

Hi @Araceli91
Can you show the page where the word “Soci” is found?

Also, please check whether the string read from the PDF actually contains the word “Soci” by writing it to a text file.
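
A check like this could look as follows in Python. It is a sketch, assuming the page text is already available as a string; `dump_text`, `normalize` and `contains_word` are hypothetical helper names. Writing the `repr()` of the text makes hidden characters such as line breaks or non-breaking spaces visible, which can be one reason a plain contains check fails on text read from a PDF.

```python
import re
import unicodedata
from pathlib import Path

def dump_text(text: str, path: str) -> None:
    """Write the raw text plus its repr() so hidden characters become visible."""
    Path(path).write_text(text + "\n\n--- repr ---\n" + repr(text), encoding="utf-8")

def normalize(text: str) -> str:
    """Fold compatibility characters (e.g. non-breaking spaces) and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()

def contains_word(text: str, word: str) -> bool:
    """Case-insensitive whole-word (or whole-phrase) search on normalized text."""
    pattern = r"\b" + re.escape(normalize(word)) + r"\b"
    return re.search(pattern, normalize(text), re.IGNORECASE) is not None
```

If the dumped file shows the keyword split across lines or written in a different case, matching on the normalized text as in `contains_word` may succeed where an exact contains check does not.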