Need to read a word in PDF file and if that word exists should remove that page and save the other pages

maddy99 · January 10, 2021, 2:50pm

Hi,

I am opening a PDF using a link in the portal, and it opens directly in Chrome without being downloaded to the local machine. My question is this:

I want to check for the word “Invoice” in the PDF pages, and see whether it exists on any page.

If it does, I want to remove that page and download the PDF to the local machine.

If that is not possible in the browser…please tell me how to do it once the PDF is on the local machine.

Rakesh_Sampath · January 10, 2021, 3:21pm

Hi @maddy99

Please check the link above! It might help you!

prasath17 · January 10, 2021, 4:09pm

@maddy99 - Do you want to do this in StudioX or Studio?

maddy99 · January 10, 2021, 6:24pm

It is in Studio

maddy99 · January 10, 2021, 6:24pm

Sure thanks! will check and update you

prasath17 · January 10, 2021, 7:44pm

Hi @maddy99 …If you are allowed to use the BalaReva activities, then I can suggest a solution…Please let me know…

prasath17 · January 10, 2021, 9:02pm

@maddy99 - Here is a sample workflow…

Use “Get PDF Page Count” to get the total number of pages, say IntPageCount.
In a For Each loop, iterate over Enumerable.Range(1, IntPageCount). This loops through all the pages in the PDF…
Inside the For Each, first read the PDF text and output it to StrPDFText.
Next, an Assign: Match = System.Text.RegularExpressions.Regex.IsMatch(StrPDFText, "Invoice", System.Text.RegularExpressions.RegexOptions.IgnoreCase)

Match is a Boolean variable. Here I am looking for the word “Invoice”.
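For anyone who wants to try the same test outside Studio, here is a minimal Python sketch of a case-insensitive check. Python's `re.search` with `re.IGNORECASE` stands in for `Regex.IsMatch` with `RegexOptions.IgnoreCase`; the helper name `page_has_word` and the sample strings are made up for illustration. One thing to be aware of: a plain pattern also matches inside longer words such as “Invoices”. The optional `whole_word` flag is an addition, not part of the workflow in this thread, and wraps the pattern in `\b` word boundaries.

```python
import re


def page_has_word(page_text: str, word: str, whole_word: bool = False) -> bool:
    """Case-insensitive test, analogous to Regex.IsMatch(..., IgnoreCase)."""
    pattern = re.escape(word)  # treat the search word literally
    if whole_word:
        # Hypothetical extra: only match the word when it stands alone
        pattern = rf"\b{pattern}\b"
    return re.search(pattern, page_text, re.IGNORECASE) is not None


print(page_has_word("TAX INVOICE No. 42", "Invoice"))                 # True
print(page_has_word("List of invoices", "Invoice"))                   # True
print(page_has_word("List of invoices", "Invoice", whole_word=True))  # False
```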

If Match is True, the search word was found on that page, so do nothing (or log whatever you want). In the Else branch, use “PDF Splitter” from the BalaReva activities to split out that particular page, i.e. the page where no match was found.

In my sample, 21 pages were split out; page 22 has the word Invoice…

Combine all the pages using “Join PDF Files” activity.

That’s it…Done…
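The page-selection logic of the steps above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the page texts are given as a plain list of strings (in the real workflow they come from the read step, one page at a time), the function name `pages_to_keep` is hypothetical, and the actual splitting and joining is still done by the PDF activities.

```python
import re
from typing import List


def pages_to_keep(page_texts: List[str], word: str) -> List[int]:
    """Return the 1-based numbers of pages whose text does NOT contain `word`."""
    keep = []
    for idx, text in enumerate(page_texts):  # idx starts at 0
        if re.search(re.escape(word), text, re.IGNORECASE):
            continue  # match found: leave this page out
        keep.append(idx + 1)  # no match: this page would be split out and kept
    return keep


# Hypothetical page texts standing in for a 3-page PDF
sample = ["Invoice #1001", "Terms and conditions", "Delivery note"]
print(pages_to_keep(sample, "Invoice"))  # [2, 3]
```

The returned page numbers are the ones that would be split out and later joined into the final file.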

maddy99 · January 11, 2021, 9:52am

Hi @prasath17…I have installed the BalaReva PDF activities…

maddy99 · January 11, 2021, 9:54am

Hi @prasath17,

Sorry for the late reply…

Thank you for the flow…

I will check this and update you.

maddy99 · January 11, 2021, 10:40am

Hi @prasath17,

I have checked the workflow, but it reports that every page of the PDF contains the word Invoice…

It gives True for all 10 pages.

But the word Invoice appears only on the 1st page…

The word Invoice is always in the top right corner.

prasath17 · January 11, 2021, 1:49pm

@maddy99…what is the Range of the read PDF activity? You should read page by page…For that, the For Each has an Index property; declare a variable for it, say IntIdx…Note: Index always starts from 0, so in the read PDF properties you have to set Range to (IntIdx+1).ToString…Do the same for PDF Splitter.
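To see why every page reported True, here is a small Python sketch. It assumes the read step was left on a whole-document range, so each loop pass searched the full text. `read_pdf_text` is a hypothetical stand-in for the read activity (it takes "All" or a single 1-based page number as a string), and the page texts are made up.

```python
import re
from typing import List


def read_pdf_text(pages: List[str], page_range: str) -> str:
    """Hypothetical stand-in for a 'read PDF text' step.

    page_range is "All" or a single 1-based page number as a string.
    """
    if page_range == "All":
        return "\n".join(pages)
    return pages[int(page_range) - 1]


def match_flags(pages: List[str], word: str, per_page: bool) -> List[bool]:
    """Run the match once per loop pass, as the For Each does."""
    flags = []
    for idx in range(len(pages)):  # idx starts at 0, like the For Each Index
        page_range = str(idx + 1) if per_page else "All"
        text = read_pdf_text(pages, page_range)
        flags.append(re.search(word, text, re.IGNORECASE) is not None)
    return flags


pages = ["Invoice 1001", "Line items", "Terms"]
print(match_flags(pages, "Invoice", per_page=False))  # [True, True, True]
print(match_flags(pages, "Invoice", per_page=True))   # [True, False, False]
```

With the whole-document range, the text of page 1 is included in every read, so every pass matches. With the range set from the index plus one, only page 1 matches.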

I will share the workflow.

maddy99 · January 11, 2021, 2:05pm

Hi @prasath17

It’s working…Thanks for the help…I will mark it as the solution…

maddy99 · January 11, 2021, 4:48pm

Hi @prasath17,

I am getting all the pages in the final folder…the invoice page is not removed…

Should I give a range in the PDF Splitter activity?

prasath17 · January 11, 2021, 4:53pm

@maddy99 - Please check this… Delete_PDFPage.zip (755.6 KB)

I have cleaned up the output for now…so if you run it, you will see 21 files created in the splitted folder, and Final_output.pdf created outside it, in the project folder.

maddy99 · January 11, 2021, 4:57pm

Hi @prasath17,

I am getting all the pages in the final folder…the invoice page is not removed…

I think that when it moves to the Else branch, we are reading the whole PDF file and splitting it…Maybe that is the issue.

It adds all the pages without removing the invoice page…

Should I give a range in the PDF Splitter activity?

I have checked your code, and the activities are missing…

Could you please help me…

prasath17 · January 11, 2021, 4:59pm

Hi @maddy99 …Did you get a chance to check my xaml? I guess you didn’t set up the range correctly; that’s the issue…

With your setup, each pass splits out all the pages. You have to give a page range, (index+1).ToString, so that only the page that does not contain the match is split out…

If you are still unable to resolve it, can you share your xaml?

maddy99 · January 11, 2021, 5:13pm

Hi @prasath17,

Completed now…I just hadn’t given the range…I am sorry…I was a bit confused.

Thank you for the help!

prasath17 · January 11, 2021, 5:20pm

Hi @maddy99 … no problem…Glad it worked…

I purposely did not share the xaml initially, so that you would set it up yourself by looking at the screenshot. That way you understand what is going on.

By now you should have the idea of how it works. It’s simple:

Read the PDF page by page → convert it to text → run a regex match → if it matches, ignore that page → else (no match) split out that page → finally combine all the split pages…

Instead of creating an additional counter variable, I used the Index that comes with For Each, so that I don’t have to increment it. Index increments automatically on every iteration.

maddy99 · January 11, 2021, 6:03pm

Yes!! @prasath17…I learned how to read, remove, and split PDF pages…and got total clarity after checking the shared flow…Thank you so much for your valuable time…

system · January 14, 2021, 6:03pm

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
