Extract PDF Page Range by Regex in a Folder

Hi,

I would like to extract the Appendix, which sits on specific pages, from several 500+ page PDFs in a folder and save each one as a new PDF.

The Appendix is on different pages in each PDF, but there is specific wording that can be used to locate those pages with a regex.

I tried assigning the regex to a variable and passing that “extractappendix” variable to the Range field, but I get errors. Can anyone help?

Hi @cclemon

In the Range field you need to enter a page range, like below

Regards,

@cclemon

In this case, first read the PDF with the Read PDF Text activity, then use the regex for your specific wording to find the page number, and then pass that page number to the Extract PDF Page Range activity.
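
Once the page numbers are known, they still have to be turned into the text that goes in the Range field. Here is a rough sketch of that step in plain Python rather than UiPath, just to show the logic. The function name `to_range_string` is made up for illustration, and it assumes the Range field accepts strings such as "3-5, 9", so please check the activity documentation for the exact format.

```python
def to_range_string(pages):
    """Collapse 1-based page numbers into a string such as "3-5, 9".

    Hypothetical helper: the output format is an assumption about what
    the Range field of the extract activity accepts.
    """
    ordered = sorted(set(pages))
    if not ordered:
        return ""

    parts = []
    start = prev = ordered[0]
    for page in ordered[1:]:
        if page == prev + 1:
            # Still inside the same run of consecutive pages.
            prev = page
            continue
        # The run ended: emit it as "n" or "n-m", then start a new run.
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = page
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ", ".join(parts)


print(to_range_string([5, 3, 4, 9]))
```

A single page simply comes back as its own number, which is the same idea as passing one page number converted to a string.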

Regards,

Hey @cclemon, try passing the specific range.

Cheers, happy automation!

Hi @cclemon,

In this scenario, read the PDF page by page and, after reading each page, check it against your key value. If the condition passes, you can use the output for that page; otherwise continue until the last page.
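
The page-by-page check could look roughly like this, written in plain Python only to illustrate the loop. `page_texts` is a hypothetical list holding the text of each page in order (how you read each page is up to your workflow), and the pattern and sample strings are invented for the example.

```python
import re


def find_matching_pages(page_texts, pattern):
    """Return the 1-based numbers of the pages whose text matches pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    matches = []
    # enumerate(..., start=1) so the result lines up with PDF page numbers.
    for number, text in enumerate(page_texts, start=1):
        if regex.search(text):
            matches.append(number)
    return matches


# Invented sample data standing in for the text of a four-page PDF.
sample_pages = [
    "Introduction",
    "Results and discussion",
    "Appendix A: Tables",
    "Appendix A (continued)",
]
print(find_matching_pages(sample_pages, r"\bAppendix\b"))
```

Collecting every matching page, instead of stopping at the first hit, makes it possible to extract an Appendix that spans several pages.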

Thanks. How do I get the page number in this case?

Which activity should I use?

Should I use an If activity? Would it be possible for you to show me the activities? Thanks.

Can you share a sample PDF?

@cclemon

Please check this to see how to loop through the pages, read each one, and test the condition

cheers

Hi @Anil_G

In the Condition field, would it work to enter something like ExtractedPDFText Contains Regex?
Also, where does the “currentNumber.ToString” in the Range field come from?

I am a beginner, sorry for asking dumb questions :sweat_smile:

@Komom

It is the loop's internal variable, and its name might be a little different for you. To find it, start typing “current” and you will see the suggestion; it might be currentitem as well.

For the condition, you can use System.Text.RegularExpressions.Regex.IsMatch(extractedpdfdata, "YourRegex"). This returns True if the regex matches and False otherwise.
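
If it helps to try a pattern outside the workflow first, here is a small Python sketch of the same kind of check. `Regex.IsMatch` looks for a match anywhere in the input, so the closest Python call is `re.search`, not `re.match`, which only matches at the start of the string. Note that the argument order is reversed compared with .NET. The helper name `is_match` and the sample text are invented for the example.

```python
import re


def is_match(text, pattern):
    """True if pattern matches anywhere in text, like .NET Regex.IsMatch."""
    return re.search(pattern, text) is not None


# Invented sample text standing in for one page of extracted PDF text.
page_text = "See Appendix B for the full tables"

print(is_match(page_text, r"Appendix [A-Z]"))         # match found mid-string
print(re.match(r"Appendix [A-Z]", page_text) is None)  # re.match only tries the start
```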

Cheers

Thank you so much for your reply! It really helps a lot!
