How to split the pdf file basis on text name

GreenTea · May 21, 2020, 1:49pm

This use case is tricky. The example provided depends on how well you craft the RegEx pattern. Since I do not have your PDF file, you will have to supply the anchor text and make sure the pattern identifies the correct page boundaries, not just the keywords.

The idea:

1. Read the PDF page by page with the activity Read PDF Text.
2. Search the page text for the search string with the activity IsMatch.
3. If a match (Boolean) is found, add a data row containing the search text and the starting page number.
4. Increment the page number.
5. Repeat from step 1 for the next page.
6. When a later page produces a new match, update the previous data row's ending page number.
7. When the last page is read, update the last data row's ending page number.
8. Finally, use Extract PDF Page Range to extract the pages for each data row.
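
To make the bookkeeping easier to follow, here is a rough Python sketch of the same idea outside of Studio. It is not the workflow from the attached example. It assumes the text of each page has already been read into a list (the equivalent of calling Read PDF Text once per page), and the function name `find_split_ranges`, the sample page texts, and the pattern are made up for illustration.

```python
import re


def find_split_ranges(page_texts, pattern):
    """Return (matched_text, start_page, end_page) tuples, 1-based and inclusive.

    page_texts is a list with one string per PDF page (hypothetical input;
    in the workflow this text comes from reading the PDF page by page).
    """
    regex = re.compile(pattern)
    ranges = []
    for page_number, text in enumerate(page_texts, start=1):
        match = regex.search(text)
        if match is None:
            # No match: this page belongs to the range that is currently open.
            continue
        if ranges:
            # A new match closes the previous range on the page before it.
            label, start, _ = ranges[-1]
            ranges[-1] = (label, start, page_number - 1)
        # Open a new range starting on this page.
        ranges.append((match.group(0), page_number, page_number))
    if ranges:
        # The last range runs to the final page of the document.
        label, start, _ = ranges[-1]
        ranges[-1] = (label, start, len(page_texts))
    return ranges


# Made-up page texts standing in for a five-page PDF.
pages = [
    "Invoice No: 1001 first page",
    "continued",
    "Invoice No: 1002 first page",
    "continued",
    "continued",
]
ranges = find_split_ranges(pages, r"Invoice\sNo:\s\d+")
for label, start, end in ranges:
    print(f"{label}: pages {start}-{end}")
```

Each tuple corresponds to one data row in the workflow, and the start and end values are what you would pass to Extract PDF Page Range as a range such as "1-2". Pages before the first match are not assigned to any range in this sketch.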

Note: the activity Assign Regex Pattern replaces each space in the search text with \s so that the regular expression works correctly. You will need to change it to suit the text you are searching for.
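
For comparison, the same space-to-\s replacement could look roughly like this in Python. This is only a sketch: the helper name `build_pattern` and the sample search text are hypothetical, and escaping the other special characters is an extra step that the note above does not mention.

```python
import re


def build_pattern(search_text):
    """Turn plain search text into a regex where each space becomes \\s.

    Each piece between spaces is escaped so characters like "." are
    matched literally instead of being treated as regex syntax.
    """
    parts = search_text.split(" ")
    return r"\s".join(re.escape(part) for part in parts)


pattern = build_pattern("Invoice No.")
print(pattern)

# \s matches any single whitespace character, so a tab or line break
# between the two words in the extracted page text still matches.
print(bool(re.search(pattern, "Invoice\tNo. 1001")))
```

Text extracted from a PDF often has irregular whitespace between words, which is why matching \s is more forgiving than matching a literal space.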

The example contains a sample pdf which you can test to verify the workings…
PDFExtract.zip (102.5 KB)
