How to split a PDF based on a keyword

#1

Hi,
In Sweden we get preprinted tax returns from the Tax Agency, each 4-6 pages long. I will be getting these for several individuals in a single PDF, and I need to split this large PDF so that I get one preprinted tax return per individual. I know a word that appears on the first page of each return and only on that first page. So I want the robot to find this word on page one and then find the next occurrence of the word, which marks the start of the next person’s return. Each split PDF should contain that first page and all following pages up to, but not including, the next page with the word. And so on. Any ideas how to split it like this?


#2

Can you attach sample PDFs?


#3

Does the format need to stay the same, meaning no alteration to the document except splitting the pages? Or are you OK with the plain-text format that Read PDF Text gives you?


#4

Hi, the format needs to stay the same. The new PDFs are to be sent to different people within my organisation.


#5

I can’t think of any simple, native way of doing this in UiPath. There may be a better way, but here is one idea:

  • Use the Read PDF Text activity to read one page at a time and search for the keyword. Add the page numbers where it is found to an array. Then go through the array and build a set of page ranges. For example, if the keyword appears on pages 1, 5 and 7 of a 9-page document, you would “print” the ranges 1-4, 5-6 and 7-9.
  • Have the bot open the PDF in Adobe and use Print > Save as PDF, printing (saving) once for each range.
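The range-building step in the first bullet is plain logic, so here is a minimal sketch of it in Python (in UiPath the same idea would live in Assign activities or an Invoke Code activity). The function name `pages_to_ranges` is hypothetical, and it assumes the first return starts on a page where the keyword was found; any pages before the first keyword page are ignored.

```python
def pages_to_ranges(keyword_pages, total_pages):
    """Turn the pages where the keyword was found into inclusive (start, end) ranges.

    Hypothetical helper: assumes each keyword page starts a new tax return.
    Pages before the first keyword page are not included in any range.
    """
    starts = sorted(set(keyword_pages))  # de-duplicate and order the page numbers
    ranges = []
    for i, start in enumerate(starts):
        # A range ends just before the next keyword page, or at the last page.
        end = starts[i + 1] - 1 if i + 1 < len(starts) else total_pages
        ranges.append((start, end))
    return ranges


# The example from the post: keyword on pages 1, 5 and 7 of a 9-page document.
ranges = pages_to_ranges([1, 5, 7], 9)
print(ranges)  # [(1, 4), (5, 6), (7, 9)]
print(", ".join(f"{a}-{b}" for a, b in ranges))  # 1-4, 5-6, 7-9
```

Each `(start, end)` pair is then one print/save job in Adobe.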

With this method you could do it without additional software or having to learn something new. Apparently there is also a way to do this with a more advanced Adobe feature, but I’m not familiar with the process. Here is the link if you’re interested: https://forums.adobe.com/thread/1473698


#6

This is a very good idea. I will see if I have the skill to do this.


#7

How do I use Read PDF Text on one page at a time?


#8

Set a variable to 1 and start a loop. Run Read PDF Text with its Range property set to this variable (as a string), then increment the variable by one after each page is read. Of course, you’ll need a way to exit the loop when you reach the end of the document. I would experiment with what happens when you try to read a page that doesn’t exist. For example, if it throws an exception, just catch the exception and break out of the loop.
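The loop described above can be sketched like this in Python, as a rough model of the UiPath workflow rather than UiPath code itself. `read_page` is a hypothetical stand-in for Read PDF Text with Range set to the current page number, and it is assumed to raise an exception once the page number is past the end of the document; whether Read PDF Text really does that, and which exception type it raises, is exactly what the post suggests testing. Here `IndexError` plays that role because the sample document is a Python list.

```python
def find_keyword_pages(read_page, keyword):
    """Read one page at a time, collecting the page numbers that contain keyword.

    read_page(n) is a hypothetical stand-in for Read PDF Text with Range = n.
    It is assumed to raise IndexError when page n does not exist.
    Returns (pages_with_keyword, total_pages_read).
    """
    found = []
    page = 1  # the counter variable, starting at 1
    while True:
        try:
            text = read_page(page)
        except IndexError:
            break  # assumed signal that we have run past the last page
        if keyword in text:
            found.append(page)
        page += 1  # move on to the next page
    return found, page - 1


# Usage with a fake 4-page document; "KEYWORD" is a placeholder for the real word.
doc = ["KEYWORD page", "page 2", "KEYWORD again", "page 4"]
print(find_keyword_pages(lambda n: doc[n - 1], "KEYWORD"))  # ([1, 3], 4)
```

Returning the page count as well is handy, since the last range has to end on the last page of the document.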