How to extract PDF pages by dynamic range with conditions?

Good afternoon, friends. I want to split a PDF into separate PDFs, one per provider. For example, the provider “GIGALUX SRL” has 3 documents of one page each. Some providers have two-page documents, for example “PROVEFRIO SRL”; the provider’s name does not appear on the second page, but that page must obviously be included. I can already read the provider names from the documents, but I cannot get the split right. The expected result is:

1 pdf with 3 “GIGALEX SRL” Documents of one page each (3 pages in total)
1 pdf with 4 “KUEHNE + NAGEL SA” Documents of one page each (4 pages in total)
1 pdf with 3 “SEAN METAL SRL” Documents of one page each (3 pages in total)
1 pdf with 3 “PROVEFRIO SRL” Documents of two pages each (6 pages in total)
1 pdf with 4 “INMATEL SA” Documents of one page each (4 pages in total)

I have attached the bot. It would be great if someone could help me with the bot logic.

Thanks

Lnx

@gh_lzm - I think this is achievable. I will try it now.

But one question: are you allowed to use BalaReva activities, i.e. Marketplace packages?

Thank you very much for answering. There is no problem with using other packages. I would greatly appreciate your help.

@gh_lzm - Here you go…

Workflow: Split_PDF.zip (656.5 KB)

If you run this as is, you will see 3 PDFs created in the Splitted folder, and then “Final.pdf”, containing the pages with the keyword “GIGALUX SRL”, will be created under the root project folder.

This is a start. I am thinking of making it dynamic, or you can add multiple If cases to achieve the same thing.

Thank you very much for the answer, but I get this error: Could not load file or assembly ‘itextsharp, Version=5.5.13.0, Culture=neutral, PublicKeyToken=8354ae6d2174ddca’ or one of its dependencies.

BalaReva seems to have yet another dependency.

@gh_lzm - What is your UiPath version?

I tried with two versions, one from 2019 and one from 2021.

I sent you a PM yesterday. Please click your profile photo and then the message icon to check and respond.

Thank you very much for your recommendations. I was out yesterday and could not make any progress. The point is that I can only use the packages that have been assigned to me.

@gh_lzm - Sorry, that’s all I can do. The code I shared is working fine here.

I asked you to join a Zoom call, and you still haven’t responded.

Yesterday was very complicated for me. In any case, I have to work with OCR, not with regex. Anyway, I appreciate your interest.

I am trying to help you. If you are not interested in the meeting, I am not sure how to proceed.

Please try it your way. Good luck.

I have used OCR and the Extract PDF Range activity, and removed the BalaReva package.

Here is the updated workflow: Split_PDF.zip (1.0 MB)

Note: the only downside of using Extract PDF Range is that the resulting file size will be larger than the original PDF.

Your bot is very good, but there is a problem: how can the bot read the names of the other providers? I noticed that you search explicitly for the first provider, “GIGALEX SRL”, with a regex. That works for the first provider but not for the others, and I cannot write a regex for each one because the provider names change from month to month.

Excuse my English, I am using Google Translate.

Here you gave 5 providers. How many such providers do you have?

Yes, that is a sample PDF I made with only a few providers. Usually there are around 80 providers, and over time some leave and others join.

@gh_lzm - Then we first have to build an array with all the providers and loop through it. That way we can use each provider name from the array in the regex and make it dynamic.

I think we can do it this way. Give it a try; if it does not work, I will try it this weekend and let you know.
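
For anyone following along, here is a rough sketch of that idea outside UiPath, in Python. It is not the workflow from the attachments, only an illustration under some assumptions: `page_texts` is a hypothetical list holding the OCR text of each page in order, `providers` is the array of provider names for the month, and `assign_pages` is a made-up helper name. Pages where no provider name is found are treated as continuation pages of the previous provider, which is how the second page of a “PROVEFRIO SRL” document would be picked up.

```python
import re
from collections import defaultdict


def assign_pages(page_texts, providers):
    """Map each provider to the 1-based page numbers that belong to it.

    A page that mentions no provider is assumed to continue the
    previous page's document, so it inherits that provider.
    """
    # Escape every name so characters such as "+" are matched literally.
    # Longer names go first so that a short name cannot shadow a longer one.
    names = sorted(providers, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))

    pages = defaultdict(list)
    current = None  # provider of the most recent page that showed a name
    for number, text in enumerate(page_texts, start=1):
        match = pattern.search(text)
        if match:
            current = match.group(0)
        # Pages before the first recognised provider are skipped.
        if current is not None:
            pages[current].append(number)
    return dict(pages)
```

Each resulting page list could then be turned into the page range that the extraction step needs. The match here is exact and case-sensitive, so real OCR output would probably need some normalisation first.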

Great, that sounds very interesting; I will try it. But I found something strange in the “Then” and “Else” branches of the “If” shown in the screenshot: in the “Else” branch my variable “OldIntCurrentPage” cannot be read or assigned, while in the “Then” branch it works. That could be why the PDFs are not extracted at the right place.

Done, I managed to solve the problem. Now it works perfectly.
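
The bookkeeping that a variable like “OldIntCurrentPage” is there for (remembering where the current provider’s block of pages started) can be sketched in Python as follows. This is only an illustration, not the logic of the attached workflow: `page_labels` is a hypothetical list with one provider name per page, already resolved for continuation pages, and `page_ranges` is a made-up helper name.

```python
def page_ranges(page_labels):
    """Collapse per-page labels into (label, first_page, last_page) runs.

    Page numbers are 1-based. A new run starts whenever the label
    differs from the label of the run currently being built.
    """
    ranges = []
    start = 1  # first page of the run currently being built
    total = len(page_labels)
    # Walk one step past the last page so the final run is closed too.
    for page in range(2, total + 2):
        at_end = page > total
        if at_end or page_labels[page - 1] != page_labels[start - 1]:
            ranges.append((page_labels[start - 1], start, page - 1))
            start = page
    return ranges
```

Note that `start` is declared once, outside the loop, so every branch can read and update it. For a PDF that held the five providers from the first post in that order (an assumption, the thread does not state the page order), the runs would be pages 1 to 3, 4 to 7, 8 to 10, 11 to 16 and 17 to 20.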

OCR_SplitTest2.zip (861.5 KB)
