I need to split one PDF into multiple PDFs. I am reading data from the PDF, and the first page says
Page 1-9.
So I need to put those 9 pages into one PDF.
Page 10 will then say something like 1-4,
so pages 10, 11, 12, and 13 need to go into another PDF.
If the PDF is a scanned PDF, i.e. it contains images, then it will be difficult to extract data if the image quality is very low.
However, give it a try using UiPath OCR and let us know. You may need to activate the Enterprise Trial in Orchestrator, and it needs the Document Understanding API key.
I was able to match the page ranges across multiple pages of the images. However, we still need to verify whether the regex used will also detect them in other samples.
Below is the regex:
(\d+)\s*of\s*(\d+)
You can use the Matches activity to get all the matches. You can then use its output to check the total count or the values that were matched using the below expression:
The PageRangeMatches variable is the output of the Matches activity.
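For readers who want to try the pattern outside UiPath, here is a minimal Python sketch of the same matching step. The sample text and the helper name `find_page_markers` are hypothetical, made up for illustration; only the regex comes from this thread.

```python
import re

# Same pattern as above: "<current> of <total>", with optional whitespace.
PAGE_MARKER = re.compile(r"(\d+)\s*of\s*(\d+)")

def find_page_markers(text):
    """Return (current_page, total_pages) for every 'N of M' marker in text."""
    return [(int(m.group(1)), int(m.group(2))) for m in PAGE_MARKER.finditer(text)]

# Hypothetical OCR output; real scans may contain noise the pattern misses.
sample = "Page 1 of 9 ... Page 2 of 9 ... Page 1of 4"
markers = find_page_markers(sample)
print(len(markers))  # 3
print(markers)       # [(1, 9), (2, 9), (1, 4)]
```

The count of matches and the captured groups correspond roughly to what the Matches activity output gives you, although the exact property names in UiPath differ.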
Also note that the page range does not follow the page numbers of the overall PDF; instead, each split document starts again from page 1. We also need to confirm whether this is the case for all the documents that you receive.
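Since each document restarts at page 1, the markers have to be mapped back to positions in the combined PDF. The sketch below shows one way to do that in Python, assuming every page carries exactly one readable "N of M" marker and that the text of each page is available as a separate string. The function name `absolute_ranges` and the sample pages are hypothetical.

```python
import re

PAGE_MARKER = re.compile(r"(\d+)\s*of\s*(\d+)")

def absolute_ranges(page_texts):
    """Map per-page 'N of M' markers to 1-based (start, end) ranges in the combined PDF."""
    ranges = []
    for page_number, text in enumerate(page_texts, start=1):
        match = PAGE_MARKER.search(text)
        if match is None:
            raise ValueError(f"no page marker found on page {page_number}")
        current, total = int(match.group(1)), int(match.group(2))
        if current == 1:
            # "1 of M" starts a new document that spans M pages of the combined PDF.
            ranges.append((page_number, page_number + total - 1))
    return ranges

# Hypothetical input: a 9-page document followed by a 4-page document.
pages = [f"Page {i} of 9" for i in range(1, 10)] + [f"Page {i} of 4" for i in range(1, 5)]
print(absolute_ranges(pages))  # [(1, 9), (10, 13)]
```

If OCR misses a marker on some page, this version raises an error rather than guessing, which makes bad scans easier to spot.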
Yes, all documents are like that, and with your regex I got the value.
I am using Extract PDF Page Range. The first PDF is extracted correctly, but from the second one onward the extraction is wrong.
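One possible cause of this symptom, though nothing in the thread confirms it, is passing the range as it appears on the page (for example 1-4) instead of its position in the combined PDF (10-13). The Python sketch below illustrates the running-offset arithmetic. It assumes the page counts per document are already known and that the range is passed as a "start-end" string; the function name `to_range_strings` is hypothetical.

```python
def to_range_strings(page_counts):
    """Turn per-document page counts into absolute 'start-end' range strings."""
    ranges = []
    start = 1  # 1-based page number where the next document begins
    for count in page_counts:
        end = start + count - 1
        ranges.append(f"{start}-{end}")
        start = end + 1  # the next document begins right after this one
    return ranges

print(to_range_strings([9, 4]))  # ['1-9', '10-13']
```

In a workflow, the equivalent would be keeping a running start-page variable across loop iterations rather than resetting it for each document.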
Could you provide the data extracted from the PDF as text files, so that I can confirm on my side that the solution works for all similar cases?
Below is the workflow developed so far. It currently throws an error, since the page ranges are not extracted properly. Split_PdfPages_ByPageNo.zip (1.4 MB)