As shown in the screenshot below, I have a scanned PDF file. These tax PDFs can run up to 100 pages, so I don’t want to digitize the whole file at once, since that would take too long.
Is there a way to loop through each page of a PDF file and check it for a specific text? For example, if the bot finds a page that contains the text “U.S. Individual Income Tax Return”, it should stop and get that page number. Any idea would be a great help. Thank you.
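The loop being asked about can be sketched outside UiPath as follows. This is a minimal Python illustration, not a UiPath workflow: `find_first_page` and `ocr_page` are hypothetical names, and `ocr_page` stands in for whatever OCR step reads a single page (in UiPath, presumably the OCR activity run on one page at a time). Each page is OCR'd only when the loop reaches it, and the loop returns on the first match, so pages after the match are never digitized. Matching ignores case and collapses whitespace, on the assumption that OCR output may break the phrase across lines.

```python
from typing import Any, Callable, Optional, Sequence


def _normalize(text: str) -> str:
    # Collapse runs of whitespace/newlines and ignore case, since OCR output
    # may split the phrase across lines or vary in capitalization.
    return " ".join(text.split()).casefold()


def find_first_page(
    pages: Sequence[Any],
    ocr_page: Callable[[Any], str],
    target: str,
) -> Optional[int]:
    """Return the 1-based number of the first page whose OCR text contains
    `target`, or None if no page matches.

    Pages are OCR'd lazily, one at a time, and the loop stops at the first
    match, so later pages are never OCR'd.
    """
    needle = _normalize(target)
    for page_number, page in enumerate(pages, start=1):
        text = ocr_page(page)  # hypothetical per-page OCR call
        if needle in _normalize(text):
            return page_number
    return None


if __name__ == "__main__":
    ocr_calls = []

    def fake_ocr(page: str) -> str:
        # Stand-in for real OCR: each "page" here is already plain text.
        ocr_calls.append(page)
        return page

    pages = [
        "Cover letter",
        "W-2 summary",
        "Form 1040\nU.S. Individual Income Tax Return",
        "Schedule A",
    ]
    print(find_first_page(pages, fake_ocr, "U.S. Individual Income Tax Return"))  # 3
    print(len(ocr_calls))  # 3: page 4 was never OCR'd
```

For a 100-page scan where the form appears early, this bounds the OCR cost to the pages up to and including the match.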
Remember that the document is scanned, so for UiPath to read it, it has to go through OCR. How can you find that specific string if the document has not been read by the program yet?
@Jelrey - Please take a look at this… download this workflow and it will give you an idea of how to loop through pages. Here I loop through the pages and delete the page where the text is found. One thing you have to change: instead of Read PDF Text, you have to use Read PDF With OCR.
@prasath17, do you have an example that does the opposite of what you did? Instead of deleting the page where the text is found, I want to keep the page that contains the text and delete the other pages that do not contain it. Is that possible?
@Jelrey - Try moving the PDF splitter from the Else branch to the Then branch. It will then split out the pages where the text is found and keep them separate, and you can combine them at the end.
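Keeping the matching pages instead of deleting them amounts to partitioning the page numbers into two groups. The sketch below is a hypothetical Python illustration (`split_pages_by_text` and `ocr_page` are not UiPath names; `ocr_page` stands in for OCR of a single page). It has to OCR every page, because any page might contain the text, so it cannot stop at the first match. The `keep` list corresponds to the pages the splitter would extract and combine at the end; `drop` lists the pages to discard.

```python
from typing import Any, Callable, List, Sequence, Tuple


def _normalize(text: str) -> str:
    # Ignore case and collapse whitespace so line breaks from OCR don't
    # prevent a match.
    return " ".join(text.split()).casefold()


def split_pages_by_text(
    pages: Sequence[Any],
    ocr_page: Callable[[Any], str],
    target: str,
) -> Tuple[List[int], List[int]]:
    """Partition 1-based page numbers into (keep, drop): `keep` holds pages
    whose OCR text contains `target`, `drop` holds every other page."""
    needle = _normalize(target)
    keep: List[int] = []
    drop: List[int] = []
    for page_number, page in enumerate(pages, start=1):
        text = ocr_page(page)  # hypothetical per-page OCR call
        if needle in _normalize(text):
            keep.append(page_number)  # page to extract and combine later
        else:
            drop.append(page_number)  # page to delete
    return keep, drop


if __name__ == "__main__":
    pages = [
        "Form 1040 U.S. Individual Income Tax Return",
        "Schedule A",
        "u.s. individual income\ntax return (copy)",
        "Notes",
    ]
    keep, drop = split_pages_by_text(pages, lambda p: p, "U.S. Individual Income Tax Return")
    print(keep)  # [1, 3]
    print(drop)  # [2, 4]
```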