I have a consolidated, multi-page scanned PDF file that contains different types of documents, in different formats and with different details. I want to separate the pages and extract the data page by page, according to document type. How can I do that?
Can anyone help me?
Thanks in advance.
Can you try the Extract PDF Page Range activity?
Here you need to provide the page range to extract.
Hope this helps.
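To make the page-range idea concrete: once you know (or have guessed) the document type of each page, consecutive pages of the same type can be collapsed into range strings such as "1-2" or "3", which is the kind of value the page range input expects. The following is only a minimal sketch in plain Python, outside UiPath; the function name page_ranges_by_type and the sample labels are hypothetical and used purely for illustration.

```python
from itertools import groupby


def page_ranges_by_type(page_labels):
    """Collapse consecutive pages with the same label into (label, range) pairs.

    page_labels is a list with one document-type label per page, in page order.
    Page numbers in the returned range strings are 1-based.
    """
    ranges = []
    page = 1
    for label, run in groupby(page_labels):
        count = len(list(run))
        first, last = page, page + count - 1
        # A single page becomes "3"; several consecutive pages become "4-6".
        range_text = str(first) if first == last else f"{first}-{last}"
        ranges.append((label, range_text))
        page = last + 1
    return ranges


# Hypothetical labels for a six-page scanned file.
labels = ["invoice", "invoice", "receipt", "contract", "contract", "contract"]
for label, range_text in page_ranges_by_type(labels):
    print(label, range_text)
```

Each resulting range string could then be used for one extraction, producing one output PDF per document.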
We can achieve that by using Document Understanding with Machine Learning.
Have a look at this
If you have a file name format for the different document types, then we can first separate the files by file name.
Use an Assign activity like this:
arr_pdffiles = Directory.GetFiles("yourfolderpath", "*yourkeyword.pdf")
where arr_pdffiles is a variable of type array of String that holds the PDF file paths,
and yourkeyword is the keyword in the file name that distinguishes one document type from the others.
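To illustrate how a wildcard pattern such as "*yourkeyword.pdf" sorts files by type, here is a small sketch in plain Python (not a UiPath activity). It only works on a list of file names, so it does not touch the disk. The function name group_by_keyword and the sample names are hypothetical, and the case-insensitive comparison is an assumption made for this example.

```python
from fnmatch import fnmatchcase


def group_by_keyword(file_names, keywords):
    """Group file names by the first keyword whose "*keyword.pdf" pattern matches."""
    groups = {keyword: [] for keyword in keywords}
    for name in file_names:
        for keyword in keywords:
            # Lower-case both sides so the comparison ignores case (an assumption).
            if fnmatchcase(name.lower(), f"*{keyword.lower()}.pdf"):
                groups[keyword].append(name)
                break  # a file belongs to at most one group
    return groups


# Hypothetical file names; files matching no keyword are simply left out.
names = ["2021_invoice.pdf", "march_receipt.pdf", "notes.txt", "ACME_Invoice.pdf"]
print(group_by_keyword(names, ["invoice", "receipt"]))
```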
Now you can pass this array variable to a For Each activity and, inside the loop, use Read PDF With OCR.
Before the other steps, go to Design tab → Manage Packages → All Packages, search for OmniPage, and install it.
In Read PDF With OCR, use the OmniPage OCR engine, as it is reliable.
Hope this helps you resolve this.
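Once the OCR step returns the text of a page, a simple way to decide its document type is to count the keywords that are typical of each type. The sketch below is plain Python and only an illustration of that idea, not something UiPath provides. The function name classify_page, the keyword lists, and the sample text are all hypothetical. Real scans with noisy OCR output may need fuzzier matching.

```python
def classify_page(ocr_text, keyword_map, default="unknown"):
    """Return the document type whose keywords occur most often in the OCR text.

    keyword_map maps a document type to a list of keywords typical of that type.
    If no keyword is found at all, the default label is returned.
    """
    text = ocr_text.lower()
    best_type, best_hits = default, 0
    for doc_type, keywords in keyword_map.items():
        hits = sum(1 for keyword in keywords if keyword.lower() in text)
        if hits > best_hits:
            best_type, best_hits = doc_type, hits
    return best_type


# Hypothetical keyword lists; adjust them to the documents you actually receive.
keyword_map = {
    "invoice": ["invoice number", "amount due", "bill to"],
    "receipt": ["receipt", "paid", "cashier"],
}
print(classify_page("INVOICE NUMBER 1042 ... Amount due: 300", keyword_map))
```

The label returned for each page can then decide which extraction logic, or which output folder, that page goes to.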
If the documents are highly complicated, then I would suggest using Document Understanding here.
Have a look at this doc for more ideas
The UiPath Document Understanding framework facilitates the processing of incoming files, from file digitization to extracted data validation, all in an open, extensible, and versatile environment. Document Understanding is designed to help you...