How to split a consolidated multi-page scanned PDF that contains different types of documents

Hi Everyone,

I have a consolidated, multi-page scanned PDF file that contains different types of documents, each in a different format with different details. I want to separate and extract them page by page, by document type. How can I do that?
Can anyone help me?

Thanks in advance,
Bharath

@Bharath_Kumar_B

Can you try the Extract PDF Page Range activity?

Here you need to input the page range of the pages you want to extract.
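
To work out which page ranges to enter, one option is to label each page with its document type first and then group consecutive pages that share a label. The following is a minimal Python sketch of that grouping, outside UiPath. The function name `page_ranges_by_type` and the sample labels are hypothetical, the labels are assumed to come from some earlier classification step (manual or automated), and the "start-end" string format is an assumption about what the activity expects, so check it against the activity's documentation.

```python
from itertools import groupby


def page_ranges_by_type(page_labels):
    """Group consecutive pages with the same label into (label, range) pairs.

    page_labels: list of document-type labels, one per page, in page order.
    Pages are numbered from 1. A single page is written as "3",
    a run of pages as "4-6".
    """
    ranges = []
    page = 1
    for label, run in groupby(page_labels):
        length = len(list(run))
        start, end = page, page + length - 1
        page_range = str(start) if start == end else f"{start}-{end}"
        ranges.append((label, page_range))
        page = end + 1
    return ranges


# Hypothetical labels for a six-page scan (assumed, not from a real file).
labels = ["invoice", "invoice", "receipt", "contract", "contract", "contract"]
for doc_type, page_range in page_ranges_by_type(labels):
    print(doc_type, page_range)
```

Each resulting range could then be passed to one Extract PDF Page Range call, producing one output file per document.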

Hope this helps.

Thanks

Hi!

We can achieve that by using Document Understanding with Machine Learning.

Have a look at this

Regards,
NaNi

Hi

If you have a file name format that distinguishes the different document types, then we can first separate the files by file name.

For that, use an Assign activity like this:

arr_pdffiles = Directory.GetFiles("yourfolderpath", "*yourkeyword.pdf")

Here arr_pdffiles is an array of PDF file paths.

yourkeyword is a keyword in the file name that distinguishes one document type from the others.

  1. Pass this array variable to a For Each activity, and inside the loop use Read PDF With OCR.

  2. Before the other steps, go to the Design tab → Manage Packages → All Packages, search for OmniPage, and install it.

  3. In Read PDF With OCR, use the OmniPage OCR engine, as it is reliable.
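
As a language-neutral illustration of the keyword filter used in the Assign step above, here is a minimal Python sketch of the same idea outside UiPath. The function name `find_pdfs_by_keyword` and the folder and keyword values are hypothetical, and the sketch assumes each file name ends with the keyword followed by `.pdf`, matching the `*yourkeyword.pdf` pattern.

```python
import glob
import os


def find_pdfs_by_keyword(folder, keyword):
    """Return sorted paths of PDFs in `folder` whose names end with `keyword`.pdf."""
    # glob.escape keeps special characters in the folder path from being
    # treated as wildcards; only the leading "*" acts as a wildcard.
    pattern = os.path.join(glob.escape(folder), f"*{keyword}.pdf")
    return sorted(glob.glob(pattern))


# Hypothetical folder and keyword; an empty list is returned if nothing matches.
for path in find_pdfs_by_keyword("scans", "invoice"):
    print(path)
```

Running the filter once per keyword gives one list of files per document type, and each list can then be looped over separately.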

Hope this helps you resolve the issue.

Or

If the documents are highly complicated, then I would suggest using Document Understanding here.
Have a look at this doc for more ideas

Cheers @Bharath_Kumar_B