PDF pages missing after split and join

Hi All,

Our PDF files range from 500 to 1500 pages. We split these files based on certain conditions, but when we join the resulting files back together, some pages are missing.

For example, suppose the original PDF has 500 pages. We split it into multiple segments, say 50 separate PDF files (e.g. 1.pdf, 2.pdf, 3.pdf and so on), plus one file holding the remaining pages. When we join the 50 segment files and count the pages we get, say, 400, and the remainder file has 85 pages. That gives a total of 485 pages, but it should be 500.

Note: These are not native PDFs; they were converted with OCR. Pages may be portrait or landscape. The PDFs also contain logos, text, and tabular data.

Can anybody give us a clue about what is going wrong? We are using two activities:

  1. Extract PDF Range
  2. Join PDF

Regards
AN

Hi @anand.t,

It is possible that some pages are being missed during the splitting process if the page range specified in the Extract PDF Range activity is incorrect. It is also possible that some pages are not being recognized correctly during the OCR process, which can cause problems when splitting or joining the PDF. It is important to check whether the issue is consistent, i.e. whether the same pages go missing every time for the same PDF. If it is, I would recommend replacing the extract/join activities with ones from a different package.
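To rule out incorrect page ranges, it can help to check the ranges themselves before looking at the PDFs. The snippet below is only a rough sketch in Python, not UiPath code: `find_range_gaps` is a hypothetical helper, and it assumes you can log the start and end page of every segment and of the remainder as 1-based inclusive pairs. It reports pages that no range covers and pages that more than one range covers.

```python
def find_range_gaps(total_pages, ranges):
    """Return (missing, duplicated) page numbers for 1-based inclusive ranges."""
    counts = [0] * (total_pages + 1)  # index 0 is unused
    for start, end in ranges:
        # Clamp each range to the real page span of the original file.
        for page in range(max(start, 1), min(end, total_pages) + 1):
            counts[page] += 1
    missing = [p for p in range(1, total_pages + 1) if counts[p] == 0]
    duplicated = [p for p in range(1, total_pages + 1) if counts[p] > 1]
    return missing, duplicated


# Made-up example: a 20-page file split into three segments plus a remainder.
segments = [(1, 5), (7, 10), (12, 15)]
remainder = [(16, 20)]
missing, duplicated = find_range_gaps(20, segments + remainder)
print("missing:", missing)        # missing: [6, 11]
print("duplicated:", duplicated)  # duplicated: []
```

If the missing list is not empty, the pages were never extracted in the first place and the join step is not at fault.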

If the issue is not consistent and doesn’t always happen on the same PDF or pages, I suggest adding a check after joining to make sure the resulting PDF has the expected number of pages. The check can have two outcomes:
a. Pass, in which case the robot simply continues.
b. Fail, in which case you should add some kind of retry mechanism.
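
The check-and-retry logic could look roughly like this. It is a sketch in Python rather than a UiPath workflow, and `join_with_check`, `join` and `count_pages` are hypothetical names: `join` stands for whatever performs the join, and `count_pages` for whatever returns the page count of the joined file.

```python
import time


def join_with_check(join, count_pages, expected_pages, max_attempts=3, delay_seconds=0.0):
    """Run join() until count_pages() equals expected_pages; return the attempt number."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    actual = None
    for attempt in range(1, max_attempts + 1):
        join()
        actual = count_pages()
        if actual == expected_pages:
            return attempt  # Pass: the caller just continues.
        if attempt < max_attempts and delay_seconds > 0:
            time.sleep(delay_seconds)  # Optional pause before retrying.
    # Fail: every attempt produced the wrong page count.
    raise RuntimeError(
        f"expected {expected_pages} pages, got {actual} after {max_attempts} attempts"
    )


# Stand-ins for the real join and page count: the first join yields 485 pages,
# the second yields the expected 500.
results = iter([485, 500])
state = {}


def fake_join():
    state["pages"] = next(results)


def fake_count():
    return state["pages"]


attempts = join_with_check(fake_join, fake_count, expected_pages=500)
print(attempts)  # 2
```

Raising an error after the last attempt means a PDF that never reaches the expected count stops the process instead of passing through silently.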

Thanks!