Digitize Document activity throws an error with PDFs

Hello,

I am currently facing an issue with the Digitize Document activity in a process. When I process forms with any OCR engine, an unexpected error message comes up (see the Notepad file at the bottom of this post). I am using a Windows 10 machine and see this issue with all the OCR engines.

Any suggestions? Does anyone know how to prevent this issue?

Hi,

How large is this PDF?
If it is large, can you split the PDF into individual pages and digitize each page?

thanks

It’s just a one-page PDF.

Hmmm… Do you have the latest packages?

Hi @Deepashree,

Have you tried running your process with any other PDF?

The issue might be in this particular PDF itself.

If you test with some other PDFs, you can identify whether the issue is related to the PDF or to the Digitize Document activity itself.

If it doesn’t work for any other PDF either, I would suggest upgrading your packages and then running it with different OCR engines.

Regards
Sonali

Hi @Deepashree,

It would be really helpful if you could share at least the following details:

  1. PDF size.
  2. DPI.
  3. The file extension.
  4. Available system memory.
  5. Whether the OCR package is up to date.
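
For the size and extension items, a small script can collect the values to paste into a reply. This is only a sketch in Python using the standard library; `describe_file` is a made-up helper name, not part of UiPath, and it does not cover DPI, memory, or package versions.

```python
import os


def describe_file(path):
    """Return the size and the lowercase extension of the file at `path`."""
    size_bytes = os.path.getsize(path)
    # splitext keeps the leading dot, e.g. ".pdf"; lowercase it so that
    # "Form.PDF" and "form.pdf" are reported the same way.
    extension = os.path.splitext(path)[1].lower()
    return {
        "path": path,
        "size_bytes": size_bytes,
        "size_kb": round(size_bytes / 1024, 1),
        "extension": extension,
    }


# Example call (the path is a placeholder, replace it with your own file):
# print(describe_file(r"C:\temp\form.pdf"))
```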

In practice, this only happens when Digitize Document runs with Force Apply OCR set to False, that is, when no OCR engine is used.

Try the following:

  • When you test Digitize Document, please use the Log Activities button in Studio to see whether the activity executes, whether OCR executes, and so on. This would give us more information about what is failing.
  • In the case of this file, it doesn’t reach the OCR engine and crashes before that, so whichever OCR engine you choose is irrelevant.
    * Sometimes the file might be corrupt, or it might have the wrong extension, etc.
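
One rough way to check for a corrupt file or a wrong extension outside of Studio is to look at the raw bytes: a PDF normally has a `%PDF-` marker near the start and a `%%EOF` marker near the end. The Python sketch below assumes that heuristic is good enough for a first look; `looks_like_pdf` and `check_pdf_file` are hypothetical helper names, and passing this check does not guarantee that Digitize Document will accept the file.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Heuristic check on raw bytes: PDF header near the start, %%EOF near the end."""
    # Some writers put a few bytes before the header, so search a small
    # window instead of requiring the marker at offset 0.
    has_header = b"%PDF-" in data[:1024]
    # A truncated download or a half-written file usually lacks the end marker.
    has_trailer = b"%%EOF" in data[-1024:]
    return has_header and has_trailer


def check_pdf_file(path) -> bool:
    """Read the file at `path` and apply the heuristic above."""
    with open(path, "rb") as f:
        return looks_like_pdf(f.read())


# Example call (the path is a placeholder, replace it with your own file):
# print(check_pdf_file(r"C:\temp\form.pdf"))
```

If this returns False for a file with a .pdf extension, the file is probably a different format that was renamed (an image or a Word document, for example) or was cut off while being saved, which would fit a crash that happens before the OCR engine is reached.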