Page limit in the OCR framework?

Hi all,

-I’ve been reading a few posts, as usual, about the current scenario before posting, but none deals specifically with OCR ‘not reading/continuing’ after a specific page number.

-Our previous question has already been answered: there are no page limits in the OCR framework, great!
-So why does it stop?

-----Added today----
•Context: we’re using the Document Understanding framework to read a PDF of 50 pages or so.
•Issue:
-The OCR engine (Digitize section) does not go any further than page 6!
-The page where it stops is almost ‘blank’, and the rest of the PDF continues with pages that have
regular info on them again (in case it helps).
-It just stops reading the rest. How do we know? (See ‘Steps taken’.)
•Steps taken:
-In Debug mode, using a breakpoint (a few steps after the PDF has been digitized), we checked
the text string produced by the ‘Digitize’ section, and it does not have all the info
expected from the document.
-We tried with a smaller PDF (2 pages) and we got all the info in that string as expected.
•Engine currently used: I would say ‘all of them’. We tried all the engines yesterday.
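
For anyone who wants to repeat the check from ‘Steps taken’ outside Studio, here is a minimal Python sketch. It assumes you have copied the Digitize text output into a string and that you know one distinctive phrase for each page you care about. The function name `find_missing_pages`, the sample text, and the marker phrases are all made up for illustration; none of this is part of UiPath.

```python
def find_missing_pages(digitized_text, page_markers):
    """Return the page numbers whose marker phrase is absent from the text."""
    # Collapse whitespace and ignore case so OCR line breaks do not cause false alarms.
    normalised = " ".join(digitized_text.split()).lower()
    missing = []
    for page_number, marker in sorted(page_markers.items()):
        if " ".join(marker.split()).lower() not in normalised:
            missing.append(page_number)
    return missing


if __name__ == "__main__":
    # Hypothetical sample: text pasted from the debugger after Digitize.
    sample_text = "Invoice summary\nPage one totals ... Terms and\nconditions apply"
    # Hypothetical markers: one phrase you expect to find on each page.
    markers = {1: "Invoice summary", 2: "terms and conditions", 7: "Appendix B"}
    print(find_missing_pages(sample_text, markers))
```

If every page after a certain number shows up as missing, that matches the behaviour described above.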

Comments:
-Weird, isn’t it?
-We have already opened a ticket with Enterprise support (waiting for a response) and we’ve
been checking posts on the forum as usual, but it is never bad to get as much feedback and as
many ideas as we can.
-Unfortunately, we can’t share the document because of confidentiality.

PS: I have just edited this post with new info from yesterday’s testing of all the engines in Studio.

Feel free to send any thoughts. Stay safe!

Hi @MARIODC, thanks for raising this concern. What is the error message?

Hi @amitbhatt55,

Technically there is no ‘error’ message, since the OCR engine reads a few pages and then the workflow keeps going with the rest of the process. The issue is that it is basically not reading all of the pages, or at least most of them.

So imagine you had a 20-page PDF and only got data from the first 2 pages (using any available OCR engine), and then the sequence finished. There was no error message and your code/sequence ran start to end, but you are not getting all the data you need. That is the situation now!

Hello @MARIODC ,

Can you please try a couple of things:

  1. Activate the “Log Activities” option in the Debug tab, check the ForceApplyOCR flag in the Digitize Document activity, and debug your workflow for that particular file. See how many times the OCR engine is being called: you should see “ executing” and “ closed” messages appear in your Output tab if you have Log Activities activated.
  2. Without changing any of the Digitize Document or OCR engine properties, try extracting, say, pages 5 through 7 from the PDF and run only those through the workflow. Does the text get reported properly in the text output of Digitize Document? Or does it only contain the text from the pages that were previously digitized anyway?
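
If the Output tab gets long, the counting in step 1 can be done roughly with a small script. This is only a sketch in Python: it assumes you have copied the Output panel text into a string and that each relevant line contains the activity’s display name and ends with “executing” or “closed”. The exact log format may differ in your Studio version, and `count_activity_calls` and the display name "OCR Engine" are hypothetical names used for illustration.

```python
def count_activity_calls(log_text, activity_name):
    """Count 'executing' and 'closed' log lines that mention activity_name."""
    counts = {"executing": 0, "closed": 0}
    name = activity_name.lower()
    for line in log_text.splitlines():
        lowered = line.strip().lower()
        if name not in lowered:
            continue  # line belongs to some other activity
        for state in counts:
            if lowered.endswith(state):
                counts[state] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical Output panel text; adjust to whatever your log really looks like.
    sample_log = "\n".join([
        "Digitize Document Executing",
        "OCR Engine Executing",
        "OCR Engine Closed",
        "OCR Engine Executing",
        "OCR Engine Closed",
        "Digitize Document Closed",
    ])
    print(count_activity_calls(sample_log, "OCR Engine"))
```

Comparing the two counts with the number of pages you expect to be sent to OCR should show whether the engine stops being called partway through the file.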

Thanks,

Ioana
