Digitize Document Native Scanner Does Not Behave Same As Read PDF Text Activity

david.kameka · October 24, 2020, 5:26am

When using the Read PDF Text activity I get nicely formatted results.
When using the Digitize Document activity I sometimes get scattered results, which makes it harder to write regex that captures the data I want.
I’m trying to:
Let the Digitize Document activity complete and populate the DocText variable with its output.
After this activity, run the Read PDF Text activity and assign / replace the DocText variable with the new string from Read PDF Text.
The goal is to use the nicely formatted string instead of the unformatted one I sometimes get when Digitize Document scans the file.

I thought I had it until I got the following error message as soon as it got to the classification activity.

“Classify Document Scope: The document text does not match the Document Object Model.”

Hoping someone can suggest how I can force my string in the DocText to be used for the rest of the Document Understanding Framework?
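
On the regex difficulty mentioned above, one possible workaround that leaves DocText and the Document Object Model untouched is to normalize whitespace in a separate copy of the text before matching. The sketch below is in Python purely for illustration (inside a workflow the same idea would be a .NET expression); the function names, the field label, and the sample string are all hypothetical and not taken from a real document.

```python
import re
from typing import Optional


def normalize_whitespace(text: str) -> str:
    """Collapse every run of spaces, tabs, and newlines into one space."""
    return re.sub(r"\s+", " ", text).strip()


def find_field(text: str, label: str) -> Optional[str]:
    """Return the first token after `label`, tolerating scattered whitespace.

    Matching runs on a normalized copy, so the original text is not modified.
    """
    pattern = re.escape(label) + r"\s*:?\s*(\S+)"
    match = re.search(pattern, normalize_whitespace(text))
    return match.group(1) if match else None


# Hypothetical example of scattered digitization output.
scattered = "Invoice   Number\n\n :   INV-001  \n Total"
print(find_field(scattered, "Invoice Number"))
```

This only makes the regex less sensitive to layout; it does not make the rest of the framework use a different string.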

Terence_Mupeti · January 8, 2021, 8:30am

Hi David, I am trying to implement exactly the same thing you were doing. Did you ever get a response or a solution to the above?

g.ward · January 26, 2021, 12:07pm

Hi all,

Likewise, I am having this problem. There doesn’t seem to be an easy way to switch out the text part of the Document Object Model!

naut1lus · April 21, 2021, 6:42am

Hi,
I have the same problem described by David!
I have tried “dom.GetVisualTextProjection.ProjectedText” but the exception is the same.
