The confidence computation is a complex algorithm that keeps track of which words are found, where they are in the document, when a certain keyword or set of keywords has been added to the learning, and how many times a keyword has been reinforced. That is why it is growing.
You will notice that if the classifier makes a mistake and you correct the document type in the Validation Station (provided you have more than one document type there), new entries appear in the learning content.
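To make the idea of reinforcement more concrete, here is a toy sketch in Python. It is not the actual UiPath algorithm, which is not described in this thread; the class name KeywordLearning, its methods, and the scoring formula are all made up for illustration, and it ignores word positions and when a keyword was added. It only shows how a per-keyword counter grows each time a document is confirmed or corrected, and how such counters could feed a confidence value.

```python
from collections import defaultdict


class KeywordLearning:
    """Toy keyword store per document type (hypothetical, illustration only)."""

    def __init__(self):
        # doc_type -> keyword -> number of times the keyword was reinforced
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, doc_type, text):
        # Called when a document type is confirmed or corrected by a human:
        # every distinct word in the text is reinforced once.
        for word in set(text.lower().split()):
            self.counts[doc_type][word] += 1

    def confidence(self, doc_type, text):
        # Share of the learned reinforcement weight that appears in the text.
        learned = self.counts.get(doc_type, {})
        total = sum(learned.values())
        if total == 0:
            return 0.0
        words = set(text.lower().split())
        matched = sum(n for word, n in learned.items() if word in words)
        return matched / total

    def classify(self, text):
        # Return the best-scoring document type, or (None, 0.0) if untrained.
        if not self.counts:
            return None, 0.0
        scores = {t: self.confidence(t, text) for t in self.counts}
        best = max(scores, key=scores.get)
        return best, scores[best]


learning = KeywordLearning()
learning.train("invoice", "invoice total amount due")
learning.train("receipt", "receipt paid cash")
print(learning.classify("total amount due on invoice"))
```

In this sketch every correction adds or increments entries, so the stored learning only ever gets larger, which matches the growth described above.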
The Studio version is 2019.4.4, Community edition.
Since it always fails, the ContinueOnError property of the Create Document Validation Action activity is enabled, and you will need to disable it in order to see the exception it returns.
I have downloaded UiPath Studio Pro 2020.4.0.beta1731 Community and updated all packages, including prereleases; there are currently no errors there. However, I have a missing activity giving the following error:
I think the Studio version is the issue - 19.4 will be out of support in a couple of months… why don’t you switch to the preview channel in Studio (main menu / help / right side bar / switch to Preview), or install the latest Community?
Please let me know if this works once you set the persistence flag in project settings, and try it out on the latest version!
The project is giving some warnings because of the deprecated UiPath.MachineLearningExtractor package. I would suggest updating the projects available at the current link and updating the packages.
I’ve noticed that the Digitize Document activity has a property that forces it to read the document with OCR (the ForceApplyOCR property). Now I’m wondering if it is possible to do the opposite: force it to read a PDF file natively, just like the Read PDF Text activity does. Sometimes the result of reading a PDF with OCR is not good enough to capture all the information we need to extract, and most of the PDFs I’m working with don’t really need OCR, since they have extractable text.
So, basically, my question is whether it is possible to use Digitize Document as if it were Read PDF Text, in order to avoid OCR when it is not needed.
The Digitize Document activity does not apply OCR by default. If a PDF can be read natively, it is. Only if a certain page has too much image coverage, returns no text when read natively, or meets a couple of other conditions does the activity apply OCR.
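As a rough illustration of that per-page decision, here is a minimal Python sketch. It is not the activity's real implementation: the PageInfo type, the needs_ocr function, and the 0.5 coverage threshold are assumptions made up for this example, and the "couple of other conditions" mentioned above are not modelled because they are not spelled out in this thread. Only the ForceApplyOCR property itself comes from the activity.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    """Hypothetical summary of one PDF page (not a UiPath type)."""
    native_text: str       # text returned by native (non-OCR) reading
    image_coverage: float  # fraction of the page area covered by images, 0..1


def needs_ocr(page, force_apply_ocr=False, max_image_coverage=0.5):
    """Decide whether a page should go through OCR.

    max_image_coverage is an assumed threshold; the real value is not
    documented in this thread.
    """
    if force_apply_ocr:
        # Mirrors the ForceApplyOCR property: always use OCR.
        return True
    if not page.native_text.strip():
        # Native reading returned no text, so OCR is the only option.
        return True
    # Pages dominated by images fall back to OCR even if some text exists.
    return page.image_coverage > max_image_coverage


text_page = PageInfo(native_text="Invoice 123", image_coverage=0.1)
scanned_page = PageInfo(native_text="", image_coverage=1.0)
logo_heavy_page = PageInfo(native_text="Invoice 123", image_coverage=0.8)

print(needs_ocr(text_page))        # False
print(needs_ocr(scanned_page))     # True
print(needs_ocr(logo_heavy_page))  # True
```

Under these assumptions, a page with perfectly good native text can still be sent to OCR when large images such as a background or logo push its coverage over the threshold.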
This full URL appears to resolve the issue, which makes me think that, for some reason, either in Studio or specifically in this activity, the service or tenant information is not passed in, so the activity cannot perform its background API operations against my Orchestrator.
The reason I say it is specific to this activity is that my bot has always been able to connect to Orchestrator fine, and other Orchestrator operations have been working well.
@OsoDormilon @Ioana_Gligan, not sure if there was a response regarding the train extractor activity for machine learning? I have been looking for a solution and was hoping UiPath would release something on this, but I haven’t seen anything yet. Any suggestions on how to go about this?
I have the same problem. My PDF can be read natively perfectly well, but there are some non-text images such as a logo and backgrounds, and because of them the PDF is always read with OCR and the result is very messy.
It would be better if there were an option to force extraction as text. Is there a way to do it?