Document Understanding: New Human-Robot Levels Available :)

Hi! Have you tried this new feature? I have a small doubt: we are clicking SAVE manually while classifying and extracting data. Is there a way we can automate this completely?

Hi @Ioana_Gligan, would you be able to give us a hint as to when this will be available on cloud AI Fabric? I’ve been seeing “a couple of weeks” as a “deadline” for more than a month now :slight_smile: Appreciate your feedback!

Hope my Enterprise Trial does not run out until then :slight_smile:

Hi Ioana,

Could you please let us know when AI Fabric and Data Manager will be available in the Community Edition?

Even if it comes with some restrictions, I just can’t wait to try out the new functionality.

Thanks and Regards,
Rajiv

@Gemlan this is another solution as well.

Thanks @rmunro

Hello @bbimuyi,

the DU framework currently handles PDFs and images. If you can convert your Excel file into a PDF, it will run through the framework.
We are looking at extending the range of file types handled by the DU framework, but it will take some time before Office files are supported…

@Saranyajk - if you don’t want human validation, just remove it from the workflow :slight_smile: If you have use cases in which you do NOT need to validate classification or data extraction (only recommended if you can perform some very good checks on the automatic extraction results in the RPA workflow itself and you decide the results are good enough), then just don’t put the human steps in :slight_smile: You don’t need to automate the human step - just take it out completely if appropriate :slight_smile:
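As an illustration of what those “very good checks” in the workflow might look like, here is a minimal Python sketch. The field names and rules are hypothetical examples, not part of the DU framework; the idea is simply that documents passing every rule can skip the human step:

```python
import re

def validate_extraction(fields: dict) -> list:
    """Return a list of problems found in an automatically extracted result.

    `fields` is a hypothetical dict of field name -> extracted string value;
    an empty list means the document could skip human validation.
    """
    problems = []

    # Rule 1: the invoice number must match an expected pattern.
    if not re.fullmatch(r"INV-\d{6}", fields.get("invoice_number", "")):
        problems.append("invoice_number has an unexpected format")

    # Rule 2: net + tax must equal the total (within a small tolerance).
    try:
        net = float(fields["net_amount"])
        tax = float(fields["tax_amount"])
        total = float(fields["total_amount"])
        if abs(net + tax - total) > 0.01:
            problems.append("amounts do not add up")
    except (KeyError, ValueError):
        problems.append("an amount field is missing or not numeric")

    return problems

# Documents with no problems could be processed fully unattended;
# the rest would be routed to a human validation step instead.
result = validate_extraction({
    "invoice_number": "INV-123456",
    "net_amount": "100.00",
    "tax_amount": "19.00",
    "total_amount": "119.00",
})
print(result)  # → []
```

In a real workflow these rules would live in the RPA process itself, and only documents that fail a rule would be queued for manual validation.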

Hello @raf667,

It is available if you are in the Insider Preview Program. If you are not, please join and ask for access to the Labeling experience in AIF.

Hello @Rajiv_Mulchandani,

Community users cannot yet have access to Data Manager in AI Fabric. To test it out, please ask for an Enterprise Trial and join our Insider Preview Program!

Ioana

Hello, I am trying to create a custom OCR engine, but can’t find any documentation about how to do it.
The only thing I found is this: OCR contracts. But there is no information on how to build and release the engine (for example, how ABBYY did it).

Where can I find any information about that or any examples?

Also, I can’t include UiPath.OCR.Contracts in my Visual Studio project, because there is no NuGet package with that name…

You should be able to find the package on MyGet if you switch the source… and please have a look at GitHub - UiPath/Document-Processing-Code-Samples: Code samples for document processing activities. for a very basic sample OCR engine (it doesn’t do OCR, of course - it just creates a “dummy” output). What is missing from the sample project is how to create design-time wizards. We will try to add something to that effect as well.
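To give a feel for the shape such an engine takes, here is a tiny Python sketch of a “dummy” engine in the spirit of the sample above. The class and method names are hypothetical illustrations of the general pattern (image in, text plus word positions out) - they are not the actual UiPath.OCR.Contracts API, which is a .NET library:

```python
from dataclasses import dataclass, field

@dataclass
class Word:
    text: str
    # Bounding box in pixels: (left, top, width, height).
    box: tuple

@dataclass
class OcrResult:
    text: str
    words: list = field(default_factory=list)

class DummyOcrEngine:
    """A stand-in engine: it ignores the image and returns fixed output.

    A real engine would decode `image_bytes` and run recognition here,
    then report each recognized word with its position on the page.
    """

    def recognize(self, image_bytes: bytes) -> OcrResult:
        words = [
            Word("Hello", (0, 0, 50, 20)),
            Word("world", (60, 0, 50, 20)),
        ]
        return OcrResult(text=" ".join(w.text for w in words), words=words)

engine = DummyOcrEngine()
result = engine.recognize(b"any bytes at all")  # the dummy ignores them
print(result.text)  # → Hello world
```

The dummy output is only useful for verifying that the engine plugs into the pipeline end to end; word-level bounding boxes matter because downstream extraction needs positions, not just raw text.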

@Ioana_Gligan How can I train ML endpoints from human validation, so that they learn and produce better results in the future?

It won’t work in the Community version.
Correct me if I’m wrong.

You will need to use the Machine Learning Extractor Trainer in a Train Extractors Scope to collect the training data. Then you will need to import it into a data labeling session in AI Center (on a trial account in our Cloud platform, for example), and then retrain your model based on that data.

It will work for collecting the data - but unless you have an Enterprise account or a trial account, you won’t have access to AI Center to train your model…

@Ioana_Gligan Acknowledged. Thank you

@Ioana_Gligan I wanted to know one thing.
I’m currently building a custom ML skill using the Data Manager feature you have provided. My question is as follows:
How many documents do I have to label in Data Manager if I have a semi-structured document from which I have to extract 40 fields? On how many documents do I have to train by data labeling?

Your response will be really appreciated.

Regards,
Raheel Ahmed

You know the answer :slight_smile:

It depends. If your documents have very little variation (closer to forms (imagine a standard bank form) than to semi-structured documents (imagine invoices)), then you need fewer samples.

I think the best approach is to label “some” - say 50 to 100, then train the model, see how it performs, and keep going…

Make sure you have a diversified collection of documents you are labeling: so for example, if you have two prevalent formats, don’t only train on one of them… make sure to add a balanced and representative set of variations that you need to process in your real use case.

Also make sure all of your 40 fields appear a sufficient number of times in your document set, so that you don’t find a certain field in only 1 or 2 documents. Otherwise there will be too little information for the ML model to work with…
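One quick way to sanity-check this before training is to count, per field, how many labeled documents actually contain a value. A small Python sketch - the field names and the "dict per document" layout are hypothetical, just to show the idea:

```python
from collections import Counter

def field_coverage(labeled_docs: list) -> Counter:
    """Count in how many documents each field has a non-empty labeled value."""
    counts = Counter()
    for doc in labeled_docs:
        for name, value in doc.items():
            if value:  # skip empty / missing labels
                counts[name] += 1
    return counts

# Tiny hypothetical labeled set: each dict maps field name -> labeled value.
docs = [
    {"invoice_number": "INV-1", "total": "10.00", "due_date": ""},
    {"invoice_number": "INV-2", "total": "20.00", "due_date": "2021-05-01"},
    {"invoice_number": "INV-3", "total": "", "due_date": ""},
]

coverage = field_coverage(docs)
# Flag fields that appear in too few documents to be learned reliably.
MIN_DOCS = 2  # arbitrary threshold for this tiny example
rare = [f for f in ("invoice_number", "total", "due_date") if coverage[f] < MIN_DOCS]
print(rare)  # → ['due_date']
```

Fields that show up in the "rare" list are candidates for labeling more documents before running the training pipeline.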

Hope this helps :slight_smile:

@Ioana_Gligan Thank you for the response. Well, the scenario is as follows:

I trained my model on 40 fields and almost 40 documents, and yes, the document is like a bank form whose format has little variation.
I completed the training and ran the pipeline, which was successful,

but in return I only got 15 fields in my ML Extractor, out of the 40 fields that I labeled in Data Manager.
Why is that?
Can you please look into this or give me some suggestions for this scenario?

Regards,
Raheel Ahmed

On the 40 docs you used for training - do you have values for all 40 fields?
When you say you only got 15 fields in the ML Extractor - is this in terms of capabilities (you only see 15 fields in the list when you configure the extractor), or in terms of values (only 15 of the 40 fields have values returned by the model)?

1 Like