Active Learning Approach retraining of the model

Grant_Goebel · May 2, 2024, 6:27pm

I have a workflow built in Studio that goes through classification → extraction → validation of any data under a certain confidence threshold. My question is: when I go into Action Center and validate the fields in question, is that data automatically passed back into the model for retraining? If not, how do I achieve this?

Any help appreciated!

Anil_G · May 2, 2024, 8:01pm

@Grant_Goebel

It would not be passed back automatically. That is where you need to include a training module. When a validation has happened, the task, once resumed, should use the document and the validated data to train. There is a Train Scope activity for this as well; it uploads the trained documents to AI Center, where the pipelines and skill creation can be set to run automatically. This way, whenever a validation happens, the data is used for retraining and the skill gets updated.
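A minimal, tool-agnostic sketch of the rule behind this loop, written in Python purely for illustration. It is not UiPath code, and `Document`, `needs_validation`, `collect_training_data` and the 0.8 threshold are hypothetical stand-ins. It shows that low-confidence documents are routed to human validation, and only the ones a human has actually completed are collected as training data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical threshold; the thread only says "a certain confidence threshold".
CONFIDENCE_THRESHOLD = 0.8


@dataclass
class Document:
    path: str
    # Extractor output: field name -> (value, confidence between 0 and 1)
    extracted: Dict[str, Tuple[str, float]]
    # Filled in only after a human completes the validation task; None while pending
    validated: Optional[Dict[str, str]] = None


def needs_validation(doc: Document, threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """A document is sent for human validation if any field is below the threshold."""
    return any(conf < threshold for _, conf in doc.extracted.values())


def collect_training_data(docs: List[Document]) -> List[Tuple[str, Dict[str, str]]]:
    """Keep only documents that were routed to validation AND completed by a human."""
    return [
        (doc.path, doc.validated)
        for doc in docs
        if needs_validation(doc) and doc.validated is not None
    ]


if __name__ == "__main__":
    docs = [
        Document("a.pdf", {"total": ("12.50", 0.95)}),  # auto-accepted, no validation task
        Document("b.pdf", {"total": ("7.0", 0.40)}, validated={"total": "7.00"}),  # human-validated
        Document("c.pdf", {"total": ("3.1", 0.55)}),  # validation task still pending
    ]
    print(collect_training_data(docs))
```

Only `b.pdf` comes back here: `a.pdf` never needed validation and `c.pdf` has not been validated yet. In the setup described above, the returned list is what the training step would upload, after which the AI Center pipelines and skill creation can be set to run automatically.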

Cheers

Grant_Goebel · May 2, 2024, 10:32pm

Okay, I will need to look at AI Center. Currently I built the model in the Modern Experience and deployed it from there.

Kanika_Saluja · May 4, 2024, 1:35pm

I am looking for an answer to the same question. In the Classic approach we can retrain, but in Modern I believe we are not using AI Center, so how can we retrain the existing dataset? Let me know if you are able to find a solution.

Kanika_Saluja · May 4, 2024, 1:42pm

@Anil_G, I’d appreciate your help with this query.

1. The attached image shows the flow of the Active Learning process, where I have also imported the UiPath.IntelligentOCR activities.
2. But in the Activities tab in DU I can’t find the Train Extractor and Train Classifier activities. I want to use them to retrain the existing dataset after documents come back from Action Center.
3. Please suggest how we can proceed with this in the Modern Document Understanding design, web or desktop based.
4. Do we have a built-in framework here, as we do in Classic DU?

KevinE · May 4, 2024, 2:39pm

Use the activities under App Integration > Document Understanding.

Anil_G · May 4, 2024, 5:58pm

@Kanika_Saluja

Can you please recheck?

Cheers

Grant_Goebel · May 7, 2024, 2:53pm

When I use those activities, you have to select a project and a private dataset, which reside in AI Center. The Modern approach in Document Understanding is deployed directly from Document Understanding, not through AI Center.

Grant_Goebel · May 7, 2024, 2:54pm

My packages are identical to the picture you posted. However, those rely on AI Center datasets, so the Modern approach Document Understanding project is not available among those options.

Anil_G · May 7, 2024, 4:13pm

@Grant_Goebel

Can you please elaborate why they are not?

You also have Upload to Dataset among the activities, which can be leveraged to upload documents to a dataset.

Cheers

Grant_Goebel · May 7, 2024, 4:34pm

Here is the deployment screen from DU. As you can see, I have two versions: one undeployed and one deployed. With the deployed one I am able to classify, extract, and create Validation Station tasks.

If I go to my AI Center, there is nothing there, so when I use the tools mentioned in previous replies there are no datasets to pick from.

Anil_G · May 7, 2024, 4:45pm

@Grant_Goebel

Just create a project; you can create the same package here in AI Center and use it.

Cheers

Grant_Goebel · May 7, 2024, 8:13pm

I went through and set up a workflow with the “Intelligent OCR” process, which seems disconnected from the Modern approach of DU. I had to recreate the taxonomy, which seems to be the same thing as creating the needed fields in the Modern approach of DU. I get down to “Train Extractors Scope” → “Machine Learning Extractor Trainer”. The issue I am running into there is that when you deploy from the Modern DU approach, no ML skill exists inside AI Center. I could set one up, but I don’t understand how the two are connected, and I would rather not go through the data labeling again, since this is not an OOB (out-of-the-box) model.

Anil_G · May 8, 2024, 3:38am

@Grant_Goebel

To retrain, there must be at least one… and to train, you must select at least one package.

Cheers

Kanika_Saluja · May 14, 2024, 6:34am

@Grant_Goebel @Anil_G, the same thing happens for me.

1. When you open UiPath Desktop/UiPath Web from Modern DU (image attached), even if you import the DocumentUnderstanding.ML activities and the Intelligent OCR activities, you cannot see the Train Classifier / Train Extractors activities there, so the retrain activities are not available for the Modern approach (as shown in the images above).
2. Also, in the Classic process, when you create a DU project it is automatically linked to AI Center and you don’t need to connect them explicitly; this is not available in a Modern project.
3. Also, we don’t see any options for dataset endpoints in Modern, since we can’t find any project details in AI Center.

Please let me know if there is any workaround for these.

Jon_Smith · May 14, 2024, 7:09am

It took me a while to see this, but I have the answer.

The ‘modern’ DU activities do not yet support a native retraining loop.
However, I wrote a tutorial on how you can achieve this by combining some of the classic activities.
It’s quite tricky, but see my tutorial here.

Grant_Goebel · May 28, 2024, 9:05pm

Hey Jon,

I am working on implementing your solution here. So far so good! Thank you so much. One question I have: in my scenario I don’t necessarily have a human doing the validation at the same time. So is it possible to just create the validation task and, when the human gets to fixing or verifying the data, have it passed into retraining? We are largely testing UiPath DU in preparation for switching from a different vendor later in the year.

The other issue I am running into is that I will most likely have a large number of documents to iterate through, and the “Wait for Validation Task and Resume” activity throws an error saying it cannot be placed inside a “For Each File in Folder” activity.

Edit: Here is the exact error: Cannot place activity under scope ‘For Each File in Folder’, as the activity requires persistence and the scope does not offer support for it.

Jon_Smith · May 29, 2024, 7:37am

Glad it helps!

You need the output of the validation task after a human has corrected it to get training data. If it has not been validated by a human yet, it cannot be used as training data, so you’d need to put the training step wherever you wait for the validation task, in the loop you mention.

Speaking of which, the constraint on For Each File in Folder is actually quite logical: the job might wake up on an entirely different machine, so the folder might not be the same.
A normal For Each supports persistence.
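To illustrate why a plain loop over a pre-built collection can be persisted while a folder-bound iterator cannot, here is a small Python analogy. It is not UiPath code; `snapshot_files` and `process_some` are hypothetical names, and a JSON round trip stands in for the job being suspended and resumed. The folder is listed once into plain strings, and the loop cursor is just an integer, so the whole loop state can be saved and restored. It still assumes the listed files are reachable wherever the job resumes, which is exactly the concern raised above.

```python
import json
import tempfile
from pathlib import Path


def snapshot_files(folder: str) -> list:
    """List the folder once, up front, as plain path strings (serializable data)."""
    return sorted(str(p) for p in Path(folder).iterdir() if p.is_file())


def process_some(state: dict, max_items: int) -> dict:
    """Advance a persistable cursor over the pre-built list (a plain For Each analogue).

    Appending the file name to 'done' is a placeholder for the real per-document
    work, such as creating a validation task.
    """
    files = state["files"]
    start = state["next_index"]
    end = min(start + max_items, len(files))
    for path in files[start:end]:
        state["done"].append(Path(path).name)
    state["next_index"] = end
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("a.pdf", "b.pdf", "c.pdf"):
            Path(d, name).touch()
        state = {"files": snapshot_files(d), "next_index": 0, "done": []}
        state = process_some(state, 2)
        # "Suspend" and "resume": the state holds only str/int/list, so it round-trips.
        state = json.loads(json.dumps(state))
        state = process_some(state, 2)
        print(state["done"])
```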

Grant_Goebel · May 29, 2024, 8:08pm

Hey Jon!

I changed the For Each loop to a more C#-style approach and all is working now. I am now receiving an error when the “Wait for Validation Task and Resume” activity executes. The error makes zero sense to me, as I would assume no serialization is happening at this step. If the error were occurring on the “Add Queue Item” activity it would make a little more sense, but even then nothing there is of type DirectoryInfo.

Any insight would be appreciated!

System.Exception: Type ‘System.IO.DirectoryInfo’ cannot be serialized. Consider marking it with the DataContractAttribute attribute, and marking all of its members you want serialized with the DataMemberAttribute attribute. Alternatively, you can ensure that the type is public and has a parameterless constructor - all public members of the type will then be serialized, and no attributes will be required.

Jon_Smith · May 30, 2024, 8:49am

Yep, I can absolutely explain this.

When a job gets suspended, all the variables and arguments currently in memory get serialized and sent to Orchestrator. This is so the job can restart later: the variable data gets sent back, is deserialized, and the job resumes where it was ‘bookmarked’.

This comes with limitations, though. For example, the activity that is suspending (or persisting) must be in the ‘Entry Point’ workflow (typically Main.xaml). The other limitation is that all the variables must be serializable. It is only the variables you can ‘see’ when the job suspends, so basically click on the Wait task and check which variables appear in the Variables pane.

It sounds like you have a DirectoryInfo variable in scope at that point, and that’s what it is upset about, as it cannot serialize it.
If I recall correctly, DirectoryInfo should be serializable, but Microsoft didn’t set the flag for it and never fixed it. Regardless, you need to get that moved: if possible, change its scope so it is only available in the part of the project where it is needed. If you need it after the wait task, you need some jiggery-pokery: keep the DirectoryInfo variable in a tight scope, store only the path as a string in a higher scope, and create a new DirectoryInfo later in the process when you need it again.
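The same failure and workaround can be reproduced outside UiPath. Below is a Python analogy only, not UiPath code: `json.dumps` stands in for the persistence serializer, `pathlib.Path` stands in for `DirectoryInfo`, and `suspend`/`resume` plus the `/tmp/invoices` folder are hypothetical. Putting the object itself into the saved state fails, while keeping only the path string in the wider scope and rebuilding the object after resuming works.

```python
import json
from pathlib import Path


def suspend(state: dict) -> str:
    """Stand-in for serializing the in-scope variables when the job suspends."""
    return json.dumps(state)


def resume(payload: str) -> dict:
    """Stand-in for deserializing the variables when the job resumes."""
    return json.loads(payload)


folder = Path("/tmp/invoices")  # hypothetical folder

# 1) Keeping the rich object in the persisted scope fails, like DirectoryInfo does.
try:
    suspend({"folder": folder})
except TypeError as err:
    print("cannot persist:", err)

# 2) Keep only the path string in the wider scope...
payload = suspend({"folder_path": str(folder)})

# 3) ...and rebuild the object after resuming, in a tight scope.
state = resume(payload)
folder_again = Path(state["folder_path"])
print(folder_again == folder)
```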
