I have a workflow built in Studio that goes through classification → extraction → validation of data under a certain confidence threshold. My question is: when I go into Action Center and validate the fields in question, is that data automatically passed back into the model for retraining? If not, how do I achieve this?
It is not passed back directly. That is where you need to include a training module. When a validation has happened, the task, once resumed, should use the document and the validated data to train. There is a train scope activity as well; you then upload those trained documents to AI Center, where pipelines and skill creation can be set to automatic. This way, when validation happens, the data is retrained and the skill gets updated.
I am looking for an answer to the same question. In the classic approach we can retrain, but in the modern approach I believe we are not using AI Center, so how can we retrain on the existing dataset? Let me know if you are able to find a solution.
@Anil_G, I would appreciate your help with this query.
1. The attached image shows the flow of the active learning process, where I have also imported the UiPath.IntelligentOCR activities.
2. But in the Activities tab in DU I can't find the Train Extractor and Train Classifier activities. I want to use them to retrain the existing dataset after the document comes back from Action Center.
3. Please suggest how we can proceed with this in modern-design Document Understanding, web or desktop based.
4. Do we have an in-built framework here, as we have in classic DU?
When I use those activity options, I have to select a project and a private dataset that reside in AI Center. The modern approach in Document Understanding is deployed directly from Document Understanding and not through AI Center.
My packages are identical to the picture you posted. However, those rely on AI Center datasets, so the modern-approach Document Understanding project is not available in those options.
Here is the deployment screen from DU. As you can see, I have two versions: one undeployed, one deployed. With the deployed one I am able to classify, extract, and create Validation Station tasks.
I went through and set up a workflow with the “Intelligent OCR” process, which seems disconnected from the modern approach of DU. I had to recreate the taxonomy, which seems like the same thing as creating the needed fields in the modern approach of DU. I get down to the “Train Extractors Scope” → “Machine Learning Extractor Trainer”. The issue I am running into there is that when you deploy from the modern DU approach, no ML skill exists inside AI Center. I could set one up, but I don’t understand how the two are connected, and I would rather not go through the data labeling again since this is not an OOB model.
@Grant_Goebel @Anil_G, the same thing happens for me.
1. When you open UiPath desktop or UiPath web from modern DU (image attached), even if you import the Document Understanding ML activities and the Intelligent OCR activities, you cannot see the Train Classifier / Train Extractor activities there, so the retrain activities are not available for the modern approach (as shown in the images above).
2. Also, in the classic process, when you create a DU project it links to AI Center automatically and you do not need to connect them explicitly. This is not available in a modern project.
3. Also, we don’t see any options for dataset endpoints in modern, since we can’t find any project details in AI Center.
Please let me know if there is any workaround for these.
It took me a while to see this, but I have the answer.
The ‘modern’ DU activities do not yet support a native re-training loop.
However, I wrote a tutorial on how you can achieve this by combining some of the classic activities.
It's quite tricky, but see my tutorial here.
I am working on implementing your solution here. So far so good! Thank you so much. One question I have: in my scenario I don’t necessarily have a human doing the validation at the same time. So is it possible to just create the validation task, and when the human gets to fixing or verifying the data, it gets passed into retraining? We are largely testing UiPath DU, getting ready to switch from a different vendor later in the year.
The other issue I am running into is that I will most likely have a large number of documents to iterate through, and “Wait for Validation Task And Resume” is throwing an error saying it cannot be placed inside a “For Each File in Folder” activity.
Edit: Here is the exact error: Cannot place activity under scope ‘For Each File in Folder’, as the activity requires persistence and the scope does not offer support for it.
You need the output of the validation task after a human has corrected it to get training data. If it has not been validated by a human yet, it cannot be used as training data, so you’d need to put the training step wherever you wait for the validation task, in the loop you mention.
Speaking of which, the constraint on For Each File in Folder is actually quite logical: the job might wake up on an entirely different machine, so the folder might not be the same.
A normal For Each supports persistence.
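To illustrate the idea outside of Studio, here is a rough Python sketch (not UiPath code; `snapshot_file_paths` and `process_after_resume` are hypothetical names made up for this example). It assumes the folder listing is captured once as plain strings, a plain loop runs over that list, and each file is re-checked after the wait because the job may have resumed somewhere the folder does not exist.

```python
import os
import tempfile
from typing import List


def snapshot_file_paths(folder: str) -> List[str]:
    """Capture the folder listing once, as plain sorted path strings."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )


def process_after_resume(path: str) -> str:
    """Hypothetical post-resume step: re-check the file before using it,
    since the job may have woken up on a different machine."""
    name = os.path.basename(path)
    if not os.path.isfile(path):
        return f"missing: {name}"
    return f"ok: {name}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        for name in ("invoice_a.pdf", "invoice_b.pdf"):
            open(os.path.join(folder, name), "w").close()
        # A plain loop over a list of strings, not a live folder enumerator.
        for file_path in snapshot_file_paths(folder):
            print(process_after_resume(file_path))
```

In a workflow, the equivalent would be building the list of file paths before a normal For Each and treating each path as something that may need re-validating after the task is completed.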
I changed the for each loop to a more C#-style approach and all is working now. I am now receiving an error when the “Wait for Validation Task and Resume” activity goes to execute. The error makes zero sense to me, as I would assume no serialization is happening at this step. If the error were occurring on the “Add Queue Item” activity it would make a little more sense, but even then nothing there is of type “DirectoryInfo”.
Any insight would be appreciated!
System.Exception: Type ‘System.IO.DirectoryInfo’ cannot be serialized. Consider marking it with the DataContractAttribute attribute, and marking all of its members you want serialized with the DataMemberAttribute attribute. Alternatively, you can ensure that the type is public and has a parameterless constructor - all public members of the type will then be serialized, and no attributes will be required.
When a job gets suspended, all the variables and arguments currently in memory get serialized and sent to Orchestrator. This is so the job can restart later: the variable data gets sent back and deserialized, and the job resumes where it was ‘bookmarked’.
This comes with limitations, though. For example, the activity that is suspending (or persisting) must be in the ‘Entry Point’ workflow (typically Main.xaml). The other limitation is that all the variables must be serializable. It is only the variables you can ‘see’ when the job suspends that count, so basically click on the Wait task and see which variables appear in the Variables pane.
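As a loose analogy for that suspend step, here is a small Python sketch. It is not how the UiPath runtime is implemented: JSON stands in for whatever serializer the runtime actually uses, `pathlib.Path` stands in for a type like `DirectoryInfo`, and `find_unserializable`, `suspend`, and `resume` are hypothetical names. The point is only that one non-serializable variable in scope is enough to fail the whole suspend.

```python
import json
from pathlib import Path


def find_unserializable(variables: dict) -> list:
    """Return the names of in-scope variables that JSON cannot serialize."""
    offenders = []
    for name, value in variables.items():
        try:
            json.dumps(value)
        except TypeError:
            offenders.append(name)
    return offenders


def suspend(variables: dict) -> str:
    """Stand-in for a job suspending: every in-scope variable must serialize."""
    offenders = find_unserializable(variables)
    if offenders:
        raise TypeError("cannot serialize: " + ", ".join(offenders))
    return json.dumps(variables)


def resume(payload: str) -> dict:
    """Stand-in for the job resuming: the variables are deserialized again."""
    return json.loads(payload)


if __name__ == "__main__":
    in_scope = {"file_path": "invoice_a.pdf", "folder": Path("incoming")}
    # 'folder' is the analogue of the DirectoryInfo variable in the error.
    print(find_unserializable(in_scope))
```

A check like this mirrors the manual step described above: look at everything visible at the wait activity and ask whether each item can be written out and read back.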
It sounds like you have a DirectoryInfo-type variable available at that point, and that’s what it is upset about, as it cannot serialize it.
If I recall correctly, DirectoryInfo should be serializable, but Microsoft messed up, didn’t set the flag for it, and never fixed it. Regardless, you need to get that moved. If possible, change its scope so it is only visible in the part of the project where it is needed. If you need it after the wait task, you need to do some jiggery pokery: keep the DirectoryInfo variable in a tight scope, store only the folder path as a string in a higher scope, and then make a new DirectoryInfo later in the process when you need it again.
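The same pattern, sketched in Python as an analogy only (`pathlib.Path` plays the role of `DirectoryInfo`, JSON plays the role of the runtime's serializer, and all function names are hypothetical): the directory object lives inside one function, only a string and a number reach the outer state, and the object is rebuilt from the string after the resume.

```python
import json
import tempfile
from pathlib import Path


def count_files(folder_path: str) -> int:
    """Use the directory object in a tight scope; only an int leaves here."""
    folder = Path(folder_path)  # local only, gone before any suspend
    return sum(1 for entry in folder.iterdir() if entry.is_file())


def state_before_wait(folder_path: str) -> str:
    """What stays in the outer scope at the wait step: strings and numbers."""
    state = {"folder_path": folder_path, "file_count": count_files(folder_path)}
    return json.dumps(state)


def after_resume(payload: str) -> Path:
    """Rebuild the directory object from the stored string after resuming."""
    state = json.loads(payload)
    return Path(state["folder_path"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        (Path(folder) / "invoice_a.pdf").touch()
        payload = state_before_wait(folder)  # safe to persist
        rebuilt = after_resume(payload)      # new object, same folder
        print(rebuilt.name == Path(folder).name)
```

In Studio terms this would mean narrowing the variable's scope to the sequence that needs it and keeping a String variable at the level where the wait activity sits.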