UiPath Document Understanding - Build a Document Understanding Automation in Studio - Additional exercise

Hi, everyone. I’m currently working on my process to extract the data from the credit card application. I have a question related to the checkbox fields, such as these:
[screenshot: checkbox fields on the credit card application]
I understand how to extract each checkbox separately; that is what the folks above have done in their projects, assigning each checkbox to a field and making them Booleans.
[screenshot: one Boolean field per checkbox]
Using those fields, we can determine what option the applicant is choosing. For example, if Domestic = Yes and International = No, we have a valid request for a domestic card type.
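
To make the per-checkbox logic concrete, here is a small Python sketch. It is not UiPath code; it only illustrates the rule. Each Boolean field is treated as one option in a mutually exclusive group, and the selection is accepted only when exactly one box is checked. The field names come from the example above. The selected_option helper is hypothetical.

```python
from typing import Mapping, Optional


def selected_option(checkboxes: Mapping[str, bool]) -> Optional[str]:
    """Return the single checked option, or None if zero or several are checked."""
    checked = [name for name, is_checked in checkboxes.items() if is_checked]
    # A mutually exclusive group is valid only when exactly one box is checked.
    return checked[0] if len(checked) == 1 else None


# Domestic = Yes and International = No -> a valid domestic card request.
print(selected_option({"Domestic": True, "International": False}))  # Domestic
print(selected_option({"Domestic": True, "International": True}))   # None
```

A None result (both boxes checked, or neither) could reasonably be routed to manual validation rather than accepted.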

However, my question is: is there a better way to determine which option is selected, without creating a field for every choice and using logic to work out the chosen one? Is this something we can do with Machine Learning?

The documentation here (screenshot below) suggests this is possible, but doesn’t go into detail or have an example of how to do this. Just wondering if this is an approach we can take.
Thanks!

EDIT: I may have gotten ahead of myself. This appears to be what is taught in the next course within the learning plan.

Hi,

In my case, I can classify documents manually, but I don't know how to set up the classifier to do it automatically.
I didn't correct any values in Present Validation Station when checking the extraction results. I know that Present Validation Station is used to correct extracted data, but I just wanted to keep the data as extracted.
For outputFilePath, I used the string inputFilePath.Split("\")(2).Split(".")(0) just to extract the name of the input PDF (the FileName variable did not work for me).
So, I hope this is an interesting example of string splitting :slight_smile:
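
One caveat on the splitting approach: indexing into the result of Split("\") ties the expression to a single folder depth. The Python sketch below is an illustration, not UiPath code. It compares that approach with taking the stem of the path, and both function names are hypothetical. In a UiPath expression, the .NET method System.IO.Path.GetFileNameWithoutExtension(inputFilePath) would do the same job as stem.

```python
from pathlib import PureWindowsPath


def name_by_split(input_file_path: str) -> str:
    # Mirrors inputFilePath.Split("\")(2).Split(".")(0): correct only when the
    # file name happens to be the third path segment.
    return input_file_path.split("\\")[2].split(".")[0]


def name_by_path(input_file_path: str) -> str:
    # Takes the last path component and strips only the final extension,
    # however deep the file is.
    return PureWindowsPath(input_file_path).stem


print(name_by_split(r"C:\Input\application.pdf"))           # application
print(name_by_split(r"C:\Users\me\Input\application.pdf"))  # me
print(name_by_path(r"C:\Users\me\Input\application.pdf"))   # application
```

The path-based version also keeps names that contain dots intact (app.v2.pdf gives app.v2), whereas Split(".")(0) would cut them at the first dot.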
Here is my workflow:
DU_Practice_v1.zip (140.9 KB)
Note: to test it, copy the input files into the Input folder.

cheers

DU-Challenge-Upload.zip (9.9 MB)

Hi Guys,
I have completed the exercise.

My question is how to handle multiple checkboxes effectively. Should I create a Boolean field for each checkbox? That would mean adding many fields to the taxonomy.

Could you show me the best practice for handling this with DU?

[screenshots]

Other challenges:

[screenshots]

Hi,

It's a great question. Yes, it is possible when using Machine Learning or Forms AI. However, if you have to use the Form Extractor, you have to stick with the one-field-per-checkbox approach.

Hello @Bjyen

There are multiple ways to do this.

  1. Use separate fields for each checkbox
  2. Use a single field for a specific checkbox set that belongs to a specific type
  3. Use a single multi-value field.

However, these options are available when you use the ML Extractor (models trained in Document Manager) or Forms AI. The approach to take may depend on what you need to extract, the documents available for each option, etc.

In our case here, we have to use the Form Extractor. If using Form Extractor, we have to stick with one field per checkbox. But of course, you can try the other methods by training a model with more sample documents.

In addition:
I also reviewed your solution. It looks great. Nice work building the workflow. Just a thought for future improvements:
You can try using a Parallel For Each to loop through the classification results. This approach processes all document types in parallel, making it more efficient. :slight_smile:
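
As a rough analogy for the Parallel For Each suggestion, the Python sketch below hands each classification result to a worker pool and collects the extraction outputs in input order. The ClassificationResult class, its fields, the type IDs, and the extract function are hypothetical stand-ins. The sketch does not model how UiPath's Parallel For Each schedules its branches internally.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List


@dataclass
class ClassificationResult:
    # Hypothetical stand-in for one entry of the classification results.
    document_type_id: str
    start_page: int
    end_page: int


def extract(result: ClassificationResult) -> str:
    # Placeholder for the extraction step run for one classified document.
    return f"{result.document_type_id}: pages {result.start_page}-{result.end_page}"


def extract_all(results: List[ClassificationResult]) -> List[str]:
    # Each result is submitted to the pool; map() yields outputs in input order,
    # even if the workers finish in a different order.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(extract, results))


results = [
    ClassificationResult("credit_card_application", 0, 1),
    ClassificationResult("account_opening_form", 2, 3),
]
print(extract_all(results))
```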

Hi @StefanP

Nice to see your thoughts and different approaches you have tried. I also looked into your workflow. You have built it well. I just noticed one missing part in the Classification step. Your Classify Document Scope is configured properly. However, the Present Classification Station is missing one of its inputs: Automatic Classification Results.

This is the output of the Classify Document Scope, which contains the classification results produced by the Intelligent Keyword Classifier. This is probably why you didn't see the classification happening automatically. The documents still reached the Validation Station because you passed in the classification results generated by the Classification Validation step.

In general, your workflow and its configuration look good. You can also think about using a Parallel For Each to loop through the classification results.

Note: You can access the document type through the classification result variable in the For Each, like this:
docClassificationResult.DocumentTypeID.ToString
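
The type ID is useful for choosing the extraction path for each document. Below is a minimal Python sketch of that routing idea. The type IDs (credit_card_application, account_opening_form) and the handlers are hypothetical placeholders. In the workflow itself, this would typically be a Switch or If on docClassificationResult.DocumentTypeID.ToString.

```python
from typing import Callable, Dict

# Hypothetical document type IDs mapped to placeholder extraction handlers.
EXTRACTORS: Dict[str, Callable[[str], str]] = {
    "credit_card_application": lambda doc: f"credit card extraction -> {doc}",
    "account_opening_form": lambda doc: f"account opening extraction -> {doc}",
}


def route(document_type_id: str, document_name: str) -> str:
    # Choose the extraction path from the classified type ID.
    handler = EXTRACTORS.get(document_type_id)
    if handler is None:
        # Assumed fallback: unrecognized types go to manual review.
        return f"{document_name} -> manual review"
    return handler(document_name)


print(route("credit_card_application", "app1.pdf"))
print(route("unknown_type", "scan7.pdf"))
```

Sending unknown types to manual review is only one possible design choice. The exercise does not prescribe it.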

Hope this helps…

Let me know if it is not clear…

Thanks
Lahiru

Hi Lahiru,
Thank you so much for the solutions for handling multiple checkboxes. Now that I have learned about the ML Extractor and Forms AI, I will try them out.

I realized that Parallel For Each should be used for data extraction when processing multiple classification results. Thank you again. :grinning:

Hi @Melisa_Miranda @Lahiru.Fernando

Please find the attached workflow. Waiting for your feedback.

Thanks
Adharsh Chandran
DU_UiPathForumChallenge.zip (222.5 KB)

Hello @ACJS

Sorry for my late reply on this. I reviewed your solution. You did well. However, you can do better :slight_smile:

Here are a few areas that I identified for further improvement.

Document Understanding Flow:

I like your effort to include the Document Understanding activities inside a REFramework solution. However, this is not the best practice. The REFramework is intended for transactional processes, and document processing solutions are usually not transactional, for several reasons:

  • Transactional jobs always run sequentially (just like a loop), which is not ideal for document processing. Document processing jobs usually run in parallel (ideally, one job per document). You will learn this in the DU Template lesson. For this training, a basic workflow is enough instead of the REFramework; you will learn the best practice of using the DU Template later in the training.

  • The taxonomy design is good. Also consider configuring date formats to further improve accuracy.

  • You did well with the classification steps.

  • The data extraction scope requires some minor changes. The Form Extractor supports both machine-printed text and handwriting. This also lets you use the Form Extractor for signature detection and extraction instead of the Intelligent Form Extractor. It is also important to note that the Intelligent Form Extractor is now deprecated, so it is better not to use it. You can adjust your extraction part based on these points.

Feel free to make the changes and resubmit. Happy to help with any further questions anytime…

Have a good day!

Hi @Lahiru.Fernando,

Thanks for taking the time to give me feedback. I will rework my existing project with your suggestions above.

Thanks
Adharsh

Hi there,

Please see my attached workflow for this challenge.
Practice_DocumentUnderstanding.zip (136.0 KB)

The main issue I experienced during this challenge was with the Credit Card Application forms. Since the form uses square boxes for the characters, the model sometimes interprets these as ☐ characters and includes them in the extracted values.
Is there a workaround for this, please?

Looking forward to your feedback!

Thanks
Beth

Hello @bethj

Thanks for submitting the work.
I had a look at the workflows and they look great. I also have some feedback, if you don't mind :slight_smile:

Here are a few best practices that you could follow:

  • Wrap the Present Validation/Classification Station activities in a Try Catch.
  • Use a Parallel For Each to loop through the classification results. It processes the different classification results in parallel, which is more efficient.

Regarding the boxes that you get when extracting:
This sometimes happens when processing forms that have similar structures. We can remove those characters using String.Replace. Another challenge could be extra spaces between characters; we can apply a similar technique to those as well.

What's great about this is that you can make these updates in the ExtractionResults variable itself. This way, you can send the corrected, clean values to Action Center/Validation Station for manual review, and the same clean values can go to the export as well.
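
Here is a minimal sketch of that clean-up, written in Python rather than as a UiPath expression. It assumes the stray glyphs are U+2610 (☐) or U+25A1 (□), so check which character your extraction actually returns and adjust the list. It strips the box characters with plain replace calls and collapses repeated whitespace. Writing the cleaned value back into ExtractionResults is not shown.

```python
import re

# Assumed box glyphs: U+2610 BALLOT BOX and U+25A1 WHITE SQUARE.
BOX_CHARS = "\u2610\u25a1"


def clean_value(raw: str) -> str:
    # Same idea as chained String.Replace calls: drop the box glyphs.
    for ch in BOX_CHARS:
        raw = raw.replace(ch, "")
    # Collapse runs of whitespace into a single space, then trim the ends.
    return re.sub(r"\s+", " ", raw).strip()


print(clean_value("\u2610J\u2610O\u2610H\u2610N  \u2610 SMITH"))  # JOHN SMITH
```

If the model puts a space between every letter (for example "J O H N"), collapsing whitespace is not enough. Removing spaces entirely is only safe for fields that never contain legitimate spaces.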

Hope this helps!

Thanks so much for your feedback Lahiru.
I’ll look at implementing those changes.

Thanks
Beth

Hello,

I have created a workflow for this challenge, reusing some of the workflows and structure from the UiPath DU Template. I tried to keep as much of the UiPath DU structure as possible, even the parts I don't actually need, for learning purposes.

The process mostly works fine.

DocumentUnderstandingProcess.7z (9.4 MB)

Because the Intelligent Keyword Classifier can handle multiple pages, I did not split the PDFs. All values from the first page ("Account Opening Form") are found, but the values from the second page ("Know Your Client") are not.

Can you tell me why?

Regards,
Davis