UiPath Document Understanding - Build a Document Understanding Automation in Studio - Additional exercise

Joe_Matuch · January 26, 2024, 6:16pm

Hi, everyone. I’m currently working on my process to extract the data from the credit card application. I have a question related to the checkbox fields, such as these:

I understand how to extract each checkbox separately; that is what the folks above have done in their projects, assigning each checkbox to a field and making them Booleans.

Using those fields, we can determine what option the applicant is choosing. For example, if Domestic = Yes and International = No, we have a valid request for a domestic card type.
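
The one-Boolean-per-checkbox logic described above can be sketched in a few lines. This is only an illustration in Python, not UiPath code: the field names `Domestic` and `International` come from the example, while the function name `selected_option` and the dictionary layout are assumptions made for the sketch.

```python
def selected_option(checkboxes: dict[str, bool]) -> str:
    """Return the single checked option, or raise if the selection is ambiguous."""
    checked = [name for name, is_checked in checkboxes.items() if is_checked]
    if len(checked) != 1:
        # Zero or several checked boxes means the request is not valid as-is.
        raise ValueError(f"expected exactly one checked box, got {checked}")
    return checked[0]


card_type = selected_option({"Domestic": True, "International": False})
print(card_type)
```

The same check scales to any number of options in a group, but every option still needs its own extracted Boolean field.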

However, my question is: Is there a better way to do this where we can determine what option is selected without creating a field for every choice and using logic to determine the chosen one? Is this something we can do with Machine Learning?

The documentation here (screenshot below) suggests this is possible, but doesn’t go into detail or have an example of how to do this. Just wondering if this is an approach we can take.
Thanks!

EDIT: I may have gotten ahead of myself. This appears to be what is taught in the next course within the learning plan.

StefanP · February 6, 2024, 10:53am

Hi,

In my case, I can only classify documents manually; I don’t know how to set up a classifier to do it automatically.
I didn’t correct any values in Present Validation Station when checking the extraction results. I know that Present Validation Station is used to correct extracted data, but I just wanted to extract the data as it is.
And for outputFilePath I used the expression inputFilePath.Split("\")(2).Split(".")(0) just to extract the name of an input PDF (the FileName variable did not work for me).
So, I hope this is an interesting example of string splitting.
Here is my workflow:
DU_Practice_v1.zip (140.9 KB)
Note: to test it, copy the input files into the Input folder.
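
The string-splitting expression above depends on the folder depth (index 2) and cuts the name at the first dot. A path-aware helper avoids both assumptions. Below is a minimal Python sketch of the idea; the function name `file_stem` and the sample paths are hypothetical. In a VB.NET expression, the standard .NET method `System.IO.Path.GetFileNameWithoutExtension(inputFilePath)` does the same job.

```python
from pathlib import PureWindowsPath


def file_stem(input_file_path: str) -> str:
    """Return the file name without its folder and without the final extension."""
    return PureWindowsPath(input_file_path).stem


# Works regardless of how deep the file sits in the folder tree.
print(file_stem(r"C:\Data\Input\invoice_001.pdf"))
# Only the last extension is removed, so dots inside the name survive.
print(file_stem(r"Input\report.v2.pdf"))
```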

cheers

Bjyen · February 18, 2024, 11:29am

DU-Challenge-Upload.zip (9.9 MB)

Hi Guys,
I have completed the exercise.

My question is how to handle the multiple check boxes effectively. Should I create a Boolean field for each checkbox respectively? Then I need to add many fields in Taxonomy.

Could you show me the best practice for handling this with DU?

Other challenges:

Lahiru.Fernando · February 23, 2024, 10:18am

Hi,

It’s a great question. Yes, it is possible when using Machine Learning or Forms AI. However, if you have to use the Form Extractor, you have to stick with the field-per-checkbox approach.

Lahiru.Fernando · February 23, 2024, 10:42am

Hello @Bjyen

There are multiple ways to do this.

Use separate fields for each checkbox
Use a single field for a specific checkbox set that belongs to a specific type
Use a single field with multi-value.

However, these options are only available when you are using the ML Extractor (models trained in Document Manager) or Forms AI. The approach to take depends on what you need to extract, the documents available for each option, etc.

In our case here, we have to use the Form Extractor. If using Form Extractor, we have to stick with one field per checkbox. But of course, you can try the other methods by training a model with more sample documents.

In addition:
I also reviewed your solution. It looks great. Nice work building the workflow. Just a thought here for your future improvements.
You can try using a Parallel For Each to loop through the classification results. This approach processes all document types in parallel, making it more efficient.
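
The general idea behind looping over classification results in parallel can be sketched as follows. This is a plain Python illustration, not UiPath code, and it says nothing about how UiPath schedules a Parallel For Each internally. The `extract` function, the dictionary keys, and the sample values are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def extract(result: dict) -> str:
    # Placeholder for the extraction step that runs for one classified document.
    return f"extracted {result['document_type_id']} from pages {result['pages']}"


# Hypothetical classification results: one entry per document found in the file.
classification_results = [
    {"document_type_id": "AccountOpeningForm", "pages": "1-1"},
    {"document_type_id": "KnowYourClient", "pages": "2-2"},
]

with ThreadPoolExecutor() as pool:
    # Each result is handled independently; map returns outputs in input order.
    outputs = list(pool.map(extract, classification_results))

for line in outputs:
    print(line)
```

The point of the design is that each classification result is processed on its own, so one slow document type does not hold up the others.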

Lahiru.Fernando · February 23, 2024, 11:13am

Hi @StefanP

Nice to see your thoughts and different approaches you have tried. I also looked into your workflow. You have built it well. I just noticed one missing part in the Classification step. Your Classify Document Scope is configured properly. However, the Present Classification Station is missing one of its inputs: Automatic Classification Results.

This is the output that comes from the Classify Document Scope. It contains the results of the classification done through the Intelligent Keyword Classifier. I think this is probably why you didn’t see the classification happening automatically. However, the document still reached Validation Station because you passed the classification results that are generated by the Classification Validation step.

In general, your workflow, the configurations, etc. look good. You can also think about using a Parallel For Each to loop through the classification results.

Note: You can access the classification type using the Classification Result variable in the For Each. You can use it like
docClassificationResult.DocumentTypeID.ToString

Hope this helps…

Let me know if it is not clear…

Thanks
Lahiru

Bjyen · March 2, 2024, 8:44am

Hi Lahiru,
Thank you so much for giving me solutions for handling multiple checkboxes. Now that I have learned about the ML Extractor and Forms AI, I will try them out.

I realized that Parallel for Each should be used for data extraction that processes multiple classification results. Thank you again.

ACJS · March 8, 2024, 7:45am

Hi @Melisa_Miranda @Lahiru.Fernando

Please find the attached workflow. Waiting for your feedback.

Thanks
Adharsh Chandran
DU_UiPathForumChallenge.zip (222.5 KB)

Lahiru.Fernando · March 25, 2024, 3:12pm

Hello @ACJS

Sorry for my late reply on this. I reviewed your solution. You did well. However, you can do better.

Here are a few areas that I identified for further improvement.

Document Understanding Flow:

I like your effort to include the Document Understanding activities inside a REFramework solution. However, this is not the best practice. The REFramework intends to handle transactional processes. Document processing solutions are not usually transactional due to several reasons:

Transactional jobs always run in sequential order (just like a loop), which is not ideal for document processing. Document processing jobs usually run in parallel (the best scenario is to create one job per document). For this training, we can build a basic workflow instead of the REFramework; you will learn the best practice of using the DU template in the DU Template lesson later in the training.
Taxonomy design is good. Also think about the date format configuration as an additional way to improve accuracy.
You did well with the classification steps.
The data extraction scope requires some minor changes. Form Extractor supports both computer-generated text and handwriting. This also gives you the chance to use Form Extractor for signature detection and extraction instead of Intelligent Form Extractor. It is also important to note that Intelligent Form Extractor is now deprecated, so it is better not to use it. You can customize your extraction part based on these points.

Feel free to make the changes and resubmit. Happy to assist with further questions or queries if you have any.

Have a good day!

ACJS · March 26, 2024, 3:13pm

Hi @Lahiru.Fernando,

Thanks for taking the time to give me feedback. I will rework my existing project with the suggestions you have given above.

Thanks
Adharsh

bethj · May 1, 2024, 11:56am

Hi there,

Please see my attached workflow for this challenge.
Practice_DocumentUnderstanding.zip (136.0 KB)

The main issue I experienced during this challenge was with the Credit Card Application forms. Since the form uses square boxes for the characters, the model would sometimes interpret these as ☐ characters and include them in the extraction.
Is there a workaround for this, please?

Looking forward to your feedback!

Thanks
Beth

Lahiru.Fernando · May 15, 2024, 12:33pm

Hello @bethj

Thanks for submitting the work.
I had a look at the workflows and they look great. I also have some feedback, if you don’t mind.

Here are a few best practices that you could follow:

Wrapping the Present Validation/ Classification Station activities with a Try Catch.
The ideal approach to loop through the classification results is a Parallel For Each, because it processes the different classification results in parallel and is therefore more efficient.

Regarding the boxes that you get when extracting:
This sometimes happens when processing forms that have similar structures. What we can do is replace those characters using String.Replace. Another challenge could be extra spaces between characters. We can apply a similar technique to these as well.

What’s great about this is: You can actually do these updates in the ExtractionResults variable itself. This way, you can send the corrected and clean values to Action Center/ Validation Station when doing a manual review. The same clean values can go for export as well.
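
The clean-up described above can be sketched as follows. This is a minimal Python illustration rather than UiPath code: the set of box characters, the helper names `clean_value` and `clean_fields`, and the plain dictionary standing in for the extraction results are all assumptions.

```python
import re

# Box-like characters that OCR may emit for empty character cells (assumed set).
BOX_CHARS = "☐□"


def clean_value(raw: str) -> str:
    """Drop box characters, then collapse runs of whitespace to a single space."""
    without_boxes = raw.translate({ord(ch): None for ch in BOX_CHARS})
    return re.sub(r"\s+", " ", without_boxes).strip()


def clean_fields(fields: dict[str, str]) -> None:
    """Clean every extracted value in place, before any manual review or export."""
    for name, value in fields.items():
        fields[name] = clean_value(value)


fields = {"FirstName": "☐J☐O☐H☐N", "LastName": " SMITH   ☐"}
clean_fields(fields)
print(fields)
```

Because the values are updated in the same structure that is later reviewed and exported, both steps see the cleaned text. Names that come back with a space between every letter would need a stricter, field-specific rule, since removing all spaces would also merge multi-word values.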

Hope this helps!

bethj · May 15, 2024, 12:51pm

Thanks so much for your feedback Lahiru.
I’ll look at implementing those changes.

Thanks
Beth

davis.bruvers · May 23, 2024, 10:09am

Hello,

I have created a workflow for this challenge and taken some of the UiPath DU Template Workflows and structure. I tried to keep as much of the UiPath DU structure without actually needing everything. I kept it for learning purposes.

The process is working mostly fine.

DocumentUnderstandingProcess.7z (9,4 MB)

Because Intelligent Keyword Classifier can handle multiple pages I did not split the PDFs. All values from the 1st page “Account Opening Form” are found, but values from the 2nd page “Know Your Client” are not found.

Can you tell me why?

Regards,
Davis

Lahiru.Fernando · June 12, 2024, 2:15pm

Hello @davis.bruvers

I’m trying to review the solution files you attached. However, I’m not able to extract it. Is it possible for you to send the files in the .zip format please?

thanks
Lahiru

van-der-kamp · June 21, 2024, 12:09pm

Hi @Lahiru.Fernando! Are you still involved with UiPath? Are you still reviewing challenges? Isn’t there anybody else who is capable of doing reviews?

This spot (as well as the spot related to the next module, ‘The Document Understanding Process Template in Studio’) seems to be another ‘dead-end’ gorge!

I have waited for almost a month, with no response whatsoever!

davis.bruvers · June 26, 2024, 5:51am

Hello Lahiru,
here is the process compressed in zip format.

DocumentUnderstandingProcess.zip (9,0 MB)

Thanks,
Davis

jaspearson · August 2, 2024, 7:39pm

The REFramework was designed to allow you to run multiple jobs of the same process simultaneously on separate machines. It is true if you only ran 1 job on 1 machine it would process the documents in the order they were loaded into the queue, but if you launched multiple jobs on multiple machines, an implementation with the REFramework would process them in parallel effectively.

Are you saying the method you are talking about allows you to run multiple jobs on the same machine?

PRAVEEN_KUMAR_L_K · August 7, 2024, 3:50am

Hi @Melisa_Miranda , @Lahiru.Fernando ,

I have completed the challenges, but I am facing issues with the account opening and KYC parts during classification. Please review my code and assist me. Please also give me an idea of how to do table extraction and how to get the "Customer Name" for the final output. Thank you in advance. Attached is my code.

dispatcher
RoboticEnterpriseFramework_DU_dispatcher.zip (7.0 MB)
step 2 performer
DocumentUnderstandingChallenge_Course.zip (7.0 MB)

van-der-kamp · August 9, 2024, 8:01am

Hi Praveen_Kumar_L_K,

Welcome to this spot! Don’t be disappointed: unfortunately, there isn’t much activity (actually none) in this room. Neither @Melisa_Miranda nor @Lahiru.Fernando is responding.

Keep up the good work, and have fun with what you’re doing!
