Hi, everyone. I’m currently working on my process to extract the data from the credit card application. I have a question related to the checkbox fields, such as these:
I understand how to extract each checkbox separately; that is what the folks above have done in their projects, assigning each checkbox to a field and making them Booleans.
Using those fields, we can determine what option the applicant is choosing. For example, if Domestic = Yes and International = No, we have a valid request for a domestic card type.
However, my question is: Is there a better way to do this, where we can determine which option is selected without creating a field for every choice and using logic to determine the chosen one? Is this something we can do with Machine Learning?
The documentation here (screenshot below) suggests this is possible, but doesn’t go into detail or have an example of how to do this. Just wondering if this is an approach we can take.
Thanks!
In my case, I can classify documents manually, but I don’t know how to set up a classifier to do it automatically.
I didn’t correct any values in Present Validation Station, when checking the extraction results. I know that Present Validation Station is used to correct extracted data, but I just wanted to extract the data as they are.
And for outputFilePath I used the expression inputFilePath.Split("\")(2).Split(".")(0) just to extract the name of the input PDF (the FileName variable did not work for me).
So, I hope this is an interesting example of string splitting.
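A note on the split expression above: indexing a fixed position such as (2) only works when the file sits at exactly that folder depth, and splitting on "." cuts off names that contain extra dots. Here is a minimal sketch of a more robust idea, written in Python purely for illustration. The sample paths and function names are made up; in a UiPath VB.NET expression the equivalent call would be System.IO.Path.GetFileNameWithoutExtension(inputFilePath).

```python
from pathlib import PureWindowsPath


def file_stem(input_file_path: str) -> str:
    """Return the file name without folders or extension (Windows-style path)."""
    return PureWindowsPath(input_file_path).stem


def file_stem_by_split(input_file_path: str) -> str:
    """Mirror of the Split("\\")(2).Split(".")(0) approach, for comparison."""
    return input_file_path.split("\\")[2].split(".")[0]


# Hypothetical sample paths, not taken from the attached workflow.
shallow = r"C:\Input\Form01.pdf"
deep = r"C:\Projects\DU\Input\Form.v2.pdf"
```

Both functions agree on the shallow path, but only file_stem keeps working when the folder depth changes or the file name contains additional dots.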
Here is my workflow: DU_Practice_v1.zip (140.9 KB)
Note: to test it, copy the input files into the Input folder.
My question is how to handle multiple checkboxes effectively. Should I create a Boolean field for each checkbox? If so, I need to add many fields to the Taxonomy.
Could you show me the best practice for handling this with DU?
It’s a great question. Yes, it is possible when using Machine Learning or Forms AI. However, if you have to use the Form Extractor, you have to stick with the field-per-checkbox approach.
Use a single field for a specific checkbox set that belongs to a specific type.
Use a single field with multi-value.
However, these options are only available when you are using the ML Extractor (models trained in Document Manager) or Forms AI. The approach to take may depend on what you need to extract, the documents available for each option, etc.
In our case here, we have to use the Form Extractor, so we have to stick with one field per checkbox. But of course, you can try the other methods by training a model with more sample documents.
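With one Boolean field per checkbox, the logic that picks the chosen option can still live in one place after extraction. This is a minimal sketch in Python, for illustration only. The field names Domestic and International come from the example earlier in the thread; the function name and the rule that exactly one box must be ticked are assumptions.

```python
from typing import Dict, Optional


def resolve_single_choice(checkboxes: Dict[str, bool]) -> Optional[str]:
    """Return the one ticked option, or None when zero or several boxes are ticked."""
    ticked = [name for name, is_ticked in checkboxes.items() if is_ticked]
    return ticked[0] if len(ticked) == 1 else None


# Hypothetical extraction result for the card-type checkbox group.
card_type = resolve_single_choice({"Domestic": True, "International": False})
```

A None result can be used to route the document to manual validation instead of guessing.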
In addition:
I also reviewed your solution. It looks great. Nice work building the workflow. Just a thought here for your future improvements.
You can try using a Parallel For Each to loop through the classification results. This approach processes all document types in parallel, making it more efficient.
Nice to see your thoughts and different approaches you have tried. I also looked into your workflow. You have built it well. I just noticed one missing part in the Classification step. Your Classify Document Scope is configured properly. However, the Present Classification Station is missing one of its inputs: Automatic Classification Results.
This is the output that comes from the Classify Document Scope. It contains the results of the classification done through the Intelligent Keyword Classifier. I think this is probably why you didn’t see the classification happening automatically. However, the document still got into Validation Station because you passed the classification results that get generated from the Classification Validation step.
In general, your workflow, the configurations, etc. look good. You can also think about using a Parallel For Each to loop through the classification results.
Note: You can access the classification type using the Classification Result variable in the For Each. You can use it like docClassificationResult.DocumentTypeID.ToString
Sorry for my late reply on this. I reviewed your solution. You did well. However, you can do better.
Here are a few areas that I identified for further improvement.
Document Understanding Flow:
I like your effort to include the Document Understanding activities inside a REFramework solution. However, this is not the best practice. The REFramework is intended to handle transactional processes. Document processing solutions are not usually transactional, for several reasons:
Transactional jobs always run in a sequential order (just like a loop). It is not ideal for document processing. Document processing jobs usually run in parallel (the best scenario is to create one job per document). You will learn this in the DU Template lesson. For this training, we can do a basic workflow instead of REF. You will learn the best practice of using the DU template later in the training.
The Taxonomy design is good. Also think about configuring the Date formats as an additional way to improve accuracy.
You did well with the classification steps.
The data extraction scope requires some minor changes. Form Extractor supports both computer-generated text and handwriting. This also gives you the chance to use Form Extractor for signature detection and extraction instead of Intelligent Form Extractor. It is also important to note that Intelligent Form Extractor is now deprecated, so it is better not to use it. You can customize your extraction part based on these points.
Feel free to make the changes and resubmit. Happy to assist if you have any further questions.
The main issue I experienced during this challenge was with the Credit Card Application forms. Since the form uses square boxes for the characters, the model would sometimes interpret these as ☐ characters and include them in the extraction.
Is there any workaround for this please?
Thanks for submitting the work.
I had a look at the workflows and they look great. I also have some feedback, if you don’t mind.
Here are a few best practices that you could follow:
Wrap the Present Validation Station and Present Classification Station activities in a Try Catch.
The ideal approach to loop through the Classification Results is a Parallel For Each, because it processes the different classification results in parallel and is therefore more efficient.
Regarding the boxes that you get when extracting:
This sometimes happens when processing forms that have similar structures. What we can do is replace those characters using String.Replace. Another challenge could be extra spaces between characters. We can apply a similar technique to these as well.
What’s great about this is: You can actually do these updates in the ExtractionResults variable itself. This way, you can send the corrected and clean values to Action Center/ Validation Station when doing a manual review. The same clean values can go for export as well.
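The clean-up described above can be sketched as follows. This is Python for illustration only; in the workflow, the same replacements would be applied to each extracted value before it goes to validation or export. The set of box characters, the function name, and the rule for when to remove spaces between characters are assumptions to adjust per form.

```python
import re

# Assumed set of box glyphs the OCR may emit; extend as needed.
BOX_CHARS = "☐□▢"


def clean_boxed_value(raw: str, join_characters: bool = False) -> str:
    """Remove box glyphs, then tidy the whitespace left behind."""
    without_boxes = raw.translate({ord(ch): None for ch in BOX_CHARS})
    if join_characters:
        # One-character-per-box fields: drop every whitespace character.
        return re.sub(r"\s+", "", without_boxes)
    # Free-text fields: collapse whitespace runs to one space and trim the ends.
    return re.sub(r"\s+", " ", without_boxes).strip()
```

Setting join_characters to True suits fields where each character sits in its own box, while the default keeps single spaces between words such as first and last names.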
I have created a workflow for this challenge and taken some of the UiPath DU Template Workflows and structure. I tried to keep as much of the UiPath DU structure without actually needing everything. I kept it for learning purposes.
Because the Intelligent Keyword Classifier can handle multiple pages, I did not split the PDFs. All values from the 1st page, “Account Opening Form”, are found, but values from the 2nd page, “Know Your Client”, are not found.
I’m trying to review the solution files you attached. However, I’m not able to extract it. Is it possible for you to send the files in the .zip format please?
The REFramework was designed to allow you to run multiple jobs of the same process simultaneously on separate machines. It is true if you only ran 1 job on 1 machine it would process the documents in the order they were loaded into the queue, but if you launched multiple jobs on multiple machines, an implementation with the REFramework would process them in parallel effectively.
Are you saying the method you are talking about allows you to run multiple jobs on the same machine?
I have completed the challenges, but I am facing issues with the account opening and KYC parts during classification. Please review my code and assist me. Could you also give me an idea of how to do the table extraction and how to get "Customer Name" for the final output? Thank you in advance. Attached is my code.
Welcome to this spot! Don’t be disappointed, but unfortunately there isn’t much activity (actually none) in this room. Neither @Melisa_Miranda nor @Lahiru.Fernando responds.
Keep up the good work, and have fun with what you’re doing!