Document Understanding - Form Extraction Templates

Hi Fellow Developers!

This thread is about Form Extraction templates.

I have 10 forms that are fundamentally the same (US forms) and share the same Taxonomy fields.

However, some fields are positioned differently in each document, which effectively makes each one a separate template.

All the documents contain the same keywords but need different templates. How do we handle that? How do we classify them as different templates?

Hi @Palaniyappan,

Can you help me with this issue? Thanks in Advance :smile:

@Kesavaraj_K … In the manage templates section, you can use your form name as the template name. Ex: if your category is Claims: ClaimFormABC, ClaimForm123; if your category is GroupBenefits: GBClaimForm123, GBClaimFormABC.

I couldn’t follow your response, @prasath17.

I have created different templates in the same group, but I couldn’t classify the documents; extraction always goes through the first template in the group.

Do you have a XAML file with such a solution? It would be really great if you could share it.

Thanks in Advance :smile:

@Kesavaraj_K… Sorry, I don’t have a XAML for your case…

In that case, you have to make each form a different document type…

I couldn’t do that because all the keywords are present in both documents. Only the positions of the fields differ!

Hi @Kesavaraj_K: Can you classify them first and then extract data independently from each type?

Hi @Kesavaraj_K,

If you can differentiate the forms based on their file names, you can skip the classification step and:

  1. Decide Document Type based on File Name.

  2. Instead of the Classification Result property, use the Document Type ID property of the Data Extraction Scope activity.
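
Step 1 could look roughly like the sketch below. It is written in Python only to illustrate the logic (in a workflow this would be an If/Switch on the file name), and the file-name prefixes and Document Type IDs are hypothetical, not taken from a real taxonomy.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping: file-name prefix -> Document Type ID from the taxonomy.
# Both sides are made-up examples; replace them with your own values.
PREFIX_TO_DOCUMENT_TYPE_ID = {
    "claimformabc": "Claims.Forms.ClaimFormABC",
    "claimform123": "Claims.Forms.ClaimForm123",
}


def document_type_id_for(file_path: str) -> Optional[str]:
    """Return the Document Type ID whose prefix matches the file name, if any."""
    stem = Path(file_path).stem.lower()  # file name without extension
    for prefix, document_type_id in PREFIX_TO_DOCUMENT_TYPE_ID.items():
        if stem.startswith(prefix):
            return document_type_id
    return None  # unknown file name: fall back to classification or manual review
```

The returned value is what you would pass to the Document Type ID property in step 2.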

Thank you,
@poorna_nayak07

Hi Guys, Sorry for the Late reply

From my understanding, many of you misunderstood the Classification Scope part of DU, which I will explain.

PLS BEAR WITH ME FOR 5 MINUTES! :sweat_smile:

We found a manual solution that was effective within a short timeline!

Scenario:

We had 10 documents, all Promissory Notes for Loan Approval.

These are fixed-template documents, so we assumed the positions of the fields would also be the same. THEY WERE NOT!


In the given samples you can see that the fields that need to be extracted are positioned differently (e.g., Loan Number, MIN Number, etc.).

Using DU, we set up the Taxonomy and Digitization as usual. (Note: we didn’t classify the documents because all the keywords in them are BASICALLY THE SAME!)

We reached the point where we needed to decide on the extraction method. In the Data Extraction Scope, four major options were available.

  1. ML Extractor:
    This extraction method requires either a predefined ML model, which did not exist for our documents, or a custom ML model, which takes time to create. We had only 10 documents, so NO!

  2. Intelligent Form Extractor:
    We didn’t need any handwritten fields to be extracted, so NO!

Let’s get to the obvious methods: Regex Extractor and Form Extractor.

Most of the fields don’t follow a pattern (e.g., Name, Address, and EVEN Loan Number!), which rules out regex.

So the most obvious choice was FORM EXTRACTION.

Since FORM EXTRACTION is basically a position-based approach, we couldn’t settle on a single template to create.

After a lot of research and trial and error, we found that we can CREATE MULTIPLE TEMPLATES for the SAME DOCUMENT TYPE ID!

So we had to create 6-7 templates to compensate for the bad results.

It takes some time to process the documents (2-4 minutes in the Data Extraction Scope), but it works like a charm.
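
To show why several position-based templates for one document type help, here is a purely conceptual sketch in Python. This is NOT how the Form Extractor works internally (we don't know its internals); the `Template` class, the label names, the coordinates, and the tolerance are all made up for illustration. The idea is simply: compare where the labels appear on the page with where each template expects them, and use the template that agrees best.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Position = Tuple[float, float]  # (x, y) as fractions of page width/height


@dataclass
class Template:
    """Hypothetical position-based template: where each label is expected."""
    name: str
    label_positions: Dict[str, Position]


def count_matching_labels(template: Template,
                          observed: Dict[str, Position],
                          tolerance: float = 0.05) -> int:
    """Count labels found within `tolerance` of the template's expected position."""
    matches = 0
    for label, (expected_x, expected_y) in template.label_positions.items():
        if label not in observed:
            continue
        observed_x, observed_y = observed[label]
        if (abs(observed_x - expected_x) <= tolerance
                and abs(observed_y - expected_y) <= tolerance):
            matches += 1
    return matches


def pick_template(templates: List[Template],
                  observed: Dict[str, Position]) -> Template:
    """Pick the template whose expected label positions agree best with the page."""
    return max(templates, key=lambda t: count_matching_labels(t, observed))
```

With one template per layout variant, a document whose Loan Number sits on the right side of the page lands on the "right-side" template instead of being forced through the first one.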

IN THE LONG RUN this will not be a valid solution, because templates may or may not extract the correct details.

This is where ML comes in! We would need to build a custom ML model using AI Fabric, Data Manager, and the DU tools.

NOTE: We created the templates on different desktops, then exported them and shared them through OneDrive.

WE EXPERIENCED AN ERROR WHILE IMPORTING AND FOUND THAT ONEDRIVE COMPRESSION CAUSED IT. GOOGLE DRIVE WORKS JUST FINE!!!
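
If you want to check whether an exported template file arrived unchanged after sharing, one option is to compare a checksum on both machines. This is a minimal Python sketch, not something we used in the project; the file path you pass in is whatever your export produced.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns empty bytes (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Run it on the exported file before uploading and again on the downloaded copy. If the two digests differ, the file was altered in transit and the import is likely to fail.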

THANKS, GUYS FOR ALL THE SUPPORT!

@Lahiru.Fernando @prasath17 @poorna_nayak07 @tudor.serban
