Labelling in Document Understanding

AbarnaKalaiselvam · February 5, 2024, 5:00pm

I want to extract the fields Elements, Minimum, Maximum and Actual.
While labelling the data, it extracts all the values into the same column… How can I extract each one as a separate row?

Anil_G · February 5, 2024, 6:48pm

@AbarnaKalaiselvam

If it's a table, then you need to indicate the separate columns as well…

Do you need Elements as a separate column, and so on? If so, it might not be extracting like that.

Cheers

jose.ordonez1 · February 5, 2024, 7:16pm

Hi Abarna,
Please check the following link about labeling Documents in Document Manager.
Link: Document Understanding - Label Documents

Cheers!

AbarnaKalaiselvam · February 6, 2024, 5:25am

@Anil_G

I need to extract the elements under the column into different rows (one item per row),
and likewise for the other columns.
But it groups them into a single cell.

Anil_G · February 6, 2024, 5:30am

@AbarnaKalaiselvam

That's because you defined Elements as a column…

You need to define the actual columns… then you would get each value in a different column and each element in its own row, etc.
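To picture the difference, here is a minimal Python sketch of the two result shapes. It is not UiPath code; the column names come from the question, and the sample values and the helper `has_all_columns` are made up purely for illustration.

```python
# Column names come from the question; the values below are made up for illustration.
COLUMNS = ["Elements", "Minimum", "Maximum", "Actual"]

# Only "Elements" defined as a column: every item lands in one cell.
grouped = [{"Elements": "C Mn Si"}]

# Every real column defined: one row per item, one value per column.
separate = [
    {"Elements": "C", "Minimum": "0.10", "Maximum": "0.20", "Actual": "0.15"},
    {"Elements": "Mn", "Minimum": "0.30", "Maximum": "0.60", "Actual": "0.45"},
    {"Elements": "Si", "Minimum": "0.15", "Maximum": "0.35", "Actual": "0.25"},
]


def has_all_columns(rows, columns=COLUMNS):
    """Return True when every row carries exactly the expected columns."""
    return all(set(row) == set(columns) for row in rows)


print(has_all_columns(grouped))   # False
print(has_all_columns(separate))  # True
```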

Cheers

srinivasmarneni · February 6, 2024, 5:33am

Hi, follow the steps below.

To extract data as separate rows from a structured document like the one you’re describing, you can use UiPath Document Understanding. Here’s how to approach the labeling process:

Step 1: Define Your Extraction Schema

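Open Document Manager, the labeling tool for Document Understanding.
Define your data schema to include the fields you want to extract: Elements, Minimum, Maximum, and Actual. For values that repeat row by row, these would normally be defined as column fields rather than regular fields.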

Step 2: Digitize the Document

Use the ‘Digitize Document’ activity to convert the PDF into a machine-readable format.

Step 3: Present Validation Station

Use the ‘Present Validation Station’ activity to manually validate and correct the extracted information.
During validation, ensure that each value is labeled correctly according to the schema you’ve defined.

Step 4: Labeling for Separate Rows

When labeling, if the values are extracted in the same column but need to be separate rows, you should label each value individually.
Depending on how the Document Understanding framework processes your document, you might need to adjust the data definition for each field to specify that the values should be captured as separate items.

Step 5: Train Your Model

After labeling, train your model with the labeled data. This helps the ML model to learn the correct structure and improve extraction accuracy.

Step 6: Apply Trained Model

Once trained, apply the trained ML model to extract data from similar documents using the ‘Machine Learning Extractor’ activity.

Step 7: Data Extraction for Separate Rows

Use the ‘Data Extraction Scope’ activity with the trained ML model to extract the data from new documents.
The ‘ExtractedDataTable’ variable will hold the extracted data where each row corresponds to one set of Elements, Minimum, Maximum, and Actual.

Step 8: Review and Adjust

After extraction, review the results. If the data is not being extracted as separate rows, you may need to go back and adjust the labeling or the schema definition.
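If the values still come back grouped, one possible stopgap is to split them after extraction. The sketch below is plain Python rather than a UiPath activity, and it rests on assumptions: each column arrives as one cell, the values inside a cell are separated by newlines, and they appear in the same order across columns. The function name `split_grouped_cells` and the sample values are hypothetical.

```python
from itertools import zip_longest


def split_grouped_cells(grouped, sep="\n"):
    """Turn one multi-value cell per column into one row per item.

    `grouped` maps a column name to a single string holding every value
    for that column, separated by `sep` (assumed to be a newline).
    """
    columns = list(grouped)
    value_lists = [
        [value.strip() for value in grouped[column].split(sep) if value.strip()]
        for column in columns
    ]
    # zip_longest pads short columns with "" so a missing value stays visible.
    return [
        dict(zip(columns, values))
        for values in zip_longest(*value_lists, fillvalue="")
    ]


# Made-up sample data in the shape described in the question.
cells = {
    "Elements": "C\nMn\nSi",
    "Minimum": "0.10\n0.30\n0.15",
    "Maximum": "0.20\n0.60\n0.35",
    "Actual": "0.15\n0.45\n0.25",
}

rows = split_grouped_cells(cells)
for row in rows:
    print(row)
```

Splitting like this only works when the grouped values line up one to one, so fixing the labeling or the schema remains the more reliable route.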

AbarnaKalaiselvam · February 6, 2024, 6:11am

@Anil_G ,

Yes, I’m defining Elements as a column because that is my header.
The items under Elements change, so I can’t define them as column names.
Is there any solution for my scenario?
