How to train the Machine Learning Extractor in Document Understanding

First question:

We need to extract table data from invoices. The table fields are Line, Item No, Quantity, UOM, Unit Price, Freight Charge, Taxable, and Line Total.

We tried the Purchase Order endpoint, but could not find the relevant fields in it.

We also tried the Form Extractor and the Intelligent Form Extractor, but the table data was not extracted properly. In particular, if the template was defined with only one table row, the Form Extractor does not extract the additional rows.
NOTE: We don't have access to AI Center (AI Fabric).

Second question:
When we extract table data from an invoice using the ML Extractor, the bot does not read the exact values. For example, where the invoice shows 1/2, the bot reads 112.

Please help me with this, and suggest any videos or documentation.

Regards
Anusha

Look at the extracted (digitized) text to determine whether this is an OCR error: does the text read 1/2 or 112? You can set a breakpoint after the digitization step to view the text. If it isn't being digitized correctly, try a different OCR engine.

Hi @anusha2,
Welcome to the community!

For your first question: yes, the Form Extractors are not yet reliable when it comes to extracting tabular data. You can use Regex Extractors instead if the layout of your table stays more or less the same across most of your documents.
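To make "a table layout that stays the same" concrete, here is a minimal Python sketch for prototyping a row pattern against digitized text before configuring anything in UiPath. It assumes a hypothetical layout: one line item per text line, the eight columns from the question in order, Item No and UOM as single tokens, and Y/N in the Taxable column. The pattern, the `extract_rows` helper and the sample text are illustrative assumptions, not UiPath's configuration format or API.

```python
import re

# Hypothetical layout (an assumption, adjust to your invoices): one line item
# per text line, columns in the order Line, Item No, Quantity, UOM,
# Unit Price, Freight Charge, Taxable, Line Total, separated by spaces/tabs.
# [ \t] is used instead of \s so a match can never run across two lines.
ROW_PATTERN = re.compile(
    r"^[ \t]*(?P<line>\d+)[ \t]+"
    r"(?P<item_no>\S+)[ \t]+"
    r"(?P<quantity>\d+(?:[./]\d+)?)[ \t]+"      # accepts 10, 2.5 or 1/2
    r"(?P<uom>[A-Za-z]+)[ \t]+"
    r"(?P<unit_price>[\d,]*\d\.\d{2})[ \t]+"
    r"(?P<freight_charge>[\d,]*\d\.\d{2})[ \t]+"
    r"(?P<taxable>[YN])[ \t]+"
    r"(?P<line_total>[\d,]*\d\.\d{2})[ \t]*$",
    re.MULTILINE,
)


def extract_rows(text):
    """Return one dict (field name -> matched string) per matching row."""
    return [match.groupdict() for match in ROW_PATTERN.finditer(text)]


# Illustrative digitized text; not taken from a real invoice.
# The header and subtotal lines do not start with a line number, so they are skipped.
SAMPLE_TEXT = """\
Line Item No Quantity UOM Unit Price Freight Charge Taxable Line Total
1 AB-1001 10 EA 12.50 0.00 Y 125.00
2 CD-2002 1/2 FT 40.00 5.00 N 25.00
Subtotal 150.00
"""

if __name__ == "__main__":
    for row in extract_rows(SAMPLE_TEXT):
        print(row["line"], row["item_no"], row["quantity"], row["line_total"])
    # prints: 1 AB-1001 10 125.00
    # prints: 2 CD-2002 1/2 25.00
```

If you port a pattern like this into UiPath, note that UiPath runs .NET regular expressions, where named groups are written `(?<name>...)` rather than Python's `(?P<name>...)`. Also, a regex can only match what the OCR returns: if the digitized text already reads 112, the quantity group will capture 112, so the OCR issue in the second question has to be solved first.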

For your second question, I would follow @corinna.robertson's advice: check the output of the OCR activity and switch OCR engines if needed.

Thanks,
Nishant

Thank you for your quick reply.

I have tried all the OCR engines, but the output is the same.

Can you share a video on using regular expressions to extract table data?


Lahiru Fernando's Document Understanding (DU) videos are great for learning the principles of DU.
For Regex Extractors, please check this video:

Thanks,
Nishant

Thank you so much.

I will check it out.

Great.
Kindly mark my answer as the solution if it answered your question. This will help others who face a similar issue in the future.

Thanks,
Nishant
