How to Extract Data From a Scanned Document

basil_aiqmis · November 17, 2020, 9:14am

I have a scanned PDF that contains multiple documents I need to extract data from. Each document has its own template (structure), and even small variations in a template can break the extraction.
Because the documents are scanned, the position of the data on each template can also shift.

How can I extract data in these situations?

We tried these methods:

Form Extractor
Intelligent OCR
ML Extractor
Regex

None of them gave us good results. Can anyone suggest another method to extract data from this PDF file?

Are there any AI/ML methods for this?

The main problem is that the data positions change. The data fields do have specific labels that we can use to locate the data.

How can I extract the actual data from a scanned PDF using UiPath?

Thanks and Regards
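Since the fields have fixed labels but their positions drift between scans, one general option is to locate each value relative to its label instead of by absolute page coordinates. The Python sketch below illustrates that idea under stated assumptions: the OCR output has already been exported as a list of words with bounding boxes, words arrive in reading order, and each value is a single word either to the right of its label or directly below it. The `Word` class and the `value_for_label` function are hypothetical names for illustration; they are not part of any UiPath API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """One OCR word with its bounding box in page pixels (origin at top-left).

    Hypothetical structure: adapt it to whatever your OCR engine exports.
    """
    text: str
    left: float
    top: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.left + self.width

    @property
    def bottom(self) -> float:
        return self.top + self.height


def _norm(s: str) -> str:
    # Lower-case and drop punctuation so "No:" matches the label token "no".
    return "".join(ch for ch in s.lower() if ch.isalnum())


def _find_label(words: List[Word], label: str) -> Optional[int]:
    """Index of the label's last word, matching consecutive words in OCR order."""
    tokens = [_norm(t) for t in label.split()]
    for i in range(len(words) - len(tokens) + 1):
        if all(_norm(words[i + k].text) == tokens[k] for k in range(len(tokens))):
            return i + len(tokens) - 1
    return None


def _v_overlap(a: Word, b: Word) -> bool:
    # True when the two boxes share some vertical extent (roughly the same line).
    return a.top < b.bottom and b.top < a.bottom


def _h_overlap(a: Word, b: Word) -> bool:
    # True when the two boxes share some horizontal extent (roughly the same column).
    return a.left < b.right and b.left < a.right


def value_for_label(words: List[Word], label: str) -> Optional[str]:
    """Return the word nearest the label: first to its right, otherwise below it."""
    idx = _find_label(words, label)
    if idx is None:
        return None
    anchor = words[idx]
    # Candidates on the same line that start after the label ends.
    right = [w for w in words
             if w is not anchor and _v_overlap(w, anchor) and w.left >= anchor.right]
    if right:
        return min(right, key=lambda w: w.left - anchor.right).text
    # Otherwise, candidates in the same column that start below the label.
    below = [w for w in words if _h_overlap(w, anchor) and w.top >= anchor.bottom]
    if below:
        return min(below, key=lambda w: w.top - anchor.bottom).text
    return None


if __name__ == "__main__":
    # The same form scanned twice; the second scan is shifted right and down.
    scan_a = [Word("Invoice", 100, 50, 60, 12), Word("No:", 165, 50, 25, 12),
              Word("INV-001", 200, 50, 70, 12)]
    scan_b = [Word("Invoice", 140, 80, 60, 12), Word("No:", 205, 80, 25, 12),
              Word("INV-001", 240, 81, 70, 12)]
    # A label whose value sits underneath it.
    scan_c = [Word("Total", 300, 400, 40, 12), Word("1,250.00", 298, 420, 70, 12)]
    print(value_for_label(scan_a, "Invoice No"))  # INV-001
    print(value_for_label(scan_b, "Invoice No"))  # INV-001
    print(value_for_label(scan_c, "Total"))       # 1,250.00
```

Because the value is found relative to its label, both scans return `INV-001` even though the second one is shifted. Multi-word values (dates, addresses) would need neighbouring words on the value's line to be joined, and noisy OCR may need fuzzy label matching; this sketch handles neither.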

Parth_Doshi · November 17, 2020, 9:43am

Is the scanned PDF an invoice? If so, the ML Extractor can handle changing positions. @basil_aiqmis

basil_aiqmis · November 17, 2020, 11:24am

Hi @Parth_Doshi,
It contains more than just invoices. Also, the ML Extractor only extracts a limited set of fields.

How can I add more custom fields?

Parth_Doshi · November 17, 2020, 11:48am

To add custom fields, you have to create ML models in AI Fabric and then use them.
I'm not sure how to do that, but I think @Lahiru.Fernando or @nisargkadam23 can help you.

basil_aiqmis · November 17, 2020, 11:59am

Hi @Lahiru.Fernando & @nisargkadam23,

Can you help me solve this?

Thank you, Mr. @Parth_Doshi. I can't access AI Fabric; it isn't in my UiPath dashboard.
Also, when I installed the MLService (ML Skill) in UiPath, three errors appeared in my Output panel.

Lahiru.Fernando · November 17, 2020, 2:09pm

Hi Guys @Parth_Doshi @basil_aiqmis

Just saw the discussion here…
If you are processing invoices, then yes, we can create AI models with custom fields from what we have in AI Fabric.

Don't worry about creating custom AI models. Use the out-of-the-box packages for Document Understanding, and you will see a package like the one below.

[Screenshot: out-of-the-box Document Understanding packages, with one package highlighted by hand]

The one I highlighted can be trained and used for whatever custom fields you need. You have to use Data Manager for this.

If you are using the Community or trial version, sign in to the Insider program and request access to the cloud Data Manager, which comes as part of AI Fabric once your request is accepted.

Insider program
http://insider.uipath.com/
