Document Understanding: Document Splitting and Other Wonderful Stories :)

GNISHI · July 15, 2020, 2:04am

Hello!
I’ve just started using Document Understanding. It’s awesome! But I have a question about a specific scenario:
If I have one scanned PDF that contains two different document types, and I would like to classify them and then apply two different Form Extractors (for example), what should the workflow logic be?
What is the best approach to using Document Splitting? (Which activities should be combined, and how?)
Hope you can shed some light on it.

Thanks!
Gaston.-

tudor.serban · July 15, 2020, 7:13am

Hi @GNISHI, I’m not sure how familiar you are with DU, but the general steps would be:

1. Use Digitize Document to obtain the text and DOM.
2. Use the previous results in a Classify Document Scope with Intelligent Keyword Classifier. You can use the design time wizard (Manage Learning) from the Intelligent Keyword Classifier to do some preliminary training so that it knows what each of the document types looks like.
3. Use the classification results from step 2 in a Data Extraction Scope.
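
As a rough illustration of the classify, split, and extract flow in the steps above, here is a minimal Python sketch. It is not UiPath code: in Studio these steps are activities, and the classifier produces the page ranges itself. The names `Segment`, `split_by_type`, and `extract_all`, and the idea of one predicted type per page, are assumptions made only for this example.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, Dict, List, Tuple


@dataclass
class Segment:
    """A run of consecutive pages that share one document type (hypothetical)."""
    doc_type: str
    first_page: int  # 0-based, inclusive
    last_page: int   # 0-based, inclusive


def split_by_type(page_types: List[str]) -> List[Segment]:
    """Group consecutive pages with the same predicted type into segments."""
    segments: List[Segment] = []
    page = 0
    for doc_type, run in groupby(page_types):
        length = len(list(run))
        segments.append(Segment(doc_type, page, page + length - 1))
        page += length
    return segments


def extract_all(
    page_texts: List[str],
    page_types: List[str],
    extractors: Dict[str, Callable[[str], dict]],
) -> List[Tuple[Segment, dict]]:
    """Send each segment's text to the extractor registered for its type."""
    results = []
    for seg in split_by_type(page_types):
        text = "\n".join(page_texts[seg.first_page : seg.last_page + 1])
        results.append((seg, extractors[seg.doc_type](text)))
    return results
```

The point of the sketch is only that classification yields (document type, page range) pairs, and each pair is then routed to the extractor configured for that type.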

GNISHI · July 16, 2020, 7:20pm

Hi @tudor.serban, thanks for the reply and suggestion! It was very useful.
I tried the Intelligent Keyword Classifier and it worked, with some additional steps:

I couldn’t use the “raw” PDF for the preliminary training because the original PDF contained both of the document types I’m looking for. So I had to split it in order to pass it to the Intelligent Keyword Classifier for training.
Then, when I processed the original PDF, it was able to classify and split the PDF into 2 separate DocumentTypeId values, ready for the Form Template extractor.

tudor.serban · July 16, 2020, 7:59pm

@GNISHI: Glad to hear that. Alternatively, you could still use the original document for training without splitting it in the following way: digitize the document and then use the Present Classification Station activity to select the page ranges and corresponding document types. Save the result and pass it to a Train Classifiers Scope with Intelligent Keyword Classifier Trainer. You can then classify and split subsequent documents after this point.
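
To make the idea of training from labeled page ranges concrete, here is a toy Python sketch. It is not the Intelligent Keyword Classifier's actual algorithm, which this thread does not describe. The word-count scoring and the names `train` and `classify_page` are assumptions for illustration only.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def train(
    page_texts: List[str],
    labeled_ranges: List[Tuple[str, int, int]],
) -> Dict[str, Counter]:
    """Count words per document type from (type, first_page, last_page) labels.

    Page indices are 0-based and inclusive, mirroring the page ranges a
    person would select for each document type in the unsplit file.
    """
    model: Dict[str, Counter] = defaultdict(Counter)
    for doc_type, first, last in labeled_ranges:
        for text in page_texts[first : last + 1]:
            model[doc_type].update(text.lower().split())
    return dict(model)


def classify_page(model: Dict[str, Counter], text: str) -> str:
    """Pick the type whose training words overlap most with this page."""
    words = text.lower().split()
    scores = {t: sum(counts[w] for w in words) for t, counts in model.items()}
    return max(scores, key=scores.get)


# Hypothetical three-page scan: pages 0-1 are an invoice, page 2 a receipt.
pages = [
    "invoice number total due",
    "invoice total amount",
    "receipt paid cash thanks",
]
model = train(pages, [("Invoice", 0, 1), ("Receipt", 2, 2)])
```

The labeled ranges play the role of the saved classification result: the original file never has to be physically split for the training step to learn which pages belong to which type.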

sahilwadhwa100 · July 22, 2020, 10:58am

That was great, I did a POC on that.

SWATI_KAROT · July 28, 2020, 9:12am

Hi, I just implemented IntelligentOCR and Document Understanding for classification and extraction. It is amazing. I still have a question, though: can we use it to extract data from a table in a PDF? What if the table is a nested table?

Ioana_Gligan · July 29, 2020, 4:02am

Hello @SWATI_KAROT,

All extractors have table extraction capabilities. Try them out.

We do not currently support nested tables or “repeating grouped fields” (like groups of field 1, table 2, another_field 3, that can appear multiple times in a document).

Ioana

SWATI_KAROT · July 29, 2020, 4:55am

Hi Ioana,

Thank you so much for your reply. I tried it on nested tables yesterday, which is why it didn’t work. Let me try basic tables first.

Thanks,
Swati Karot

viorel.balaj · July 29, 2020, 5:01am

Hello,

I have a problem with AI Fabric and Data Manager.
Could someone help me with the configuration for Data Manager? I wanted to use the Docker container, but I need a login first. (In the documentation at https://docs.uipath.com/ai-fabric/v2020.4/docs/about-data-manager I found “registry credentials”, but I am not sure what that represents.)

Thank you,
Viorel

SWATI_KAROT · July 29, 2020, 6:25am

Hi Ioana,

Please find attached the template configured for table extraction. All the custom selections and the table highlight are clearly visible.

But at runtime, in Validation Station, the table is not getting extracted. Please find attached the same.

Any suggestions?

Thanks,
Swati.

Ioana_Gligan · July 29, 2020, 1:06pm

Hello @viorel.balaj, and welcome to our community

Please reach out to your UiPath contact to obtain credentials and all the necessary information about Data Manager.

Ioana

Ioana_Gligan · July 29, 2020, 1:08pm

Hello Swati,

Please check that you have used “Configure Extractors” and that your field is checked.

SWATI_KAROT · July 29, 2020, 2:30pm

Wow! That was bang on.
I missed the checkboxes. The table is extracting fine now. Thanks a lot.

Marco_Alban_Hidalgo · August 4, 2020, 2:31pm

Hi! Just wanted to clarify for everyone on this post: the link to the Data Manager documentation is now here:
https://docs.uipath.com/ai-fabric/v2020.4/docs/about-data-manager

Marco

Kesavaraj_K · August 8, 2020, 9:32am

Can anyone help me with this thread?

Varshini_Ganapathy_Subram · August 19, 2020, 8:44am

Hi Kesavaraj,

I am also facing the same issue while using “Create Document Validation Action” in the workflow. I have provided the same values you provided in the screenshot.

If you have already found a solution, could you please help me with this?
If you haven’t found a solution, could anyone in the forum help with this?

SWATI_KAROT · August 20, 2020, 5:03am

@Ioana_Gligan, I had a question on the Document Understanding framework. Can it extract images from a PDF, and text embedded inside an image? For example, an image of a stamp on a PDF, where the stamp contains some text. Is it capable of extracting such things?

Ioana_Gligan · August 20, 2020, 12:29pm

What version of Studio are you using? Have you checked that your project “Supports Persistence” in the Project Settings?
Are you using the Cloud Orchestrator?

Lahiru.Fernando · August 20, 2020, 1:11pm

Hello @Kesavaraj_K @Varshini_Ganapathy_Subram

Try giving the folder paths as I have done here in the screenshot below. This should work for you.

Varshini_Ganapathy_Subram · August 20, 2020, 3:28pm

Hi Lahiru.Fernando,

Thanks for your response. I tried with the same folder name as in the screenshot, but I’m still getting the same error. Could you suggest another way to solve this error?
