Is there a demo video or tutorial available for the Extract Semi-Structured Document activity?
UiPath Go link:
https://go.uipath.com/component/extract-semi-structured-document-activity-5abc47
Hi @Pankit
Maybe not a direct answer (as I haven’t used this activity), but for some context you could try this project here:
It also extracts data from files, and even though the activity package is different, the principle of processing a file based on a taxonomy is the same.
Hello,
You can find some example workflows here:
Let me know if you have any questions.
Regards
Thanks! I have a few questions:
Hi Antoine, thanks for sharing the example. I am really new to UiPath, so sorry for the naïve question. For the URL, it currently says “https://invoice.uipath.com”. Should I leave it as is, substitute my cloud Orchestrator URL, or does it refer to something on my desktop? I have a similar question about the second parameter, UiPath.MLModel.Invoice: is that a package I need to install in my Community Edition environment?
Hello ridf,
Regarding the URL, you can leave it as is (see this post for more details: Receipt and Invoice AI - Now available in Public Preview!).
Regarding the DocType parameter, the values you’re seeing in Studio are taken from the taxonomy.json file in the DocumentProcessing folder of the example project. In your own project, you can generate this file automatically by clicking the Build Taxonomy button on the activity. Doing so will prompt you for a URL (https://invoices.uipath.com, for example), a Group, a Category and a DocumentType. The last three fields are used to generate the ID you set in the DocType parameter of the activity (in the example, UiPath.Model.Invoice was constructed with Group = UiPath, Category = Model, DocumentType = Invoice). You’re free to name your document type however you like; just be careful not to create duplicates. You can use the Taxonomy Manager in Studio to edit or remove document types if needed.
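As a rough illustration of how the DocType ID relates to the three fields, here is a minimal Python sketch. It assumes the ID is simply Group, Category and DocumentType joined with dots, as in the UiPath.Model.Invoice example, and the function names are hypothetical. Studio generates the real ID itself when you click Build Taxonomy; this only models the naming rule and the no-duplicates constraint.

```python
def build_doc_type_id(group: str, category: str, document_type: str) -> str:
    """Hypothetical helper: join the three taxonomy fields with dots."""
    return f"{group}.{category}.{document_type}"


def register_doc_type(known_ids: set, group: str, category: str, document_type: str) -> str:
    """Hypothetical helper: add a new ID, refusing duplicates as the taxonomy does."""
    doc_id = build_doc_type_id(group, category, document_type)
    if doc_id in known_ids:
        raise ValueError(f"Duplicate document type ID: {doc_id}")
    known_ids.add(doc_id)
    return doc_id


if __name__ == "__main__":
    ids = set()
    print(register_doc_type(ids, "UiPath", "Model", "Invoice"))  # prints UiPath.Model.Invoice
    try:
        # Registering the same three values again produces the same ID and is rejected.
        register_doc_type(ids, "UiPath", "Model", "Invoice")
    except ValueError as err:
        print(err)  # prints Duplicate document type ID: UiPath.Model.Invoice
```

Choosing a distinct DocumentType (or Category) for each new document type is enough to keep the generated IDs unique.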
Thanks @Antoine_Chaussin, the activity is working absolutely fine, but the results are not quite as desired. I have attached one of the invoices; if you compare it with the attached result, there are some mistakes, such as the description and the total amount. Could you please tell me how to solve this and get the desired result?
Going forward, if I want more details from the invoice, such as Invoice Number, Order Number and Invoice Date, how can I get them? I would be thankful and glad to receive your reply.
Thanks
The invoice and the Excel output are both attached.
wordpress.pdf (42.6 KB)
Hi @syed1980
This kind of issue can only be corrected by updating/retraining the models. Unfortunately, this is not available for the community models, so the only workaround I can propose here is to use Validation Station to correct the erroneous data manually.
Regarding getting more information, the latest version of the model should include at least some of the fields you mentioned. To check out the latest list of fields, you can use the Build Taxonomy button in the activity. This will create a new document type with all the latest fields available in the model. To check them, you can either use the Taxonomy Manager in Studio, or run an example with Validation Station to see how fields are mapped in your document.
Let me know if anything is unclear or if you have further questions.
Hello @Antoine_Chaussin
Thanks for your reply, I appreciate it.
So basically you mentioned two different things here, Validation Station and Taxonomy Manager. My question is: are both of these available in the Community Edition, and if so, can I get the required results with them?
Yes, Antoine, I would definitely need your help because I am new to RPA and still learning, so I would be glad and thankful to receive it.
Thanks & Cheers.
Validation Station and the Taxonomy Manager are both included in the UiPath.IntelligentOCR.Activities package. It’s available in the Community Edition, but you need to install it from the Package Manager in Studio. Validation Station will let you visualize the results from the activity, so it should help you understand exactly which fields the model can detect (the examples I put on GitHub don’t take advantage of all the fields the models can detect).
Thanks for the reply @Antoine,
I have downloaded and installed everything as instructed and ran the program successfully. First, I ran “MultipleInvoicesStagingTables” successfully, but the results are not as desired, as I mentioned in my first message. Second, I ran “ValidationStationTest”, also successfully, and changed the fields to match my invoice.
Hi @syed1980
The custom dom example is for an experimental feature that hasn’t been released in the community version yet, so it’s expected to fail. You don’t need to worry about it.
Regarding Validation Station, the corrected results can be retrieved from the out argument of the Present Validation Station activity. You can use the Export Extraction Result activity to transform the output of Present Validation Station into a friendlier format. To do so:
The Export Extraction Result activity will give you a DataSet object. The DataSet contains DataTables that hold your data.
Hey @Antoine_Chaussin, how can I improve things like the confidence for future documents (assuming the same template)? In other words, the model should learn to detect the fields. Is that possible?
Hi @Vijay_RPA
Currently this is not possible, as the models cannot be retrained on the fly. Stay tuned for announcements about the models, though; it may become possible in a future release.
It would be great if we could train the bots. I don’t want to name names, but other RPA applications have that capability; I hope UiPath comes out with it too.
Hello, I am interested in using the ML Extractor and have the following questions:
Does the Enterprise license allow larger file sizes and more pages per document than the Community license, which is limited to 2 pages?
To double-check: the ML Extractor is not a beta or preview release that would prevent me from using it with our development and production on-prem Orchestrator and/or attended/unattended robots?
Can the ML Extractor activity be used with both attended and unattended robots?
The fields the ML Extractor extracts from invoices are limited; how will we know when new fields are added?
Once the ML Extractor can be trained (I understand this isn’t currently doable), will the training scope activity based on it require Validation Station?
Thanks,
Hello @rinks,
First of all, please try the Machine Learning Extractor instead of the Extract Semi-Structured Document activity: the ML Extractor from the UiPath.DocumentUnderstanding.ML.Activities package is the supported activity.
You can see a sample workflow here: How to use the IntelligentOCR Package (try the latest EDIT1 workflow).
Hope this helps,
Ioana