Any demo video/tutorial available for Extract Semi-Structured Document Activity?

UiPath Go link:
https://go.uipath.com/component/extract-semi-structured-document-activity-5abc47

@alexcabuz
@loginerror

@Antoine_Chaussin ?

Hi @Pankit

Maybe not a direct answer (as I haven’t used this activity), but for some context you could try this project here:

It also extracts data from files, and even though the activity package is different, the principle of processing a file based on a taxonomy is the same.


Hello,

You can find some example workflows here:

Let me know if you have any questions.

Regards


Thanks! I have a few queries:

  1. When we validate the data manually using IntelligentOCR, do we need to use attended automation? And will this validation take place for each invoice?

Hi Antoine, thanks for sharing the example. I am really new to UiPath, so sorry for the naïve question. For the URL, it currently says “https://invoice.uipath.com”. Should I leave it as is, substitute my cloud Orchestrator URL, or does it reference my desktop? A similar question for the second parameter, UiPath.MLModel.Invoice: is that a package that I install in my Community Edition environment?

Hello ridf,

Regarding the URL, you can leave it as is. (See this post for more details: Receipt and Invoice AI - Now available in Public Preview!)

Regarding the DocType parameter, the values you’re seeing in Studio are taken from the taxonomy.json file in the DocumentProcessing folder of the example project. In your own project, you can generate this file automatically by clicking the Build Taxonomy button on the activity. Doing so will prompt you for a URL (https://invoices.uipath.com for example), a Group, a Category, and a DocumentType. These last three fields are used to generate the ID you set in the DocType parameter of the activity (so in the example, UiPath.Model.Invoice was constructed with Group = UiPath, Category = Model, DocumentType = Invoice). You’re free to choose how you name your document type; just be careful not to create duplicates. You can use the Taxonomy Manager in Studio to edit or remove document types if needed.
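To make the naming scheme concrete, here is a tiny sketch of how the three Build Taxonomy fields combine into the DocType ID. This is purely illustrative (the function name is mine, not part of any UiPath API):

```python
def build_doctype_id(group: str, category: str, document_type: str) -> str:
    """Illustrative only: the DocType ID is the three Build Taxonomy
    fields joined as Group.Category.DocumentType."""
    return ".".join([group, category, document_type])

# Reproduces the ID used in the example project:
print(build_doctype_id("UiPath", "Model", "Invoice"))  # UiPath.Model.Invoice
```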

Thanks @Antoine_Chaussin, the activity is working absolutely fine, but the results are not quite as desired. I have attached one of the invoices; if you compare the attached result with it, there are some mistakes, like the description and the total amount. Could you please tell me how to solve this and achieve the desired result?

Going forward, if I want more details from the invoice, like the invoice number, order number, invoice date, etc., how can I get them? I would be thankful and glad to receive your reply.

thanks

Invoice & Excel Output results both are attached. :arrow_double_down:
wordpress.pdf (42.6 KB)

Hi @syed1980

This kind of issue can only be corrected by updating/retraining the models. Unfortunately, that is not available for the community models; the only workaround I can propose here is to use Validation Station to correct the erroneous data manually.

Regarding getting more information, the latest version of the model should include at least some of the fields you mentioned. To check out the latest list of fields, you can use the Build Taxonomy button in the activity. This will create a new document type with all the latest fields available in the model. To check them, you can either use the Taxonomy Manager in Studio, or run an example with Validation Station to see how fields are mapped in your document.

Let me know if anything is unclear or if you have further questions.

Hello @Antoine_Chaussin

Thanks for your reply, I appreciate it.
So you mentioned two different things here: Validation Station and the Taxonomy Manager. My question is: are both of these available in the Community Edition, and if yes, can I get the required results with them?
Yes, Antoine, I would definitely require your help, because I am just new to RPA and still learning, and I would be glad and thankful to receive it.

Thanks & Cheers.

Validation Station and the Taxonomy Manager are both included in the UiPath.IntelligentOCR.Activities package. It’s available in the Community Edition, but you need to install it from the Package Manager in Studio. Validation Station will let you visualize the results from the activity, so it should help you understand exactly which fields the model can detect (the examples I put on GitHub don’t take advantage of all the fields the models can detect).

Thanks for the reply @Antoine,

I have downloaded and installed everything as instructed and successfully ran the programs. First I ran “MultipleInvoicesStagingTables” successfully, but the results are not as desired, as I told you in my first message. Second, I also ran “ValidationStationTest” successfully and changed the fields as per my invoice.


Now I am trying to run the last program, “ValidationStationTestWithCustomDom”, and it is showing me the error “Extract semi-structured document: The server cannot or will not process the request due to an apparent client error”.

I have given the same API key as in the other program.
Kindly help me with this error. Only if this program is successful will I get the result with the fields I specified in “ValidationStationTest”.

Hi @syed1980

The custom DOM example is for an experimental feature that hasn’t been released in the community version yet, so it’s expected that it fails. You don’t need to worry about it.

Regarding ValidationStation, the corrected results can be retrieved from the out argument of the Present Validation Station activity. You can use the Export Extraction Result activity to transform the output of PresentValidationStation to a friendlier format. To do so:

  • Set a variable in the Output argument of the Present Validation Station activity (e.g. called ValidatedExtractionResults). You can do so easily by pressing Ctrl+K in the corresponding text box and typing your variable name
  • Use the Export Extraction Result activity and set the variable you created before as an input

The Export Extraction Result activity will give you a DataSet object. The DataSet contains DataTables that hold your data.
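To give an idea of the shape of the exported data, here is a rough Python stand-in for that DataSet-of-DataTables structure. The table and column names below are illustrative assumptions for the sketch, not the exact UiPath schema; in a real workflow you would read the actual .NET DataSet from the activity’s output:

```python
# Conceptual sketch only: a dict of lists standing in for a .NET
# DataSet containing DataTables of validated extraction results.
exported = {
    "SimpleFields": [  # hypothetical table name, one row per extracted field
        {"FieldName": "InvoiceNumber", "Value": "INV-001"},
        {"FieldName": "Total", "Value": "42.50"},
    ],
}

def field_value(dataset, field_name):
    """Look up a single validated field's value by name, or None if absent."""
    for row in dataset["SimpleFields"]:
        if row["FieldName"] == field_name:
            return row["Value"]
    return None

print(field_value(exported, "Total"))  # 42.50
```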

Hey @Antoine_Chaussin, how can I improve the extraction confidence for subsequent documents (assuming the same template)? That is, it should learn to detect the fields. Is that possible?

Hi @Vijay_RPA

Currently this is not possible, as the models cannot be retrained on the fly. Stay tuned for announcements around the models, though; it may become possible in a future release.

It would be great if we could train the bots. I don’t want to name it, but another RPA application has that capability; I hope UiPath comes up with it too.

Hello, I am interested in using the ML Extractor and have the questions below:

  1. Does the Enterprise license allow larger file sizes and more pages to process, compared to the Community license limitation of 2 pages?

  2. To doubly confirm: the ML Extractor is not a beta or preview release that would prevent me from using it within our development and production on-prem Orchestrator and/or attended/unattended robots?

  3. Can the ML Extractor activity be used by both attended and unattended robots?

  4. The extracted fields for invoices via the ML Extractor are limited; how will we know when new fields are added?

  5. Will the training scope activity based on the ML Extractor (I understand this isn’t currently doable) require Validation Station once it can be trained?

Thanks,

A post was split to a new topic: Build Taxonomy error

Hello @rinks,

First of all, please try the Machine Learning Extractor instead of the Extract Semi-Structured Document activity; the ML Extractor from the UiPath.DocumentUnderstanding.ML.Activities package is the supported activity.

You can see a sample workflow here: How to use the IntelligentOCR Package (try the latest EDIT1 workflow).

  1. Yes, the Enterprise license lifts the 2-page and maximum size limitations.
  2. The ML Extractor is in limited GA and will become generally available within a month at most. There is no restriction on usage, except that if you are using our public endpoint for invoices, for example, your robots will obviously need internet access to perform the service calls.
  3. Yes, no limitations whatsoever.
  4. With the on-prem version coming up very soon, you will be able to define whatever fields you need and train a custom model for your needs.
  5. ML Extractor feedback from Validation Station will be available in the first updates after custom on-prem training support, so it is coming soon, but it is not available right now.

Hope this helps,

Ioana