UiPath's ML Extractor

Hello, I am interested in using the ML Extractor and have the questions below:

  1. Does the Enterprise license allow larger file sizes and more pages to be processed, compared with the Community license limit of 2 pages?
  2. To double-check: the ML Extractor is not a beta or preview release, which would prevent me from using it in our development and production on-prem Orchestrator and/or with attended/unattended robots?
  3. Can the ML extractor activity be utilized for both attended and unattended robots?
  4. The set of fields the ML Extractor extracts from invoices is limited. How will we know when new fields are added?
  5. Once the training scope activity can be used with the ML Extractor (I understand this isn’t currently possible), will it require Validation Station?

Thanks,

Hi,

  1. With an enterprise license you won’t have the same limitations (number of pages / size of document).
  2. The ML Extractor is not in beta or preview. You’ll get the key in Cloud Platform under Other Services. Details: About licensing
  3. Yes, but Validation Station will only pop up with attended robots.
  4. Hm, not sure. I guess UiPath will announce it. I will double-check that :)
  5. TBA

Hope this helps.

Thanks for the reply. A follow-up question regarding the ML Extractor: are the extracted fields fixed or limited (invoice number, date, etc.)? I am guessing UiPath extracts the most common data points from an invoice and controls the scope. Is that correct? Or does the user have the option to expand the scope of what can be extracted?

Indeed, for the out-of-the-box model there are predefined fields that the model extracts. In a future release, users will be able to build custom models with their own additional fields on top of this.

The out-of-the-box fields for invoices and receipts are:

Invoice model available field names:

  • “name”
  • “vendor-addr”
  • “billing-name”
  • “billing-addr”
  • “shipping-addr”
  • “invoice-no”
  • “payment-terms”
  • “due-date”
  • “po-no”
  • “date”
  • “net-amount”
  • “tax”
  • “total”
  • “currency”
  • “items”
    • “line-no”
    • “description”
    • “item-po-no”
    • “quantity”
    • “unit-price”
    • “line-amount”

Receipt model available field names:

  • “name”
  • “total”
  • “vendor-addr”
  • “date”
  • “phone”
  • “currency”
  • “expense-type”
  • “items”
    • “description”
    • “line-amount”
    • “unit-price”
    • “quantity”
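
To check up front whether the fields you need are covered by the predefined models, the lists above can be written down as plain sets. This is only a rough sketch and not part of any UiPath API: the field names are copied from the lists above, while the helper `unsupported_fields` and the example field "iban" are made up for illustration.

```python
# Field names copied from the lists above; everything else is illustrative.
INVOICE_FIELDS = {
    "name", "vendor-addr", "billing-name", "billing-addr", "shipping-addr",
    "invoice-no", "payment-terms", "due-date", "po-no", "date",
    "net-amount", "tax", "total", "currency", "items",
}
# Sub-fields of each entry under "items" in the invoice model.
INVOICE_ITEM_FIELDS = {
    "line-no", "description", "item-po-no", "quantity", "unit-price", "line-amount",
}
RECEIPT_FIELDS = {
    "name", "total", "vendor-addr", "date", "phone", "currency", "expense-type", "items",
}
# Sub-fields of each entry under "items" in the receipt model.
RECEIPT_ITEM_FIELDS = {"description", "line-amount", "unit-price", "quantity"}


def unsupported_fields(wanted, available):
    """Return the wanted field names that the predefined model does not cover, sorted."""
    return sorted(set(wanted) - set(available))


# "iban" is a hypothetical field that is not in the predefined invoice list.
needed = ["invoice-no", "date", "total", "iban"]
print(unsupported_fields(needed, INVOICE_FIELDS))
```

Anything this reports as unsupported would have to come from another extractor or wait for the custom models mentioned above.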

Thank you once again. Another question: the ML invoice extractor obviously expects the invoice content as a text string. Since I am dealing with invoices, which OCR engine would you recommend that doesn't require paying for a license? Is there an accuracy ranking of OCR engines that are free to use, even in production environments? I need some direction/guidance to make a decision.

It really depends on your invoices: their quality, how they are scanned, languages, handwriting, etc. My suggestion is to try multiple OCR engines and see which one performs best on your documents. You can start with Microsoft OCR, which comes with the UiPath platform and doesn’t require an additional license.
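
One way to make "see which one performs better" measurable is to type out a few of your invoices by hand and score each engine's text against that reference. The sketch below assumes you have already saved every engine's output as plain text; it does not call any OCR engine, the names `engine_a` and `engine_b` are placeholders, and the sample strings are invented purely to show the mechanics.

```python
from difflib import SequenceMatcher


def similarity(ocr_text, reference_text):
    """Similarity ratio in [0, 1] between OCR output and a hand-checked transcription."""
    # Collapse whitespace so layout differences are not counted as errors.
    a = " ".join(ocr_text.split())
    b = " ".join(reference_text.split())
    return SequenceMatcher(None, a, b).ratio()


def rank_engines(outputs_by_engine, references):
    """Average the similarity per engine over all reference documents, best first."""
    scores = {}
    for engine, outputs in outputs_by_engine.items():
        per_doc = [similarity(outputs[doc_id], ref) for doc_id, ref in references.items()]
        scores[engine] = sum(per_doc) / len(per_doc)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Invented sample data: one reference transcription and two placeholder engines.
references = {"inv-1": "Invoice No 1001 Total 250.00 EUR"}
outputs = {
    "engine_a": {"inv-1": "Invoice No 1001 Total 250.00 EUR"},
    "engine_b": {"inv-1": "lnvoice No l00l Total 25O.OO EUR"},
}

for engine, score in rank_engines(outputs, references):
    print(f"{engine}: {score:.3f}")
```

A character-level ratio is a crude measure. If only a handful of fields matter to you, comparing those extracted values directly may tell you more than whole-page similarity.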

Andra, as always, thanks a bunch for your quick responses. It feels good to know there is someone listening to newbies like me who are just starting with UiPath.

I am now working on a solution to process invoices and need guidance/direction. I have a possible solution but would like some validation. In your view, where in the forum can I share such ideas to brainstorm and get feedback (quickly :-))?

I have read multiple forums and downloaded multiple packages regarding the possible solution, so I have a basic idea of the high-level functionality and design.

Are you well versed in UiPath and the options available to process invoices? By "process" I am referring to reading the content of digital processes, extracting data from the invoice and from a website, and then doing some comparison to ensure the entered data matches the invoices.

Thanks,