Hello, I am new to Document Understanding technology. Is it possible to read a PDF letter using Document Understanding? I need to recognize 3 types of letters (new hire, resignation, and retirement), which can be done at the ‘classify’ stage, but which extractor can I use for a simple PDF letter? I have tried the ‘form extractor’ (with some issues) and the ‘machine learning extractor’, but neither seems to work because I am not extracting from any of the predefined document types (invoice, receipt, etc.).
Hi @m.soto ,
Do the documents (new hire, resignation, retirement) always have a fixed format, or do they appear in different formats?
If they do appear in different formats, you would need to label the document dataset, marking the fields to be extracted, then create a training pipeline to train the Document Understanding model on your labelled dataset, and finally deploy the Skill with the trained version of the model.
As it is a custom document type and it's not available out of the box, we would have to perform the above steps.
If the formats are always fixed, we could extract the values with String/Regex manipulations instead.
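For the fixed-format case, here is a rough sketch of the regex idea. It is written in Python purely for illustration; in a workflow the same patterns would be applied to the text read from the PDF. The labels `Employee Name:` and `Effective Date:`, the date format, and the `extract_fields` helper are all assumptions made up for this example, not taken from your letters.

```python
import re

# Hypothetical field labels and date format; adjust to the wording
# your fixed-format letters actually use.
FIELD_PATTERNS = {
    "employee_name": re.compile(r"Employee Name:\s*(.+)"),
    "effective_date": re.compile(r"Effective Date:\s*(\d{2}/\d{2}/\d{4})"),
}


def extract_fields(text):
    """Return the first match for each field, or None when it is not found."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1).strip() if match else None
    return result


# Made-up sample text standing in for the text extracted from a PDF letter.
sample = (
    "Employee Name: Jane Doe\n"
    "Effective Date: 01/03/2024\n"
    "This letter confirms your resignation."
)
fields = extract_fields(sample)
```

This only works while the labels stay in the same wording on every letter; once the layout varies, the patterns start missing fields and a trained model is the better fit.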
Thanks for your feedback! The letter format is NOT fixed. I did some quick research and it seems this solution has to be done through AI Center, and I am currently working on the Community Edition. Is there any other existing solution to get data from a PDF letter using DU?
The Community Edition has some limits on using UiPath services. I would suggest upgrading to the Pro Trial to get access to better services/features and also the licenses (AI Units); however, even this option is only available for 60 days.
If it's an enterprise-level automation, I would ask you to contact the UiPath Sales Team and get the licensing details for your environment/enterprise.
There are other methods/services outside of UiPath, but again that would add cost. However, I believe training the UiPath DU model with your document set is the way to go for your case.
You could check the pre-trained extractors to see whether any of their document types is close to yours, and start labelling from it just to make it easier.
Another method is to use the new Extract Document Data activity: