First of all, congratulations on launching a great feature. I tried what you suggested and it worked in a jiffy. I am using Taxonomy Manager to define the fields that I want to use. However, I am ignoring the FieldID that Taxonomy Manager generates for these fields. Instead, as you advised, I am now using the taxonomy strings from the list of available field names, and it works (like magic).
My next step is to train the extractor further. I have added a Train Extractor Scope activity to the workflow, but I cannot find any trainable activity that goes inside it. I would like to improve the parser by training the extractor to capture fields that are not in the above list (like GST Number).
Hi @Anupam_Mittal,
Training is not possible for now; the models are available as is. Training the models on-premises on arbitrary documents is a capability we are planning for Q1 2020. In the meantime, please let us know which fields you need added, and we will do our best to add them.
Regarding the GST number, this is a number with a very specific format, so it might be extracted much more effectively using a regex.
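To illustrate the regex idea, here is a minimal sketch in Python. It assumes the Indian GSTIN layout (15 characters: a 2-digit state code, a 10-character PAN, one entity character, the letter "Z", and one check character). The pattern, the `find_gstins` helper, and the sample numbers are illustrative assumptions, not something the extractor itself provides, and the pattern only checks the shape of the number, not its check character.

```python
import re
from typing import List

# Assumed GSTIN shape: 2 digits + PAN (5 letters, 4 digits, 1 letter)
# + 1 entity character + "Z" + 1 check character.
GSTIN_PATTERN = re.compile(r"\b\d{2}[A-Z]{5}\d{4}[A-Z][1-9A-Z]Z[0-9A-Z]\b")


def find_gstins(text: str) -> List[str]:
    """Return every GSTIN-shaped token in the text, in order of appearance."""
    # Upper-case first so that lower-case OCR output still matches.
    return GSTIN_PATTERN.findall(text.upper())


# Made-up sample text with two GSTIN-shaped values.
sample = "Vendor GSTIN: 27AAPFU0939F1ZV  Bill to GSTIN: 29AAGCB7383J1Z4"
print(find_gstins(sample))
```

Note that the regex returns every match in the document, so it cannot by itself tell which number belongs to which party.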
Cheers,
Alex.
Thanks for your response. As of now, these are the additional fields that I could use:
In Items:
Serial number
Tax amount
Tax/SAC code
In Invoice:
Amount in words (this can help validate total invoice amount)
Payment Terms
Vendor Name
Billing Name
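As a rough illustration of how an "Amount in words" field could be used to validate the invoice total, here is a minimal Python sketch. It assumes English wording, whole currency units only (no paise or cents), and the Indian "lakh" and "crore" scales alongside "thousand" and "million". The helper names `words_to_number` and `amount_matches` and the filler-word list are hypothetical.

```python
SMALL = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19, "twenty": 20, "thirty": 30, "forty": 40,
    "fifty": 50, "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "lakh": 100_000,
          "million": 1_000_000, "crore": 10_000_000}
FILLER = {"and", "only", "rupees"}  # assumed filler words, skipped


def words_to_number(text: str) -> int:
    """Convert e.g. 'Twelve Thousand Three Hundred Forty-Five' to 12345."""
    total, current = 0, 0
    for word in text.lower().replace("-", " ").replace(",", " ").split():
        if word in FILLER:
            continue
        if word in SMALL:
            current += SMALL[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            # Close the current group and start a new one.
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"unrecognised word: {word!r}")
    return total + current


def amount_matches(words: str, total: int) -> bool:
    """True when the amount in words equals the extracted numeric total."""
    return words_to_number(words) == total


print(amount_matches("Rupees Twelve Thousand Three Hundred and Forty-Five Only", 12345))
```

A mismatch between the two values would flag the invoice for manual review rather than prove which of the two fields is wrong.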
Regarding GST, I agree that identifying a GST number is easy with a regular expression. However, a GST/VAT number usually appears twice on an invoice: once for the billing party and once for the vendor. If machine learning can distinguish between the two, that could be useful.