Repetition of values in Data Extraction Scope

I don't understand why my values are getting repeated in the Data Extraction Scope. The duplicates show up in the Present Validation Station.

Can anyone help me with this?

Hi @nashrahkhan, to be specific: only the mentioned values are getting repeated, right? Not everything?

If you have multiple extractors in the Data Extraction Scope, this field might be checked in two places.
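One quick way to check this is to write down which fields are ticked for each extractor and look for overlaps. The snippet below is only a rough sketch: the `config` dictionary is hypothetical and typed in by hand from the Configure Extractors wizard, it is not something UiPath exports in this form.

```python
from collections import defaultdict


def fields_checked_in_multiple_extractors(config):
    """Return {field: [extractor names]} for fields enabled in more than one extractor.

    `config` is a hypothetical dict: extractor name -> list of enabled field names.
    """
    owners = defaultdict(list)
    for extractor, fields in config.items():
        for field in fields:
            owners[field].append(extractor)
    # Keep only the fields that are enabled in two or more extractors.
    return {field: names for field, names in owners.items() if len(names) > 1}


# Hypothetical example data, copied by hand from the wizard.
config = {
    "Regex Based Extractor": ["Name", "Signature"],
    "Intelligent Form Extractor": ["Signature", "Date"],
}
overlaps = fields_checked_in_multiple_extractors(config)
print(overlaps)
```

An overlap is not necessarily a mistake, it just tells you which fields to look at first.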

@prasath17 everything is getting repeated twice.
I have also unchecked that field, but it is still not working.
Also, we are allowed to check the same field in multiple extractors, because if one extractor's result does not meet the confidence threshold, the field is passed on to the next extractor.
Isn't that right?
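The fallback idea described in this post can be pictured roughly like the sketch below. This is only a simplified mental model under my own assumptions (the function name, the tuple layout and the per-extractor thresholds are all made up for illustration), not how UiPath actually implements the Data Extraction Scope.

```python
def pick_value(candidates, min_confidence):
    """Return (extractor, value) from the first extractor that is confident enough.

    candidates: list of (extractor_name, value, confidence) in priority order.
    min_confidence: hypothetical dict of extractor_name -> minimum confidence (0..1).
    Returns None when no extractor meets its threshold.
    """
    for name, value, confidence in candidates:
        if value is not None and confidence >= min_confidence[name]:
            return name, value
    return None


# Hypothetical results for a single field, in extractor priority order.
candidates = [
    ("Regex Based Extractor", "John", 0.40),
    ("Machine Learning Extractor", "John Doe", 0.90),
]
min_confidence = {"Regex Based Extractor": 0.70, "Machine Learning Extractor": 0.50}

result = pick_value(candidates, min_confidence)
print(result)
```

In this model each field ends up with a single value, so the fallback on its own would not be expected to produce duplicates.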

@nashrahkhan …let me open up my setup and check quickly…I have used 3-4 extractors but never ever faced this problem. It’s weird.

@prasath17 Sure. That would be a great help.
Also, can you tell me how we can use the metadata, predictions and documents produced by the extractor in the Train Extractors Scope to train the model used in our Data Extraction Scope?

@nashrahkhan - Here is my Intelligent Form Extractor (IFE) and Regex Based Extractor (RBE) setup. Even though two fields are checked in both extractors, I don't have anything mapped in IFE, so this is fine…

If I am not wrong, only the Machine Learning Extractor Trainer is allowed inside the Train Extractors Scope, because that is the only trainable one; the others are not.

I am adding @AndyMenon @Lahiru.Fernando to assist some of your questions.

@prasath17 - there isn’t much choice here. If we try to drop in any other extractor except for the Machine Learning Extractor Trainer, Studio wouldn’t allow us. That said, we can always implement our own trainable extractors by implementing the classes as indicated in the documentation.

@nashrahkhan - yes that is how it is supposed to work.

This is how ML Extractor Trainer is supposed to work - Community is free to correct if I have not understood something right.

  • You put in the results of your human validation (from the Present Validation Station) at 1
  • Specify your skill or the endpoint at 2
  • Define the output path for the Trainer to create the training results files at 3

Once you run your flow, validate your results - for example in this case you will manually correct the duplication of your signature fields and then the output from the PV Station will be used by the MLE Trainer. In the “Configure Extractor” step you will map the Signature fields to be the focus of training the extractor.

Once your flow has run, the MLE Trainer will create a set of files named documents, metadata & predictions in the output folder.

The entire folder will have to be zipped up and uploaded to Data Manager. If I’m correct this feature is still in Preview.
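If you want to script the zipping step, something like the sketch below should do. It assumes the trainer output folder contains entries named documents, metadata and predictions as described above; the function name and the paths in the usage comment are made up, and whether Data Manager wants those three entries at the root of the zip is my assumption, so check the documentation.

```python
import shutil
from pathlib import Path

# Names the MLE Trainer output is expected to contain, per the description above.
EXPECTED = ("documents", "metadata", "predictions")


def zip_trainer_output(output_dir, archive_base):
    """Zip the whole trainer output folder and return the path of the .zip file.

    output_dir: folder the MLE Trainer wrote to.
    archive_base: target path WITHOUT the .zip extension (keep it outside output_dir).
    """
    output_dir = Path(output_dir)
    missing = [name for name in EXPECTED if not (output_dir / name).exists()]
    if missing:
        raise FileNotFoundError(f"missing in {output_dir}: {', '.join(missing)}")
    # root_dir puts the contents of output_dir at the root of the archive.
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(output_dir))


# Usage (hypothetical paths):
# zip_trainer_output(r"C:\DU\TrainerOutput", r"C:\DU\trainer_export")
```

The check at the top is only there to fail early if the flow has not produced all three entries yet.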

And here is the fun part: you export the data out of Data Manager, and the exported data will be in a format acceptable to AI Fabric.

You upload this data into AI Fabric and re-run the training pipelines.

And then you run your Flow again to see if your manually validated results have made any difference.