Is there something similar to the “Regex Based Extractor” for “normal” string variables that doesn’t need all the other Document Understanding machinery around it?
I have this scenario:
I get a large block of text extracted from an email. These emails always ask the same questions, but users answer in free text, so I need to extract a lot of data from each email with regex.
In the past I did this with a Parallel activity, building the extraction logic for each case with “Matches” activities.
Now that I’ve gotten more into Document Understanding, I figured that creating a taxonomy and then using just the Regex Based Extractor on it would work perfectly.
Unfortunately, it requires all the other Document Understanding inputs, such as the path to a file (I have no file, just plain text in a variable), the Document Object Model, the classification result, and so on.
Is there a way to just use it like I want?
If not, how would you handle this kind of task? It was quite messy in the past, with a huge Parallel activity and an equally huge horizontal scrollbar. Also, whenever new fields had to be extracted, adding a new branch for each one was a pain.
Here is an example of what a taxonomy looks like with the Regex Based Extractor (perfect for my use case, but it won’t work without all the other stuff):
I believe that for your case you would not need the Document Understanding capabilities, as this could also be achieved with regex-based activities like Matches.
The dependency on the taxonomy structure could perhaps be replaced with a Dictionary/DataTable approach.
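A minimal sketch of the Dictionary idea, written in Python only to show the logic. In a UiPath workflow the equivalent would presumably be a single For Each over a Dictionary with one Matches activity inside; that wiring is an assumption, not something confirmed in this thread. The field names, question labels, and patterns below are hypothetical and assume each answer follows a fixed question label on the same line:

```python
import re

# Hypothetical field definitions: field name -> regex with one capture group.
# Assumes each answer follows a fixed question label on the same line.
FIELD_PATTERNS = {
    "customer_name": r"Name:\s*(.+)",
    "order_number": r"Order number:\s*(\d+)",
    "delivery_date": r"Delivery date:\s*(\d{2}\.\d{2}\.\d{4})",
}


def extract_fields(text, patterns):
    """Apply every pattern to the text and return field -> extracted value (or None)."""
    results = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        # Keep the first capture group, trimmed; None if the pattern did not match.
        results[field] = match.group(1).strip() if match else None
    return results


if __name__ == "__main__":
    email_text = (
        "Name: Jane Doe\n"
        "Order number: 48213\n"
        "Delivery date: 05.03.2024\n"
    )
    print(extract_fields(email_text, FIELD_PATTERNS))
```

With this layout, a new field is one new dictionary entry rather than a new Parallel branch, and a field that doesn’t match comes back as None instead of breaking the run.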
Could you let us know what it is that you need most from the DU capabilities?
For complex data extraction where DU was not suitable, I have done it like this in the past:
A Parallel activity with 20+ branches, where each branch contains the extraction logic and one Matches activity.
Now I am looking for a way to handle all of the regex logic in a single activity. The Regex Based Extractor activity does exactly that, but only inside a DU scope (which I don’t have in this use case).
The output is just 20+ variables with the extracted text, but the advantage is that everything lives in one activity instead of 20+ Matches activities (which get hard to maintain and to scroll through horizontally).
As mentioned, have you thought of arranging the field names and their regex expressions in a DataTable? The output could be kept in another column, similar to what you have in the Regex Based Extractor or the Data Extraction Scope.
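A sketch of the DataTable variant under similar assumptions, again in Python purely for illustration. The rule table uses hypothetical columns FieldName and Pattern and is read from CSV text standing in for an Excel or CSV file; an Output column is then filled for each row. The patterns and sample text are made up:

```python
import csv
import io
import re

# Hypothetical rule table, standing in for a DataTable loaded from a file.
# Columns: FieldName, Pattern. Output is added and filled by the code below.
CONFIG_CSV = r"""FieldName,Pattern
CustomerName,Name:\s*(.+)
OrderNumber,Order number:\s*(\d+)
DeliveryDate,Delivery date:\s*(\d{2}\.\d{2}\.\d{4})
"""


def load_rules(csv_text):
    """Read the rule table into a list of row dicts with an empty Output column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["Output"] = ""
    return rows


def run_rules(rows, text):
    """Fill each row's Output with the first capture group of its Pattern ('' if no match)."""
    for row in rows:
        match = re.search(row["Pattern"], text, flags=re.IGNORECASE)
        row["Output"] = match.group(1).strip() if match else ""
    return rows


if __name__ == "__main__":
    email_text = "Name: Jane Doe\nOrder number: 48213\nDelivery date: 05.03.2024\n"
    for row in run_rules(load_rules(CONFIG_CSV), email_text):
        print(row["FieldName"], "->", row["Output"])
```

Keeping the rule table outside the workflow means new fields can be added by editing the file instead of the workflow. Patterns that contain commas would need to be quoted in the CSV.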
Let us know whether the approach above makes sense to you.