Can I use just taxonomy combined with regex based extractor? Alternatives?

ShadowDom · May 8, 2023, 6:20am

Hi guys,

TL;DR

Is there something similar to a “Regex Based Extractor” for “normal” string variables that doesn’t need all the other Document Understanding machinery around it?

I have this scenario:

I get a large block of text extracted from an email. These emails always ask the same questions, but the user's answers are free text, so I need to extract a lot of data from the email with regex.

In the past I did this with a parallel activity and for each case I built a logic for the extraction using “Match” activities.

Now that I have gotten more into Document Understanding, I figured that creating a taxonomy and then using just the Regex Based Extractor on it would work perfectly.

Unfortunately, the extractor needs all the other Document Understanding inputs, like the path to a file (I have no file, just some plain text in a variable), the Document Object Model, the classification result …

Is there a way to just use it like I want?

If not, how would you deal with that kind of task? It was kind of messy in the past, with one huge Parallel activity and a huge horizontal scrollbar. And whenever there were new fields to extract, it was hell to add a new branch for each of them.

Here is an example of how a taxonomy looks with the Regex Based Extractor (perfect for my use case, but it won’t work without all the other stuff):

Thank you guys.

supermanPunch · May 8, 2023, 6:28am

Hi @ShadowDom ,

I believe for your case, you would not need the Document Understanding capabilities as this could also be achieved with Regex based activities like Matches.

The dependency on the Taxonomy structure could perhaps be replaced with a Dictionary/DataTable approach.

Could you let us know which of the DU capabilities would help you the most?

In what form do you need the output?

ShadowDom · May 8, 2023, 6:54am

Ok I will try to elaborate in more detail.

For complex data extraction in the past where DU is not suitable I did it like this:

Parallel Activity with 20+ branches. On each branch I have logic for the extraction and one Matches activity.

Now I want to look for a way where I can use one activity for all of the regex logic. Regex Based Extraction Activity does exactly that but only in a DU scope (which I don’t have here in my use case).

The output is just 20+ variables holding the extracted text, but with the advantage that I have everything in one activity instead of 20+ Matches activities (which get hard to maintain and to scroll through horizontally).

Thx

supermanPunch · May 8, 2023, 7:03am

@ShadowDom ,

As mentioned, have you thought of arranging the field names and their regex expressions in a DataTable? We could also keep the output in another column, similar to what you have in the Regex Based Extractor or Data Extraction Scope.
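To make the idea concrete, here is a rough sketch of the table-driven approach in Python rather than in UiPath activities. The field names, patterns, and sample email are all made up for illustration; the list of rows stands in for a DataTable with a field-name column and a pattern column, and the returned dictionary stands in for the output column.

```python
import re

# Hypothetical field table: one row per field to extract, standing in for a
# DataTable with "field" and "pattern" columns. Patterns are illustrative only.
FIELD_PATTERNS = [
    {"field": "CustomerName", "pattern": r"Name:\s*(.+)"},
    {"field": "OrderNumber", "pattern": r"Order\s*(?:No\.?|Number):\s*(\d+)"},
    {"field": "DeliveryDate", "pattern": r"Delivery date:\s*(\d{2}\.\d{2}\.\d{4})"},
]


def extract_fields(text, field_patterns):
    """Run every pattern against the text and collect the first capture group.

    Fields whose pattern does not match are stored as None, so the caller
    can see which answers were missing from the email.
    """
    results = {}
    for row in field_patterns:
        match = re.search(row["pattern"], text, flags=re.IGNORECASE)
        results[row["field"]] = match.group(1).strip() if match else None
    return results


# Made-up email body, only to show the call.
sample_email = """Name: Jane Doe
Order No: 12345
Comments: please call before delivery
"""

extracted = extract_fields(sample_email, FIELD_PATTERNS)
```

With this layout, adding a new field means adding one row to the table instead of a new branch. In Studio, the equivalent would presumably be a For Each Row loop over the DataTable with a single Matches activity (or a `System.Text.RegularExpressions.Regex.Match` call) inside it.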

Let us know if you are able to understand the approach above.

ShadowDom · May 8, 2023, 7:34am

That is actually pretty simple and yet very smart.

I think I will use this approach.

Thx
