How to use Regex based extractor activity

Hi All,

I am trying to extract data from a PDF using the Regex Based Extractor, but it is not giving any output. Please let us know how to use it.

I am using two extractors: the ML extractor and the Regex Based Extractor.

With the Regex Based Extractor, I am trying to extract the Vendor Number from a table.

PDF Data Screenshot

@Palaniyappan @Lahiru.Fernando @Arpit_Kesharwani

Best Regards,
Naveen Chaganti

@loginerror @JosephNehl @tudor.serban
Do you have any idea about this? Please let me know.

Hi @Naveen.Ch

I have created something similar a couple of weeks back.
So this is the first rule when using regex extractor :slight_smile:

When you plan to use the regex extractor, think of the entire document as one long string that holds all the data. Why? Because we look for string patterns…
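
To make the "one long string" idea concrete, here is a minimal sketch in Python. It is only an illustration: the sample text and the "Vendor Number:" label are made up, and UiPath itself runs on the .NET regex engine, although a simple pattern like this one reads the same in both.

```python
import re

# Hypothetical text, standing in for what digitization returns for a page.
document_text = (
    "Purchase Order 4500012345\n"
    "Vendor Number: 100234\n"
    "Item No  Description  Quantity\n"
)

# The pattern is searched in the whole string, not in a single cell or line.
match = re.search(r"Vendor Number:\s*(\d+)", document_text)
vendor_number = match.group(1) if match else None
print(vendor_number)
```

If the label in your PDF is worded differently, the pattern simply finds nothing, which is one common reason for empty output.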

Now, keeping that in mind, look at the screenshot you attached of Configure Regex expressions. For the “Table” section, it asks for a regex that defines the range of the table. The same goes for “Rows”.

What happens here?
Now, you have the entire document as a string, so giving a regex will look for that pattern in the string. But working with tables is a little different.

So, first you need to mark where in the string the table starts and ends. How do you do that?
Using unique words that define the start and the end of the table.

Now… look at your table screenshot… The starting point is easy… it's always the first header, “Item No” in your case…
Now… go to the end of the table… Is there any unique text right after the table ends? Add it to define that range…

Look at the screenshot below…

This regex for the table defines the range using two keywords, “Table I” and “Table II”.
This is the document…

Providing this range acts like a SUBSTRING, in our familiar terms… It gives you a separate chunk of text in which to look for the fields of the table.
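
As a rough sketch of that SUBSTRING step, the Python snippet below cuts out everything between the two keywords “Table I” and “Table II”. The document text is invented for illustration, and this is not the extractor's internal code, just the same idea expressed with a plain regex.

```python
import re

# Hypothetical document text; "Table I" and "Table II" are the two range keywords.
document_text = (
    "Intro paragraph\n"
    "Table I\n"
    "Item No  Quantity\n"
    "10  5\n"
    "20  12\n"
    "Table II\n"
    "Other data 30  99\n"
)

# Lazy match; DOTALL lets "." cross line breaks so the range can span many lines.
# The \b after "Table I" stops it from matching the start of "Table II".
table_pattern = re.compile(r"Table I\b(.*?)Table II\b", re.DOTALL)
m = table_pattern.search(document_text)
table_text = m.group(1) if m else ""
print(table_text)
```

Anything after the end keyword, such as the "Other data" line here, is left out of the chunk, so later field patterns cannot match it by accident.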

Now, you can easily define the regex pattern for the table fields.
So the TABLE portion gives you a chunk from the big string; from that chunk, find your pattern :slight_smile:

Document
Break |Document| TO | TABLE STRING |
Break |TABLE STRING| TO | ROWS OR FIELDS |

This is the idea :slight_smile:
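
The two breaks above can be sketched end to end in Python as follows. Everything specific here is an assumption for illustration: the column layout (Item No, Vendor Material Number, Quantity), the sample values, and the end keyword "Total Amount". Note also that .NET, which UiPath uses, writes named groups as `(?<name>...)` rather than Python's `(?P<name>...)`.

```python
import re

# Hypothetical table layout: Item No, Vendor Material Number, Quantity.
document_text = (
    "Purchase Order\n"
    "Item No  Vendor Material Number  Quantity\n"
    "10  VM-1001  5\n"
    "20  VM-1002  12\n"
    "Total Amount  17\n"
)

# Step 1: Document -> table string (from the first header to an assumed end keyword).
TABLE_RE = re.compile(r"Item No(.*?)Total Amount", re.DOTALL)

# Step 2: table string -> rows; one match per line that starts with an item number.
ROW_RE = re.compile(
    r"^(?P<item>\d+)[ \t]+(?P<material>\S+)[ \t]+(?P<quantity>\d+)[ \t]*$",
    re.MULTILINE,
)


def extract_rows(text):
    """Return one dict per table row, or an empty list if no table range is found."""
    table = TABLE_RE.search(text)
    if table is None:
        return []
    return [m.groupdict() for m in ROW_RE.finditer(table.group(1))]


rows = extract_rows(document_text)
for row in rows:
    print(row)
```

Using `[ \t]+` instead of `\s+` between columns keeps a row match on a single line, so a value from the next row is never pulled into the current one.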

You can also refer to this demo…
InvoiceAIDemo.zip (1.4 MB)


Hi @Lahiru.Fernando @Dave

Thank you for your response.

We need the regex syntax for the values highlighted in the screenshot below (i.e. the Item and Quantity columns).

Best Regards,
Naveen Chaganti

@loginerror @Ioana_Gligan @horatiu.palade
Hi Both,

UiPath failed to extract the values from the attached table. Please look at the screenshot for your reference.

In Validation Station, we can see that the expected values are highlighted.

But when we click on the table to verify the values of Quantity and Vendor Material Number, the values are not captured or displayed properly in the table.

Additional info:
1. For Item No, the regex syntax works fine in the regex editor in UiPath, but the values are not highlighted or identified in Validation Station.

2. To extract data from a table, UiPath does not support using multiple/parallel extractors; we have to use only one extractor to extract the values from the table.

Best Regards,
Naveen Chaganti.