I built something similar a couple of weeks ago.
Here is the first rule of using the regex extractor:
When you plan to use the regex extractor, think of the entire document as one long string that holds all the data. Why? Because we are looking for string patterns.
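To make the "one long string" idea concrete, here is a minimal Python sketch. The document text, the "Invoice No" label and the helper name are made up for illustration; the regex extractor itself is configured through its UI, so this only mirrors the concept.

```python
import re

# Hypothetical document content, flattened into one long string
document_text = (
    "Invoice No: INV-1001\n"
    "Date: 2021-03-15\n"
    "Total: 250.00\n"
)

def find_invoice_number(text):
    """Search the whole string for a label followed by its value."""
    match = re.search(r"Invoice No:\s*(\S+)", text)
    return match.group(1) if match else None

invoice_number = find_invoice_number(document_text)
print(invoice_number)
```

The pattern does not care about pages or layout. It only sees characters in a string.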
Now, keeping that in mind, look at the screenshot you attached of Configure Regex expressions. Its “Table” section asks for a regex that defines the range of the table. The same goes for “Rows”.
What happens here?
You have the entire document as one string, so any regex you give is searched for as a pattern in that string. Working with tables is a little different, though.
First you need to mark where in the string the table starts and where it ends. How do you do that?
Using unique words that define the start and the end of the table.
Now look at your table screenshot. The starting point is easy: it's always the first header, “Item No” in your case.
Now go to the end of the table. Is there any unique text right after the table ends? Use it as the end marker, and that defines the range.
Look at the screenshot below…
This regex for the table defines the range by two keywords, “Table I” and “Table II”.
This is the document…
Providing this range acts like a SUBSTRING, in familiar terms. It gives you a separate chunk of text in which to look for the table's fields.
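As a rough sketch of that substring behaviour in Python: the sample text below is invented, and the pattern is only my guess at an equivalent of the range regex, not a copy of what is in the screenshot. Note the `\b` after each keyword, which stops “Table I” from matching the beginning of “Table II”.

```python
import re

# Hypothetical document text; only the two keywords come from the example above
document_text = (
    "Some header text\n"
    "Table I\n"
    "Item No Description Qty\n"
    "1 Widget 4\n"
    "2 Gadget 10\n"
    "Table II\n"
    "Unrelated content\n"
)

# DOTALL lets "." cross line breaks; ".*?" stops at the first end keyword
range_pattern = re.compile(r"Table I\b(.*?)Table II\b", re.DOTALL)

match = range_pattern.search(document_text)
table_text = match.group(1).strip() if match else ""
print(table_text)
```

Everything outside the two keywords is gone, so later patterns only have to deal with the table chunk.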
Now, you can easily define the regex pattern for the table fields.
So the TABLE portion gives you a chunk of the big string, and from that chunk you find your pattern:
Break |Document| TO | TABLE STRING |
Break |TABLE STRING| TO | ROWS OR FIELDS |
This is the idea
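Here is a small Python sketch of the two-step break-down, under some assumptions: “Item No” is the start marker as in your table, “Subtotal” is a hypothetical end marker, and the row layout (number, description, quantity) is invented for the example.

```python
import re

# Hypothetical invoice text; the address line looks like a table row on purpose
document_text = (
    "12 Main Street 90210\n"
    "Item No Description Qty\n"
    "1 Widget 4\n"
    "2 Gadget 10\n"
    "Subtotal 14\n"
)

# Step 1: Document -> TABLE STRING (from the first header up to the end marker)
table_pattern = re.compile(r"Item No.*?(?=Subtotal)", re.DOTALL)
# Step 2: TABLE STRING -> ROWS (item number, description, quantity on one line)
row_pattern = re.compile(r"^(\d+)\s+(.+?)\s+(\d+)$", re.MULTILINE)

def extract_rows(text):
    table_match = table_pattern.search(text)
    if table_match is None:
        return []
    table_text = table_match.group(0)
    # The row pattern only ever sees the table chunk, not the whole document
    return row_pattern.findall(table_text)

rows = extract_rows(document_text)
print(rows)
```

If you ran the row pattern over the whole document, the address line would be picked up as a row too. Cutting out the table chunk first is what keeps it out.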
You can also refer to this demo…
InvoiceAIDemo.zip (1.4 MB)