Regex for Address embedded amongst other addresses

Supplier :          Deliver To :         Invoice To :         
ABC House           DEF Manor            Department 1         
123 Street          456 Road             Building Name        
ATown               BCity                789 Terrace          
ARegion             EF3 4GH              Another Street       
AB1 2CD                                  BRegion              
                                         IJ5 6KL              

I am trying to extract the Deliver To address from the extracted text shown above, using the Regex Based Extractor in the Document Understanding activity.
As you can see, the issue I am having is that the delivery address is sandwiched between the supplier and invoice addresses.

Any experts in Regex got an idea on how I can extract the required data from this text sample?

Hi @LewisHenderson … Since you mentioned Document Understanding: the address can easily be extracted with the Form Extractor or the Intelligent Form Extractor by selecting that area. I mostly use the Regex Based Extractor for set values.

I can still extract the values you are looking for. Just one question … I believe you have created 3 or 4 variables for this address, right? Address Line 1, Line 2, City, State and Zip?

Thanks for responding. Due to licensing, we are prioritising Regex over other methods at this time.

Currently my extraction has a single address variable, because in the taxonomy I have set the Address field's type to Address (which fits most cases), so the extraction attempts to break that single variable into its parts. But you are right, I could extract each line of the address separately and build the address back up.
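
To make the "extract each line separately and rebuild" idea concrete, here is a minimal sketch in Python. It is not a pattern posted in this thread, and the Regex Based Extractor uses .NET regex rather than Python, although the construct used here (`^.{n}(.{0,m})`) is written the same way in both. The sketch assumes the extracted text keeps its fixed-width column layout, reads the column boundaries from the header row, and captures only the characters that sit under "Deliver To :". The `extract_column` helper and the column widths of 20 and 21 characters are assumptions made for illustration.

```python
import re

# Reconstruction of the sample from the question. ljust() recreates the
# fixed-width columns; the widths 20 and 21 are assumed from the sample.
ROWS = [
    ("Supplier :", "Deliver To :", "Invoice To :"),
    ("ABC House", "DEF Manor", "Department 1"),
    ("123 Street", "456 Road", "Building Name"),
    ("ATown", "BCity", "789 Terrace"),
    ("ARegion", "EF3 4GH", "Another Street"),
    ("AB1 2CD", "", "BRegion"),
    ("", "", "IJ5 6KL"),
]
SAMPLE = "\n".join(a.ljust(20) + b.ljust(21) + c for a, b, c in ROWS)


def extract_column(text, header="Deliver To :", next_header="Invoice To :"):
    """Return the non-empty lines found under `header` (hypothetical helper)."""
    lines = text.splitlines()
    # Locate the header row and read the column boundaries from it.
    idx = next(i for i, line in enumerate(lines) if header in line)
    start = lines[idx].index(header)
    end = lines[idx].index(next_header)
    # Skip `start` characters, then capture at most one column's width.
    pattern = re.compile(r"^.{%d}(.{0,%d})" % (start, end - start))
    result = []
    for line in lines[idx + 1:]:
        match = pattern.match(line)
        # Rows where the middle column is blank are skipped.
        if match and match.group(1).strip():
            result.append(match.group(1).strip())
    return result


deliver_lines = extract_column(SAMPLE)
deliver_address = ", ".join(deliver_lines)
print(deliver_address)
```

Splitting each row on runs of spaces would be simpler, but it would misread rows such as `AB1 2CD ... BRegion`, where the middle column is empty, so the sketch relies on character positions instead. If the digitized text does not preserve column alignment, this approach will not work as written.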

Hi @LewisHenderson … Here you go. You can use the pattern below in the Regex Based Extractor. Do not click Edit; just add the expression when you open the Regex Based Extractor. Clicking Edit will crash your Studio because of a bug.

Pattern: RegEx_LH.txt (139 Bytes)

Note: I have tried my best to make it generic, but I am not confident that this same regex will work on other invoices, because address lines vary a lot (Suite #, Apt #, etc.). Any change in the Supplier address will also affect it.

Thanks @prasath17. This will serve as a good foundation, and I have more invoices I can test against to build a more robust regex.
