Getting problem with regex matches

The regex below works fine when I run it on https://regex101.com/,
but it does not work in UiPath regex match.
Please suggest a solution.
regex = " /^ *((#\d+)|((box|bin)[-. /\]?\d+)|(.*p[ .]? ?(o|0)[-. /\]? *-?((box|bin)|b|(#|num)?\d+))|(p(ost)? *(o(ff(ice)?)?)? *((box|bin)|b)? *\d+)|(p *-?/?(o)? *-?box)|post office box|((box|bin)|b) *(number|num|#)? *\d+|(num|number|#) *\d+)/i"

Conditions are:
“Box 123”,
“Box-122”,
“Box122”,
“HC73 P.O. Box 217”,
“P O Box125”,
“P. O. Box”,
“P.O 123”,
“P.O. Box 123”,
“P.O. Box”,
“P.O.B 123”,
“P.O.B. 123”,
“P.O.B.”,
“P0 Box”,
“PO 123”,
“PO Box N”,
“PO Box”,
“PO-Box”,
“POB 123”,
“POB”,
“POBOX123”,
“Po Box”,
“Post 123”,
“Post Box 123”,
“Post Office Box 123”,
“Post Office Box”,
“box #123”,
“box 122”,
“box 123”,
“number 123”,
“p box”,
“p-o box”,
“p-o-box”,
“p.o box”,
“p.o. box”,
“p.o.-box”,
“p.o.b. #123”,
“p.o.b.”,
“p/o box”,
“po #123”,
“po box 123”,
“po box”,
“po num123”,
“po-box”,
“pobox”,
“pobox123”,
“post office box”

Check if this works:

" ^ *((#\d+)|((box|bin)[-. \]?\d+)|(.*p[ .]? ?(o|0)[-. \]? *-?((box|bin)|b|(#|num)?\d+))|(p(ost)? *(o(ff(ice)?)?)? *((box|bin)|b)? *\d+)|(p *-??(o)? *-?box)|post office box|((box|bin)|b) *(number|num|#)? *\d+|(num|number|#) *\d+)\i"

@pankajs3 What language is regex101 using? It doesn’t look like VB.NET is an option. I would recommend using something like .NET Regex Tester - Regex Storm for building and testing your regex as that is specific to VB.NET

When I put in your expression I get an error that there is an Unterminated set. Also, I noticed there is a space at the very beginning and I’m not sure if that is intended or not.
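As a rough illustration of where the "Unterminated set" error can come from, here is a small Python sketch (Python's `re` reports the same kind of problem as "unterminated character set"). It takes one fragment of the posted pattern, where `\]` is read as an escaped literal `]`, so the `[` is never closed. The "fixed" version is only an assumption about what was intended (an optional `-`, `.`, space, `/` or `\`), and `compiles` is a hypothetical helper name. Note also that the `/.../i` wrapper from regex101 is delimiter syntax, not part of the pattern; in .NET, as in Python, case-insensitivity is passed as a separate option (`RegexOptions.IgnoreCase` or an inline `(?i)`).

```python
import re

# Fragment copied from the posted pattern: "\]" is an escaped literal "]",
# so the "[" that opens the set is never closed.
broken = r"(box|bin)[-. /\]?\d+"

# Assumed intent (not confirmed by the poster): optionally match one of
# "-", ".", space, "/" or "\" between the word and the number.
fixed = r"(box|bin)[-. /\\]?\d+"


def compiles(pattern):
    """Return True if the pattern compiles, False on a regex syntax error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(broken))  # False
print(compiles(fixed))   # True
# Case-insensitivity is a flag here, not a trailing "/i".
print(bool(re.match(fixed, "Box-122", re.IGNORECASE)))  # True
```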

Could you please clarify what you’re trying to pull out with the regex? I have a feeling you could do something a bit simpler, but I’m not sure what you’re trying to match

@Dave
I have listed above all the conditions I am trying to parse with the regex.

@rachrahul2
it’s not working

I don’t understand the conditions? I thought that was the text you were trying to read. So you’re saying that in any jumble of text, you want to pull out those exact words as written? If so, that’s going to need to be written in multiple regex instead of a single expression.

Can you provide an example of the text we would be searching through? Then give an example of what it is you want to pull out of that example text with your regex

@Dave
I need a regex to find all U.S.-based addresses.
An address can be on one line or two lines.

“number street|name|p.o. box city, state(NY), zip”
Samples are:

  1. 101 Hwy 71 North, DeQueen, AR 71102
  2. 4900 N. Wyatt Dr, ElDorado, AR 71021
  3. 800 OLIVE DRIVE, DAVIS, CA 95616
  4. 91 RT 163, PO BOX 335, MONTVILLE, CT 06353
  5. 1644 Market Drive
    Atlanta, GA 30316

Another format, for total amounts:

  1. Grand Total $3,100.12
  2. INSURANCE PAY 4,092.00
  3. Net Total: 5,974.84
  4. Net Total: 1,359.83
  5. Net Total 6,414.88

@pankajs3 So that is the text you are searching through. Can you please share what your expected result should be from the regex?

I want to search the whole string with one regex for the type-1 format given first above.
Note: this regex is for matching US addresses.
“101 Hwy 71 North, DeQueen, AR 71102”
“4900 N. Wyatt Dr, ElDorado, AR 71021”
“800 OLIVE DRIVE, DAVIS, CA 95616”
“91 RT 163, PO BOX 335, MONTVILLE, CT 06353”
“1644 Market Drive
Atlanta, GA 30316”

Then a second regex to search the whole string for the type-2 format.
Note: this regex should match the total amounts given here.
“Grand Total $3,100.12”
“INSURANCE PAY 4,092.00”
“Net Total: 5,974.84”
“Net Total 6,414.88”

So the normal format for regex-related questions is something like this:

Explanation of what you’re trying to accomplish: I want to pull out all numbers from the text

Example string: Here is an example of the string I’m looking through:

“91 RT 163, PO BOX 335, MONTVILLE, CT 06353”
“1644 Market Drive
Atlanta, GA 30316”
then second regex for search whole string by regex type-2 format.

Example output: Here is what I want the regex to find as a match (each match separated by comma):

91,163,335,06353,1644,30316,2

By asking your question in this format (or similar), it will greatly help us determine what you’re trying to achieve. I’ve tried asking a few different times and still have no idea what you’re trying to do. Keep in mind the format is not important, but the content is - make sure you include the things I wrote in bold - an explanation of what you want to do, an example of the text to search through, and an example of what you want the match to be. If you include those 3 things, then we can help you create a regex to achieve it

@Dave
No, you are not getting my question. I want to search for the whole “—data—” as I mentioned above.
For example, I have a 3-page PDF, and from that PDF I have to find the address. I gave the address format above.
Secondly, in that PDF I have to find some total calculations, shown as “–data–” between the inverted commas; that is the string I have to find.

You are exactly right, I’m not getting your question. My response above was to help you phrase your question in a way that people would understand and be able to help.

Take a snippet out of the PDF that includes some information you want to match, and some information that doesn’t. This will be the “text to search through” that i mentioned above. Now, tell us what you want to pull out of the “text to search through” that you just pulled out from the PDF.

If you do that, then you will have provided enough information for us to help

This is my txt file data

D & E AUTO BODY REPAIR
402 WAKEMAN AVENUE / P.O. BOX 450
GRAFTON, ND 58237
OFFICE: (701) 352-3180 FAX: (701) 352-4998
FEDERAL TAX ID# 27-0759714
*** PRELIMINARY ESTIMATE ***
02/11/2019 10:46 AM
Owner
Owner: PERLA MENDOZA
Address: Work/Day: (701)360-1002
Inspection
Inspection Date: 02/11/2019 10:46 AM Inspection Type:
Appraiser Name: JASON A NELSON Appraiser License # :
Repairer

Now I have to find the US address format:

402 WAKEMAN AVENUE / P.O. BOX 450
GRAFTON, ND 58237

or

402 WAKEMAN AVENUE / P.O. BOX 450, GRAFTON, ND 58237


Any chance you can copy+paste the text instead of an image? This is much more useful in trying to figure out a regex solution by the way, so thank you for posting.

@Dave I have to find this data because it matches the address format I have already provided.
It can be on one line or two lines, in the form “anynumber name city, State, zip”.

I understand. In order to help, I am going to try out lots of different expressions. It is a pain (and error-prone) for me to use your image to type it out myself. Instead, you could just edit your post and copy+paste the text…

Updated images with text


Alright, this is going to be very complex for regex, and honestly a different solution might be best in this case. However, I’m working on finding a solution. I’ll tell you my thoughts right now so you can try to find one as well.

Assumptions:

  1. A US Address will always end with , [2-letter state code] [5 digit zip code]
  2. A US Address will always begin on a new line with any of the following:
    a. A digit
    b. PO
    c. P.O
    d. Box
  3. A US Address will either be on 1 line or 2 lines
  4. Only the address will be found on the line, no other extraneous information will be included on that line

I’m going to use assumption 1 to get an anchor with a positive lookahead. I’ll then check to see if the beginning of that line starts with one of the items in assumption 2. If it does, pull out that whole line. If it does not, then pull out the previous whole line as well.

This is not a perfect solution as it doesn’t work if the line begins with PO or Box. It would miss the true first line in that case. It should work most of the time. In order to compensate, I would recommend utilizing the USPS API to validate each address after pulling it out with regex.
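To make the idea above a bit more concrete, here is a minimal line-by-line sketch in Python rather than a single regex. It is only an illustration of the four assumptions, not a tested UiPath solution: `find_addresses` and the two pattern names are hypothetical, matching the start words case-insensitively is my own assumption, and the USPS validation step is left out.

```python
import re

# Assumption 1: an address ends with ", <2-letter state> <5-digit zip>".
ENDS_WITH_STATE_ZIP = re.compile(r", [A-Z]{2} \d{5}$")
# Assumption 2: an address begins with a digit, PO, P.O or Box
# (case-insensitive matching is an extra assumption of this sketch).
STARTS_LIKE_ADDRESS = re.compile(r"^(\d|PO|P\.O|Box)", re.IGNORECASE)


def find_addresses(text):
    """Return candidate addresses using the anchor-then-look-back heuristic."""
    lines = [line.strip() for line in text.splitlines()]
    found = []
    for i, line in enumerate(lines):
        if not ENDS_WITH_STATE_ZIP.search(line):
            continue  # no state + zip anchor on this line
        if STARTS_LIKE_ADDRESS.match(line) or i == 0:
            found.append(line)  # one-line address
        else:
            found.append(lines[i - 1] + "\n" + line)  # include previous line
    return found


sample = "\n".join([
    "D & E AUTO BODY REPAIR",
    "402 WAKEMAN AVENUE / P.O. BOX 450",
    "GRAFTON, ND 58237",
    "OFFICE: (701) 352-3180 FAX: (701) 352-4998",
    "800 OLIVE DRIVE, DAVIS, CA 95616",
])

for address in find_addresses(sample):
    print(repr(address))
```

As noted, a two-line address whose second line itself starts with PO or Box would lose its true first line with this heuristic.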

The line always starts with a digit and ends with [2-letter state code] [5 digit zip code].

The PO or Box condition is optional and sits in the middle of the address, so it doesn’t matter.

Always look for a line that starts with a digit and ends with [2-letter state code] [5 digit zip code].

That makes it much easier! I’ll edit this comment in a few minutes with a possible solution

@pankajs3 I believe this should work:

\d+.+(, [A-Z]{2} [0-9]{5})|\d+.+\r\n.*(, [A-Z]{2} [0-9]{5})

This first tries to match on a single line: one or more digits, then everything up to and including the , [state] [5 digit zip]. If that doesn’t match, the second alternative does the exact same thing, except that the match starts at a digit on the previous line and continues across the line break (\r\n) to the , [state] [5 digit zip] on the next line.
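For anyone who wants to try the expression outside UiPath, here is a quick Python sketch that runs it unchanged over sample lines from this thread. The line order and the Windows line endings (`\r\n`) are assumptions of the sketch; the second alternative will not match text that uses a bare `\n`.

```python
import re

# Pattern copied verbatim from the post above.
ADDRESS = re.compile(
    r"\d+.+(, [A-Z]{2} [0-9]{5})|\d+.+\r\n.*(, [A-Z]{2} [0-9]{5})"
)

# Sample lines from this thread, joined with Windows line endings (assumed).
sample = "\r\n".join([
    "800 OLIVE DRIVE, DAVIS, CA 95616",
    "D & E AUTO BODY REPAIR",
    "402 WAKEMAN AVENUE / P.O. BOX 450",
    "GRAFTON, ND 58237",
    "OFFICE: (701) 352-3180 FAX: (701) 352-4998",
    "FEDERAL TAX ID# 27-0759714",
])

# Collect the full text of every match (one-line and two-line addresses).
matches = [m.group(0) for m in ADDRESS.finditer(sample)]
for address in matches:
    print(repr(address))
```

One thing to watch for: the pattern is not anchored to the start of a line, so a digit on the line just before a one-line address (a tax ID, for example) can pull that line into the match. Adding `^` in multiline mode is one possible way to tighten it.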
