Working with Regular Expression to fetch PO Number

Hi Folks,

I am new to UiPath and regular expressions. If anybody can help, I would highly appreciate it.
Customers send their orders as PDFs, which I convert to text and then try to extract the PO Numbers from. Every customer has their own way of writing the PO Number.

Below is my RegEx
(P.O. NUMBER|P/O Number|Purchase Order No.|P.O. NUMBER|PURCHASE ORDER NO.|PURCHASE ORDER NO|PO Number|Purchase Order|P.O.|PO\W)(?!.Box|BOX)(?!.Total|TOTAL)(\s?#?:?\s?)(.+)

1st Group – P.O. NUMBER|P/O Number etc. filters text starting with PO Number etc.
2nd & 3rd Group – (?!.Box|BOX)(?!.Total|TOTAL) - should not select PO Box or PO Total
4th Group - (\s?#?:?\s?) - Any special character after PO Number like PO Number # or PO Number :
5th Group - (.+) - Anything after the special characters, actual PO Numbers - 1234
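
To make the breakdown above concrete, here is a rough sketch in Python. It uses Python's `re` module rather than the .NET engine behind UiPath, so treat it as an approximation. The input line is a made-up example based on the "PO Number #" and "1234" mentioned above. Note that the two lookaheads do not capture anything, so the engine itself numbers only three groups: the label, the separator and the PO Number.

```python
import re

# Pattern exactly as posted above (dots in the first group are NOT escaped).
PATTERN = re.compile(
    r"(P.O. NUMBER|P/O Number|Purchase Order No.|P.O. NUMBER|PURCHASE ORDER NO."
    r"|PURCHASE ORDER NO|PO Number|Purchase Order|P.O.|PO\W)"
    r"(?!.Box|BOX)(?!.Total|TOTAL)(\s?#?:?\s?)(.+)"
)

# Hypothetical input line, modelled on the "PO Number #" example above.
line = "PO Number # 1234"

match = PATTERN.search(line)
# Only the three parenthesised groups capture; the (?!...) lookaheads do not.
label, separator, po_number = match.groups()
print(repr(label), repr(separator), repr(po_number))
# prints: 'PO Number' ' # ' '1234'
```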

The issue is that this regular expression is also picking up a few unrelated words - PROX / PADUCAH / PAOUCAH

These words are present in the PDF files, but I am not sure why they are getting selected.

Thanks in advance if somebody can help.

Regards,
Sushil

Here is what the lines look like

image

image

Can you provide the sample input for this?
From which statement in the PDF do you want to extract, and what exactly? That would help to debug your regex.

I am passing each line of the PDF (one at a time) into the RegEx. So the input of the RegEx is nothing but a text line like the ones I showed in the screenshot above. In text format, below are the lines…

CITY NAME, STATE 40051 * PAOUCAH, STATE H2003 *
PREPAID/ALLOWED 2% 10 PROX NET 45 DAYS

Do you just need to extract H2003?

Ohh I was extracting text like PO Number: 123 or P/O No. #356.
All those PO Numbers were getting selected but some extra words were also getting selected like I mentioned - PROX / PAOUCAH etc.

Figured it out after spending whole day in it :frowning:
Thanks for the revert though
Still testing the modified RegEx
All it needed was a backslash "\" before every dot "." in Group 1
Below is the new RegEx
(P\.O\. NUMBER|P/O Number|Purchase Order No\.|P\.O\. NUMBER|PURCHASE ORDER NO\.|PURCHASE ORDER NO|PO Number|Purchase Order|P\.O\.|PO\W)(?!.Box|BOX)(?!.Total|TOTAL)(\s?#?:?\s?)(.+)
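
In case the backslashes are hard to see here, the sketch below spells out both versions and runs them against the two sample lines quoted earlier in the thread. It is written in Python's `re` rather than UiPath's .NET engine, so it is only an approximation of what the Matches activity does. The helper name `first_group` is made up for illustration. The point it shows: an unescaped `.` matches any character, so `P.O.` happily matches `PROX` and `PAOU`, while `P\.O\.` only matches a literal "P.O.".

```python
import re

# Part of the pattern that stays the same in both versions.
TAIL = r"(?!.Box|BOX)(?!.Total|TOTAL)(\s?#?:?\s?)(.+)"

# Original first group: every "." is unescaped and matches ANY character.
UNESCAPED = re.compile(
    r"(P.O. NUMBER|P/O Number|Purchase Order No.|P.O. NUMBER|PURCHASE ORDER NO."
    r"|PURCHASE ORDER NO|PO Number|Purchase Order|P.O.|PO\W)" + TAIL
)

# Same first group with each dot escaped, so it only matches a literal ".".
ESCAPED = re.compile(
    r"(P\.O\. NUMBER|P/O Number|Purchase Order No\.|P\.O\. NUMBER"
    r"|PURCHASE ORDER NO\.|PURCHASE ORDER NO|PO Number|Purchase Order"
    r"|P\.O\.|PO\W)" + TAIL
)

lines = [
    "CITY NAME, STATE 40051 * PAOUCAH, STATE H2003 *",  # sample line from the thread
    "PREPAID/ALLOWED 2% 10 PROX NET 45 DAYS",           # sample line from the thread
    "PO Number: 123",                                   # a genuine PO line
]

def first_group(pattern, text):
    """Return what the first (label) group matched, or None if there is no match."""
    m = pattern.search(text)
    return m.group(1) if m else None

for text in lines:
    print(text, "|", first_group(UNESCAPED, text), "|", first_group(ESCAPED, text))
```

With the unescaped pattern the first two lines are reported as matches ("PAOU" and "PROX"); with the escaped pattern they are skipped and only the genuine PO line is picked up.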

Note that the backslash doesn't always get shown in text typed here, so make sure every dot in Group 1 has a backslash in front of it.

Can you help me? I also want to extract PO number.