Regex: match numbers while ignoring spaces, commas, and decimal points

I have a bot that returns some OCR text, and the OCR does a pretty good job of returning the text clearly. I have to check whether a dollar amount exists in that string. A string-exists check works about 80% of the time, but sometimes the OCR changes the format of the dollar amount: it drops a comma, adds a space, drops the decimal point, etc. That obviously breaks any straight string-exists check.
Example of a number I am checking:

dollarAmount = 20,650.131

Example of OCR text where the amount comes back in the wrong format:
getText =
@"New Status:
HIM ROI Submitted - External Audit
Next Follow-up Date:
Comments: Update Correspondence
External Audit General
Field
Stage
Owning Area External Audit Request Technical Denial
Value
External Audit Info
Health Information Management Ex
External Audit Review Type
External Audit Amount
External Audit Repayment Total Am
External Audit Repayment Date Complex
20 650 131
Accept Cancel"

My If condition:

getText.Contains(dollarAmount)

So the best option seems to be a regex match that checks whether the number exists while ignoring spaces, commas, and decimal points. What is a good regex I can use?

What kind of expression do I need to put in my If condition to say "does dollarAmount exist in the OCR return string, ignoring commas, spaces, and dots"?
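A minimal sketch of the check being asked for, without any regex matching: strip spaces, commas and dots from both strings, then test containment. It is written in Python because the thread has no full code; in a UiPath/.NET workflow the same idea would be Regex.Replace on both strings followed by .Contains. It assumes dollarAmount is a string, and the helper names strip_separators and amount_in_text are hypothetical.

```python
import re

def strip_separators(text: str) -> str:
    """Remove every space, comma and dot from the string."""
    return re.sub(r"[ ,.]", "", text)

def amount_in_text(amount: str, text: str) -> bool:
    # Compare with all separators removed on both sides.
    return strip_separators(amount) in strip_separators(text)

ocr_text = "External Audit Repayment Date Complex\n20 650 131\nAccept Cancel"
print(amount_in_text("20,650.131", ocr_text))  # True
```

Because the spaces between words are removed too, digits from neighbouring numbers can run together. A True result therefore means "the digits appear somewhere", not "this exact amount appears".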

@Charbel1

1 Like

Maybe start with a pattern like this: [\d\,\.\ ]+
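To see what that character class picks up, here is a small Python sketch run against a shortened copy of the OCR text from the question. The sample string is trimmed for illustration, and [\d,. ]+ is the same class written without the unnecessary escapes. On its own the class also matches the single spaces between words, so the sketch keeps only runs that contain at least one digit.

```python
import re

# Digits, comma, dot and space: the same class as [\d\,\.\ ]+
pattern = re.compile(r"[\d,. ]+")

ocr_text = "External Audit Repayment Date Complex\n20 650 131\nAccept Cancel"

# findall returns every run of those characters, including the lone spaces
# between words, so keep only the runs that contain a digit.
runs = [m.strip() for m in pattern.findall(ocr_text) if any(c.isdigit() for c in m)]
print(runs)  # ['20 650 131']
```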
grafik

1 Like

Hmm, so it would need to work whether the dollar amount is in the correct format (26,444.255) or not (26 444 255).

This doesn't seem to match.


Also, how would I put this into an If statement? Something with System.Text.RegularExpressions?

It is also matching.

Maybe it's better to check with Regex.IsMatch (it returns True or False) whether a match was found.

Makes sense. How would I combine the variable with the regex in IsMatch?
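One way to combine the variable with the regex is to build the pattern from dollarAmount itself. Take its digits and allow an optional space, comma or dot between each pair. The Python sketch below assumes that approach; the helpers amount_pattern and amount_found are hypothetical. In .NET, the resulting pattern string is what you would pass to Regex.IsMatch(getText, pattern).

```python
import re

def amount_pattern(amount: str) -> str:
    """Build a regex from the amount's digits, allowing an optional
    space, comma or dot between any two consecutive digits."""
    digits = re.sub(r"\D", "", amount)  # "20,650.131" -> "20650131"
    if not digits:
        raise ValueError("amount contains no digits")
    body = "[ ,.]?".join(digits)        # "2[ ,.]?0[ ,.]?6..."
    # The lookarounds keep the match from starting or ending in the
    # middle of a longer run of digits.
    return rf"(?<!\d){body}(?!\d)"

def amount_found(text: str, amount: str) -> bool:
    # re.search returns a Match or None; bool() turns that into True/False,
    # much like Regex.IsMatch does in .NET.
    return bool(re.search(amount_pattern(amount), text))

ocr_text = "External Audit Repayment Date Complex\n20 650 131\nAccept Cancel"
print(amount_found(ocr_text, "20,650.131"))  # True
```

Both 26,444.255 and 26 444 255 match the pattern built from "26,444.255". The lookarounds only guard against a longer unbroken run of digits. An unrelated digit separated by a space, as in "1 20 650 131", would still let the amount match.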

Let's pause for a moment:

As far as I understand, the valid values are:

12,123.507 = Double 12123.507
12 123.507 = Double 12123.507
12 123,507 = Double 12123507
12 123 507 = Double 12123507
12,123,507 = Double 12123507

Maybe an adapted strategy would serve you better:

  • extract the variations with regex
  • remove the spaces
  • check with Double.TryParse / Int32.TryParse whether the value is valid.
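The three steps above could look roughly like this in Python. Decimal stands in for Double.TryParse: a failed parse returns None instead of False. The cleaning rule is an assumption taken from the list of valid values above, where spaces and commas are dropped and a dot is the decimal point. The names CANDIDATE, try_parse and amount_in_ocr are hypothetical.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional

# Runs that start and end with a digit and may contain spaces, commas or dots.
CANDIDATE = re.compile(r"\d[\d ,.]*\d|\d")

def try_parse(raw: str) -> Optional[Decimal]:
    """Rough analogue of TryParse: returns None when the text is not a number.
    Spaces and commas are dropped; a dot is kept as the decimal point."""
    cleaned = raw.replace(" ", "").replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None

def amount_in_ocr(text: str, amount: str) -> bool:
    expected = try_parse(amount)
    if expected is None:
        return False
    return any(try_parse(c) == expected for c in CANDIDATE.findall(text))

ocr_text = "External Audit Repayment Date Complex\n20 650 131\nAccept Cancel"
print(amount_in_ocr(ocr_text, "20,650.131"))  # False
```

With the OCR text from the question, this returns False. The decimal point was lost, so 20 650 131 parses as 20650131 rather than 20650.131.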

However, there is a risk because this runs on OCR. A value of 12,345.589 that OCR recognizes as 12 345 589 will be valid but not the same (12345589 instead of 12345.589).
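If that risk matters, one option is to separate a true match from a digits-only match and route the latter to a person for review. Here is a hypothetical Python sketch of that three-way check. The labels and the to_decimal and classify helpers are assumptions, and it uses the same cleaning convention as the list of valid values.

```python
import re
from decimal import Decimal, InvalidOperation

def to_decimal(raw: str):
    # Spaces and commas are dropped; a dot is the decimal point.
    try:
        return Decimal(raw.replace(" ", "").replace(",", ""))
    except InvalidOperation:
        return None

def classify(candidate: str, amount: str) -> str:
    value = to_decimal(candidate)
    if value is not None and value == to_decimal(amount):
        return "exact"
    if re.sub(r"\D", "", candidate) == re.sub(r"\D", "", amount):
        return "digits-only"  # same digits, different value: send for review
    return "no match"

print(classify("12 345 589", "12,345.589"))  # digits-only
print(classify("12 345.589", "12,345.589"))  # exact
```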

So check your options in this area as well during requirements analysis.

2 Likes

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.