Single regex to match different formats

I have data in three different formats and need a single regex that matches all of them. So far I have written a separate regex for each format.
Data Format 1 : 01/11/2023 05/03/2021 2 10000000000000000000001 10000000000000000000001 90000 ON/OFF 1 0 $10.22
VISIT NEW
Patient ID / 1000 Kh Wil DOB: 09/27/2023
Name:
Review Message
free tuof

Format 1 regex : (?<service_from>\d+/\d+/\d{4}) (?<service_to>\d+/\d+/\d{4}) \d* (?\d{24}).* (?<est_overpayment>.\d+.\d+).\s.\s.Patient ID . (?<patient_id>\d+) (?[\D\s,]+) DOB: (?\d+/\d+/\d{4})\sName:\sReview Message\s(?<review_message_part3>[\D\s].\n)

Data Format 2 : 05/07/2023 05/11/2023 1 10000000000000000000001 10000000000000000000000 90001 OFF/ON 1 0 $33.10
VISIT NEW
Patient ID / 10001 MIL JS Hn DOB: 5/30/2030
Name:

If you have any questions, please call 888-888-8888

DDDDDD-LetterRef# 1234-120
Page 5 of 163Notice of Preliminary Findings

Review ID: 100001

Service From Service Thru Line Process Modif Process Co Uni New Overpat
Date: Date: # Cla Number Ren Cla Code Code Description Paid Units Amount
Review Message
hkdshvsjf

Format 2 Regex : (?<service_from>\d+/\d+/\d{4}) (?<service_to>\d+/\d+/\d{4}) \d* (?\d{24}).* (?<est_overpayment>.\d+.\d+).\s.\s.\s.\s.\s.\s.\s+Review ID: (?<review_id>\d).\s+.\s.\s.Patient ID . (?<patient_id>\d+) (?[\D\s,]+) DOB: (?\d+/\d+/\d{4}).\s.\s.\s+(?<review_message_part3>[\D\s].\n)

Data Format 3 : 07/01/2013 05/02/2030 1 100022222222222222222222 10003333333333333333333 93333 OF/OU 1 0 $10.22
VISIT NEW

If you have any questions, please call 8333-4333-3333333

DAAAAA-LetterRef# 799999-100
Page 6 of 163Notice of Preliminary Findings

Review ID: 1011111

Service From Service Thru Line Process Modif Process Co Uni New Overpat
Date: Date: # Cla Number Ren Cla Code Code Description Paid Units Amount
Patient ID / 10000 AD Lenr DOB: 3/14/2030
Name:
Review Message
fsdjnkfnsjff

Format 3 regex : (?<service_from>\d+/\d+/\d{4}) (?<service_to>\d+/\d+/\d{4}) \d* (?\d{24}).* (?<est_overpayment>.\d+.\d+).\s.\sPatient ID . (?<patient_id>\d+) (?[\D\s,]+) DOB: (?\d+/\d+/\d{4})\s+Name:.\s.\s.\s.\s.\s.\s.\s+Review ID: (?<review_id>\d).\s+.\s.\s.\s(?<review_message_part3>[\D\s].*\n)

I would appreciate any help finding a single regex that works for all three formats.

Hi @teny ,

Could you provide the regex expressions as preformatted text, using the </> button?

That would be clearer and would avoid any conversion errors.

Could you also tell us more about the characteristics of the values you want to extract? So far, this is what we understand:

1. Service from - date format xx/xx/xxxx
2. Service to - date format xx/xx/xxxx
3. Est Overpayment - couldn’t identify the pattern
4. Patient ID - between the Patient ID and DOB keywords?
5. Review ID - couldn’t identify the pattern
6. Review Message - after the Review Message keyword?

Let us know your updates on this.

Format 1 regex :
<(?<service_from>\d+/\d+/\d{4}) (?<service_to>\d+/\d+/\d{4}) \d* (?\d{24}).* (?<est_overpayment>.\d+.\d+). \s. \s. Patient ID . (?<patient_id>\d+) (?[\D\s,]+) DOB: (?\d+/\d+/\d{4})\sName:\sReview Message\s(?<review_message_part3>[\D\s]. \n)/>

Format 2 Regex :
<(?<service_from>\d+/\d+/\d{4}) (?<service_to>\d+/\d+/\d{4}) \d* (?\d{24}).* (?<est_overpayment>.\d+.\d+). \s. \s. \s. \s. \s. \s. \s+Review ID: (?<review_id>\d ). \s+. \s. \s. Patient ID . (?<patient_id>\d+) (?[\D\s,]+) DOB: (?\d+/\d+/\d{4}). \s. \s. \s+(?<review_message_part3>[\D\s]. \n)/>

Format 3 regex :
<(?<service_from>\d+/\d+/\d{4}) (?<service_to>\d+/\d+/\d{4}) \d* (?\d{24}).* (?<est_overpayment>.\d+.\d+). \s. \sPatient ID . (?<patient_id>\d+) (?[\D\s,]+) DOB: (?\d+/\d+/\d{4})\s+Name:. \s. \s. \s. \s. \s. \s. \s+Review ID: (?<review_id>\d ). \s+. \s. \s. \s(?<review_message_part3>[\D\s].*\n)/>

  1. Service from date format - MM/dd/yyyy
  2. Service to date format - MM/dd/yyyy
  3. Est overpayment is a dollar amount, which may include a decimal part.
  4. The text between the Patient ID number and the DOB keyword is the name, which may include a middle name, for example Nina Ken Joy or Nina Joy.
  5. Review ID does not actually need to be extracted. It is usually 7 digits.
  6. There is no static keyword after the review message. The review message can span multiple lines, and an empty line after its last line marks the end of the message.

@teny, the Preformatted Text option can be found in the editor, as shown in the image below.

image

Format 1 regex :
(?<service_from>\d+/\d+/\d{4}) (?<service_to>\d+/\d+/\d{4}) \d* (?\d{24}).* (?<est_overpayment>.\d+.\d+). *\s.* \s. *Patient ID . (?<patient_id>\d+) (?[\D\s,]+) DOB: (?\d+/\d+/\d{4})\sName:\sReview Message\s(?<review_message_part3>[\D\s].* \n)

Format 2 Regex :
(?<service_from>\d+/\d+/\d{4}) (?<service_to>\d+/\d+/\d{4}) \d* (?\d{24}).* (?<est_overpayment>.\d+.\d+). *\s.* \s. *\s.* \s. *\s.* \s. *\s+Review ID: (?<review_id>\d* ). *\s+.* \s. *\s. *Patient ID . (?<patient_id>\d+) (?[\D\s,]+) DOB: (?\d+/\d+/\d{4}).* \s.* \s. *\s+(?<review_message_part3>[\D\s].* \n)

Format 3 regex :
(?<service_from>\d+/\d+/\d{4}) (?<service_to>\d+/\d+/\d{4}) \d* (?\d{24}).* (?<est_overpayment>.\d+.\d+). *\s.* \sPatient ID . (?<patient_id>\d+) (?[\D\s,]+) DOB: (?\d+/\d+/\d{4})\s+Name:. *\s.* \s. *\s.* \s. *\s.* \s. *\s+Review ID: (?<review_id>\d* ). *\s+.* \s. *\s.* \s(?<review_message_part3>[\D\s].*\n)
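Given the characteristics described earlier in the thread (dates as MM/dd/yyyy, a dollar amount ending the claim line, the name sitting between the patient ID and `DOB:`, and a review message that runs until the first empty line), one possible approach is to stop counting filler lines and instead skip lazily from one anchor to the next. The sketch below is written in Python only for illustration, so it uses `(?P<name>…)` where the patterns in this thread use `(?<name>…)`. The group names `patient_name`, `dob`, and `review_message`, and the helper `parse_record`, are assumptions and not names from the original patterns. Review ID is left out because it was described as not required, and the sketch assumes every record contains a claim line, a `Patient ID` line, and a `Review Message` line.

```python
import re

# One pattern for all three layouts: anchor on the fixed keywords and skip
# whatever lies between them with lazy ".*?" (DOTALL lets it cross lines).
RECORD = re.compile(
    r"""
    ^(?P<service_from>\d{1,2}/\d{1,2}/\d{4})[ ]       # service from, MM/dd/yyyy
    (?P<service_to>\d{1,2}/\d{1,2}/\d{4})[ ]          # service to, MM/dd/yyyy
    [^\n]*?                                           # rest of the claim line
    (?P<est_overpayment>\$\d[\d,]*(?:\.\d+)?)[ \t]*$  # dollar amount ending the line
    .*?                                               # filler lines, page header, etc.
    Patient[ ]ID[ ]/[ ](?P<patient_id>\d+)[ ]
    (?P<patient_name>[^\n]+?)[ ]DOB:[ ]               # first, optional middle, last name
    (?P<dob>\d{1,2}/\d{1,2}/\d{4})
    .*?                                               # the Name line and any filler
    ^Review[ ]Message[ \t]*\n
    (?P<review_message>.+?)                           # one or more message lines
    (?=\n[ \t]*\n|\s*\Z)                              # stop at first empty line or end
    """,
    re.VERBOSE | re.MULTILINE | re.DOTALL,
)


def parse_record(text):
    """Return the named fields of the first record in text, or None."""
    match = RECORD.search(text)
    return match.groupdict() if match else None


FORMAT_1 = """\
01/11/2023 05/03/2021 2 10000000000000000000001 10000000000000000000001 90000 ON/OFF 1 0 $10.22
VISIT NEW
Patient ID / 1000 Kh Wil DOB: 09/27/2023
Name:
Review Message
free tuof

"""

FORMAT_2 = """\
05/07/2023 05/11/2023 1 10000000000000000000001 10000000000000000000000 90001 OFF/ON 1 0 $33.10
VISIT NEW
Patient ID / 10001 MIL JS Hn DOB: 5/30/2030
Name:

If you have any questions, please call 888-888-8888

DDDDDD-LetterRef# 1234-120
Page 5 of 163Notice of Preliminary Findings

Review ID: 100001

Service From Service Thru Line Process Modif Process Co Uni New Overpat
Date: Date: # Cla Number Ren Cla Code Code Description Paid Units Amount
Review Message
hkdshvsjf

"""

FORMAT_3 = """\
07/01/2013 05/02/2030 1 100022222222222222222222 10003333333333333333333 93333 OF/OU 1 0 $10.22
VISIT NEW

If you have any questions, please call 8333-4333-3333333

DAAAAA-LetterRef# 799999-100
Page 6 of 163Notice of Preliminary Findings

Review ID: 1011111

Service From Service Thru Line Process Modif Process Co Uni New Overpat
Date: Date: # Cla Number Ren Cla Code Code Description Paid Units Amount
Patient ID / 10000 AD Lenr DOB: 3/14/2030
Name:
Review Message
fsdjnkfnsjff

"""

for sample in (FORMAT_1, FORMAT_2, FORMAT_3):
    print(parse_record(sample))
```

Because the filler between anchors is matched lazily, the page header block can appear before the `Patient ID` line, after it, or not at all. The trade-off is that a record missing one of the anchors would let the match run on into the next record, so the input may need to be split per record first if that can happen.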