Need Reg-Ex expression to extract numbers

Q) I have only 1 question and will post them in points wise for better understanding.

  1. I have a datatable and one of the column named called “Details” and i will have data which might have single line or multiple lines for this column.
  2. I need to extract the data which is present after the Keywords like Preauthorization, Pre-authorization, Pre-authorization number.
  3. We need to extract the numbers present after any of those Keywords and the number patterns are likely to have two formats.
    a. Starts with PA and contains hyoen and alphanumeric values(PA-12L-Hj76 or PA12309 or PA-123HJY)
    b. Starts with Letter and followed by alphanumeric values (U123HJU, HK45TG)
  4. Between the Keyword and numbers there might be single space or multiple spaces or no spaces also. It might also have comma or hypen or semicolon or nothing
  5. Note: The Keyword might be present or not. If Present also numbers might be present or not. My goal is to retrieve only if numbers present after the Keyword and that too matches our number pattern only. If no keywords present just give me empty array.
  6. I will attach some Input values how it will be under “Details” Column in a datatable and expected output values which will be helpful

Input 1 :
Document location: 5288-#212994489#
Member Authorization location: B264794887, B2647948871
Pre-authorization number:
DCN (S) if unable to attach to case: 02025112624221C0H01
IRC/Ineligible code(s): 23E
CPT/Procedure/Description/Drug Name: ALL
MD Review/Determination location: B264794887, B2647948871
Provider: PUTNAM HOSPITAL CENTER
U# and Date: U426180 10/16/2025

Output 1
Should result empty array as there are numbers after Keyword.

Input 2:
Document location: 5288-#212994489#
Member Authorization location: B264794887, B2647948871
Pre-authorization number - B264794887
IRC/Ineligible code(s): 23E
DCN (S) if unable to attach to case: 02025112624221C0H01 CPT/Procedure/Description/Drug Name: ALL
MD Review/Determination location: B264794887, B2647948871
Provider: PUTNAM HOSPITAL CENTER
U# and Date: U426180 10/16/2025

Output 2: here the output value should be B264794887 because after keyword there is only one number and after that IRC/ word started which is not our matching pattern

Input 3: Document location: 5288-#212994489#
Member Authorization location: B264794887, B2647948871
Pre-authorizationB264794887,U1234LKJU
IRC/Ineligible code(s): 23E
DCN (S) if unable to attach to case: 02025112624221C0H01 CPT/Procedure/Description/Drug Name: ALL
MD Review/Determination location: B264794887, B2647948871
U# and Date: U426180 10/16/2025

Output 3
B264794887,U1234LKJU after the number word IRC started

Input 4:
Preauthorization: L098GHT, K1234HYUJ, 12JUKH67
Output 4
L098GHT, K1234HYUJ

Input 5:
Document location: 5288-#212994489#
Pre-authorization : B264794887, B2647948871
Member Authorization location: B2647949987, B2647948871
DCN (S) if unable to attach to case: 02025112624221C0H01
IRC/Ineligible code(s): 23E
CPT/Procedure/Description/Drug Name: ALL
MD Review/Determination location: B264794887, B287647948871
Provider: PUTNAM HOSPITAL CENTER
U# and Date: U426180 10/16/2025

Output 5:
B264794887, B2647948871 Note: Beause after the keyword after the number match the text is havimg Member word which is not our pattern match

Input 6:
Document location: 5288-#212994489#
Member Authorization location: B2647949987, B2647948871
DCN (S) if unable to attach to case: 02025112624221C0H01
IRC/Ineligible code(s): 23E
CPT/Procedure/Description/Drug Name: ALL
MD Review/Determination location: B264794887, B287647948871
Provider: PUTNAM HOSPITAL CENTER
U# and Date: U426180 10/16/2025

Output 6: Empty array necause no keyword match found

FINALLY I HAVE AN EXPRESSION

arrPreAuth = DtInput.AsEnumerable(). SelectMany(Function(r) System.Text.RegularExpressions.Regex.Matches( System.Text.RegularExpressions.Regex.Match( r.Field(Of String)(“Details”). Replace(Environment.NewLine, " "). Replace(vbCr, " "). Replace(vbLf, " "). Trim, “(?is)(?:Pre-authorization number|Preauthorization)(?:\s*[ :-]\s)?(.{0,200})” ).Groups(1).Value, “(?i)(?<=^|[ ,;])(PA[\w-]\d[\w-]|[A-Za-z][A-Za-z0-9]\d[A-Za-z0-9])(?=$|[ ,;])” ). Cast(Of System.Text.RegularExpressions.Match)(). Select(Function(m) m.Value.Trim()) ).Where(Function(s) Not String.IsNullOrWhiteSpace(s)). Distinct(). ToArray()

but it is failing at when all multiple lines ae combined to new line then it is extracting the numbers that matches the number pattern even if they are present somewhere after the Keyword until 200 characters like below it is extracting the number present after the
MD Review/Determination location: B264794887, B2647948871

I just want the bot to extract the numbers present after keyword and should stop extracting immediately when they found any word that doesn’t matches our number pattern in a stretch.
I am using Assign activity and store all these values.
In Save To I am using Array[String] and Value to Save I am using above expression

Hope you guys understood my question. Please let me know if you have any queries. TIA

1 Like

Hi
I think the issue happens because your regex keeps reading up to 200 chars after the keyword, even when unrelated text appears.

Maybe try something like this instead, it stops as soon as a non-matching word (like “IRC” or “MD Review”) appears:

Assign to your variable arrPreAuth:

DtInput.AsEnumerable().
SelectMany(Function(r)
System.Text.RegularExpressions.Regex.Matches(
System.Text.RegularExpressions.Regex.Match(
r.Field(Of String)(“Details”).Replace(vbCrLf, " ").Trim(),
“(?is)(?:Pre[- ]?authorization(?: number)?)(?:\s*[:-]?\s*)([A-Za-z0-9,;-\s]+)”
).Groups(1).Value,
“(?i)\b(PA[\w-]+|[A-Z][A-Z0-9]+)\b”
).Cast(Of System.Text.RegularExpressions.Match)().
Select(Function(m) m.Value.Trim())
).
TakeWhile(Function(x) Not System.Text.RegularExpressions.Regex.IsMatch(x, “^(IRC|MD Review|Provider|Member|CPT|Document)”, RegexOptions.IgnoreCase)).
Distinct().
ToArray()

Maybe this helps it only grabs numbers right after the keyword and stops when it hits unrelated text.

Hi,

How about the following sample?

strResult = System.Text.RegularExpressions.Regex.Match(strData,"(?i)(?<=Pre-?authorization\s*number|Pre-?authorization(?!\s*number)).*").Value

arrResult= System.Text.RegularExpressions.Regex.Matches(strResult,"\b[A-Za-z]\w+").Cast(Of System.Text.RegularExpressions.Match).Select(Function(m) m.Value).ToArray()

Sample
Sample20251022-1.zip (5.5 KB)

Regards,

try this - (?is)(?:Pre[- ]?authorization(?: number)?)(?:\s*[:-]?\s*)([A-Z0-9,-; ]+?)(?=\b(?:Member|IRC|DCN|Provider|MD)\b|$)

NOT WORKING. Throwing error saying Keyword not found. I have the keyword in my datatable under Details column

DID NOT WORKED
I have a datatable and had two rows
Under Details Column I have the values like

  1. Preauthorization: L098GHT, K1234HYUJ, 12JUKH67

Document location: 5288-#212994489#
Pre-authorization :
Member Authorization location: B2647949987, B2647948871
DCN (S) if unable to attach to case: 02025112624221C0H01
IRC/Ineligible code(s): 23E
CPT/Procedure/Description/Drug Name: ALL
MD Review/Determination location: B264794887, B287647948871
Provider: PUTNAM HOSPITAL CENTER
U# and Date: U426180 10/16/2025

The expected outcome should be with these two numbers L098GHT, K1234HYUJ but it returned Empty Array

System.Text.RegularExpressions.Regex.Match(row(“Details”).ToString, “(?i)(?<=Pre[-\s]?authorization\s*:\s*)([A-Z0-9]+(?:\s*,\s*[A-Z0-9]+)*)”).Value
Can you try these