Issues with Regex Pattern in Capturing Dynamic Text Fields

Hello UiPath community,

I hope this message finds you well. I’m currently working on a project that involves extracting information from dynamic text fields using regular expressions. However, I’m facing challenges in creating a single regex pattern that can accurately capture the values from these fields due to variations in the text structure.

Here’s a brief overview of the problem:

The text data has dynamic sections containing fields such as “Line1,” “Invoice,” “Status,” “FieldValue,” “Error,” and “Instructions.” The goal is to create a regex pattern that can comprehensively capture the values from these fields, considering variations in the structure of the text.

I’ve tried using the following regex pattern, but it’s not covering all cases:

regex

Line1(?P<Line1>.*?)\W+\D+Invoice(?P<Invoice>.*?)\W+Status(?P<Status>.*?) Invoice:Status:\W+\S+\W+Container\W*FieldValue(?P<FieldValue>.*?)\W+Error(?P<Error_1>.*?)\W+Instructions(?P<Instructions_1>.*)(?= Field )

Separately, I’ve had success with a narrower pattern that captures only FieldValue, Error, and Instructions:

regex

FieldValue(?P<FieldValue>.*?)\W+Error(?P<Error_1>.*?)\W+Instructions(?P<Instructions_1>.*)(?= Field )

I’m seeking advice from the community on how to create a robust and flexible regex pattern that can handle the variations in the text structure and capture all relevant field values consistently.

If anyone has encountered a similar challenge or has insights into regex patterns for dynamic text extraction, I would greatly appreciate your input.

Thank you in advance for your assistance!

Best regards,
Divyanshu

The test string used with the regex pattern is:

Date07/02/2023
MessageFromTarget
MessageToTARGETTP1 Error Message fromDate:Message From:Message To: Error Messages4
ItemsCollapse allDescending1 Container
FieldValue
Error
Instructions Field Value:Error:Instructions:
Line1If you have questions regarding these errors, please call Target Corp Operations Helpline at 612-304-3310
Invoice
Status Invoice:Status: 2 Container
FieldValueUMBER 000004403 GROUP
ErrorNUMBER 4403 TRANSACTION CONTROL NUMBER 44030001
Instructions Field Value:Error:Instructions:
Line1
InvoiceESSAGES
Status Invoice:Status: 3 Container
FieldValue
Error
Instructions Field Value:Error:Instructions:
Line1
InvoicePC NOT VALID. FIX B
StatusEFORE NEXT TRANSMISSION Invoice:Status: 4 Container
FieldValue
Error
Instructions Field Value:Error:Instructions:

The pattern does not capture the 5th, 6th, and 7th lines of the test string.

Website used for testing: regex101.com
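Lines 5 to 7 of the test string (the FieldValue, Error, and Instructions lines of the first container) appear before the first Line1. Every match of a pattern that begins with the literal Line1 therefore starts after those lines and can never include them. The narrower pattern that begins at FieldValue has no such restriction. Below is a minimal Python check of this. Python's re module accepts the same (?P<name>...) syntax. The excerpt covers only the first two containers of the test string, and the variable names are illustrative.

```python
import re

# Excerpt of the test string from the question: the first two containers only.
TEXT = (
    "ItemsCollapse allDescending1 Container\n"
    "FieldValue\n"
    "Error\n"
    "Instructions Field Value:Error:Instructions:\n"
    "Line1If you have questions regarding these errors, please call "
    "Target Corp Operations Helpline at 612-304-3310\n"
    "Invoice\n"
    "Status Invoice:Status: 2 Container\n"
    "FieldValueUMBER 000004403 GROUP\n"
    "ErrorNUMBER 4403 TRANSACTION CONTROL NUMBER 44030001\n"
    "Instructions Field Value:Error:Instructions:\n"
)

# The FieldValue-first pattern from the question, unchanged.
FIELD_BLOCK = re.compile(
    r"FieldValue(?P<FieldValue>.*?)\W+Error(?P<Error_1>.*?)"
    r"\W+Instructions(?P<Instructions_1>.*)(?= Field )"
)

matches = list(FIELD_BLOCK.finditer(TEXT))
first_line1 = TEXT.index("Line1")

for m in matches:
    # The first container's fields start before any "Line1", which is
    # why a pattern that must begin with "Line1" cannot reach them.
    print(m.start() < first_line1, m.groupdict())
```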

In such cases we would (or could) use a divide-and-conquer approach:

  • Split the text into blocks (e.g., blocks that start at a begin marker or line and end at an end marker or line)
  • Extract the details of each block with one or more regex patterns
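The split-then-extract idea can be sketched in Python. This is a minimal sketch under stated assumptions and is not a UiPath workflow. It assumes that each record runs from one "<n> Container" marker to the next, so Line1, Invoice, and Status are grouped with the FieldValue, Error, and Instructions lines that come before them. The names BLOCK_SPLIT, FIELD_PATTERNS, and extract_blocks are hypothetical. If Line1/Invoice/Status should instead belong to the following container, only the split point needs to change.

```python
import re

# Full test string from the question.
TEXT = """Date07/02/2023
MessageFromTarget
MessageToTARGETTP1 Error Message fromDate:Message From:Message To: Error Messages4
ItemsCollapse allDescending1 Container
FieldValue
Error
Instructions Field Value:Error:Instructions:
Line1If you have questions regarding these errors, please call Target Corp Operations Helpline at 612-304-3310
Invoice
Status Invoice:Status: 2 Container
FieldValueUMBER 000004403 GROUP
ErrorNUMBER 4403 TRANSACTION CONTROL NUMBER 44030001
Instructions Field Value:Error:Instructions:
Line1
InvoiceESSAGES
Status Invoice:Status: 3 Container
FieldValue
Error
Instructions Field Value:Error:Instructions:
Line1
InvoicePC NOT VALID. FIX B
StatusEFORE NEXT TRANSMISSION Invoice:Status: 4 Container
FieldValue
Error
Instructions Field Value:Error:Instructions:
"""

# Step 1 (split): assumption, each record starts at an "<n> Container" marker.
BLOCK_SPLIT = re.compile(r"\d+\s+Container")

# Step 2 (extract): one small, line-anchored pattern per field.
FIELD_PATTERNS = {
    "Line1": re.compile(r"^Line1(.*)$", re.M),
    "Invoice": re.compile(r"^Invoice(.*)$", re.M),
    "Status": re.compile(r"^Status(.*?)\s*Invoice:Status:", re.M),
    "FieldValue": re.compile(r"^FieldValue(.*)$", re.M),
    "Error": re.compile(r"^Error(.*)$", re.M),
    "Instructions": re.compile(r"^Instructions(.*?)\s*Field Value:", re.M),
}


def extract_blocks(text):
    # Drop the header before the first marker; each later chunk is one block.
    blocks = BLOCK_SPLIT.split(text)[1:]
    records = []
    for block in blocks:
        record = {}
        for name, pattern in FIELD_PATTERNS.items():
            m = pattern.search(block)
            # None means this field does not occur in the block at all.
            record[name] = m.group(1).strip() if m else None
        records.append(record)
    return records


for record in extract_blocks(TEXT):
    print(record)
```

Each field has its own small pattern. A field that is missing from a block (for example, Line1, Invoice, and Status in the last container) therefore comes back as None, rather than causing the whole match to fail.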

Hi @divyanshu52

Can you specify what output you need?

Regards

I want the output in this format, based on the test string I shared earlier.

Hi @divyanshu52

Do you want to write the output to Notepad?

Regards

No, not to Notepad. We create a JSON file in UiPath.

The regex pattern is:

Line1(?P<Line1>.*?)\W+\D+Invoice(?P<Invoice>.*?)\W+Status(?P<Status>.*?) Invoice:Status:\W+\S+\W+Container\W*FieldValue(?P<FieldValue>.*?)\W+Error(?P<Error_1>.*?)\W+Instructions(?P<Instructions_1>.*)(?= Field )

As you can see, on lines 5 to 7 the fields FieldValue, Error, and Instructions are not highlighted by the current pattern, while the remaining field names further down are highlighted and work fine.

Hi @divyanshu52

Do you want to extract the entire data with a single regex?

Regards

No, I only want to capture the values of the following fields: Line1, Invoice, Status, FieldValue, Error, Instructions.

Hi @divyanshu52

Please use the regex below:

(Line1.*\d*)|((Invoice[^:\s].*)|(Invoice(?=\s)))|((Status[^:\s].*)|(Status(?=\s)))|((FieldValue[^:\s].*)|(FieldValue(?=\s)))|(Invoice:Status:.*)|((Error[^:\s].*)|(Error(?=\s+Instructions)))|(Instructions.*)

Regards
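The alternation above matches one labelled line at a time. It must therefore be applied with a find-all call that iterates over every match, not with a single match. Below is a minimal Python sketch of the same idea. It assumes hypothetical named groups in place of the anonymous ones, so that Match.lastgroup reports which field each match belongs to. The alternatives are simplified, anchored to line starts with re.MULTILINE, and stop before the trailing "Invoice:Status:" and "Field Value:" labels. The excerpt is lines 10 to 22 of the test string, and iter_fields is an illustrative name.

```python
import re

# Excerpt of the test string (lines 10 to 22).
TEXT = """Status Invoice:Status: 2 Container
FieldValueUMBER 000004403 GROUP
ErrorNUMBER 4403 TRANSACTION CONTROL NUMBER 44030001
Instructions Field Value:Error:Instructions:
Line1
InvoiceESSAGES
Status Invoice:Status: 3 Container
FieldValue
Error
Instructions Field Value:Error:Instructions:
Line1
InvoicePC NOT VALID. FIX B
StatusEFORE NEXT TRANSMISSION Invoice:Status: 4 Container
"""

# Hypothetical named-group variant: one alternative per field label,
# each anchored at the start of a line and ending at the end of that line.
FIELD_LINE = re.compile(
    r"^(?:"
    r"Line1(?P<Line1>.*)"
    r"|Invoice(?P<Invoice>.*)"
    r"|Status(?P<Status>.*?)\s*Invoice:Status:.*"
    r"|FieldValue(?P<FieldValue>.*)"
    r"|Error(?P<Error>.*)"
    r"|Instructions(?P<Instructions>.*?)\s*Field Value:.*"
    r")$",
    re.MULTILINE,
)


def iter_fields(text):
    for m in FIELD_LINE.finditer(text):
        # Exactly one named group takes part in each match; lastgroup names it.
        name = m.lastgroup
        yield name, m.group(name).strip()


for name, value in iter_fields(TEXT):
    print(f"{name}: {value!r}")
```

The (field, value) pairs come back in document order. How they are grouped into records (for example, starting a new record at each FieldValue) depends on the output format you need.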

Hi @divyanshu52

Thank you

Happy Automation!!
