Regex to Find Line by Line Data

Hello All,

Below is the data that I want to extract. I am able to get all the information, but the extraction fails when the data rows are on consecutive lines. Please check my sample code and input file. It returns output; the only issue is with consecutive lines.

Currently Document Understanding is not available, so I need to try some other way.

  1. Need to split the data from 1ABCD123 up to (but not including) the next 1ABCD123, and save each part as a separate set of data.
    Successful extraction: if the rows are not on consecutive lines, I can see the output properly.

Failed: if the rows are on consecutive lines, it extracts the first row and only moves to the next one after some other data.

TEST1.xaml (20.4 KB)

Hi @Ice ,

Check the output below:
RegexOutput.xlsx (9.0 KB)

Let us know if this is how you want the output to be represented. There were duplicate values after extraction, so the duplicates were removed.

If this is not the expected output, could you modify or provide the corrected output data in the same format, so that we can identify what needs to be corrected in your workflow?

Hi @supermanPunch ,

That’s correct! This is the required output. I created that duplicate myself; we still need it.

Were you able to find the solution?

Thank you!

@Ice ,

In that case, you can check the modified workflow below:
TEST1.xaml (33.0 KB)

There were a few corrections to the input data being passed to the Matches method. Also, the named groups referenced in the workflow did not match the named groups defined in the regex expression.

Test the workflow with the actual data, and let us know whether you get the expected output for all cases.
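
For readers who cannot open the attachment, the named-group mismatch described above can be illustrated with a small sketch. It is written in Python rather than the .NET regex the workflow uses, and the pattern, group names, and sample line are hypothetical, not taken from TEST1.xaml:

```python
import re

# Hypothetical pattern and sample row; the real ones are in the attached workflow.
ROW_PATTERN = re.compile(r"(?P<code>\d{4})[ \t]+(?P<amount>\d+\.\d{2})")
sample = "1001   250.00"

match = ROW_PATTERN.search(sample)


def read_group(m, name):
    """Return the text of a named group, or None if the pattern defines no such group."""
    try:
        return m.group(name)
    except IndexError:
        # Python raises IndexError ("no such group") for an undefined group name.
        return None


amount = read_group(match, "amount")   # name matches the pattern definition
missing = read_group(match, "Amount")  # different case, so no such group exists

print(amount, missing)
```

Group names are case-sensitive, so a lookup only works when it uses exactly the name written inside `(?P<...>)` (or `(?<...>)` in .NET syntax).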

Hi @supermanPunch ,

There could be one more column, “ABSUB.STU.CODE”, present after the AMOUNT column. We need this data as well.

Thank you!

Hi @supermanPunch ,

I’m getting an error while opening the file. I think it is because of the version; I’m using V2022.4.6.

Would you be able to send me a screenshot of the flow?

Hi @supermanPunch ,

Thank you so much for your quick response. I’m getting the below error along with “Invalid document”.

@Ice ,

Could you list the dependencies used and their versions? Also, let us know whether you are using the Windows or Windows-Legacy compatibility type.

Hi @supermanPunch ,

I can see the below dependencies

@Ice ,

Could you check with the workflow below:
TEST1.xaml (32.4 KB)

Thank you @supermanPunch ,

Now I don’t get any error. Two column values (Other Num and Student Name) are missing from the output file; I have attached the output that I got.
Also, the actual data text file has 18 line items, but the output has 74 line items.

Hi @supermanPunch ,

I have changed the column name and it is working now. But I think the regex is extracting all the data in the document each time. Instead of returning the data from 1ABCD123 up to the next 1ABCD123 as one set, it is returning all the data in each set.

Please let me know if you will be able to check the code.

Thank you so much.

@Ice ,

As the first few data fields are going to be the same for the whole text document, we can keep them only once. Hence, in the workflow, after we have received the table data, we break from the outer For Each loop, thus avoiding further duplicates.

Check the workflow below and let us know if it does satisfy your requirement :
TEST1.xaml (32.9 KB)


Thank you so much for this solution. I’m sorry, the first few data fields might change for every document. I gave the same data for testing purposes.

The constant data will be 1ABCD123. The others might change based on each document.

@Ice , That was understood. We mentioned that it would remain the same for the table data present in that document.

Were you able to test it with some other sample data, and were you able to get the output as required?

@supermanPunch , Thank you! Yes, I tested the data with slight changes in the first few fields.

It’s working great for data “ABCTYPE, ABC CDE … etc…”. But I’m getting the same data up to column G for all line items. I tried to change the loop, but it returns the first captured info for all the lines.

What should be changed in the loop or regex to get the actual data for the first 15 lines of the text file?

@Ice ,

Apologies for the late reply.

Yes. The text needs to be split into sections to achieve this. Since each section starts with 1ABCD123, we can split on that marker and then run the configured regex on each section separately, so that the captured values belong only to that section.

Check the modified workflow below :
TEST1.xaml (60.5 KB)
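
For anyone reading without the attachment, the split-then-match idea described above can be sketched roughly as follows. This is a Python illustration, not the contents of the workflow: only the 1ABCD123 marker comes from this thread, while the row pattern, group names, and sample text are assumptions made up for the example.

```python
import re

SECTION_MARKER = "1ABCD123"  # constant section start named in this thread

# Hypothetical row layout: a 4-digit code, spaces, then an amount.
ROW_PATTERN = re.compile(
    r"^(?P<code>\d{4})[ \t]+(?P<amount>\d+\.\d{2})[ \t]*$", re.MULTILINE
)


def split_sections(text):
    """Return chunks that each run from one marker line up to the next marker."""
    marker_at_line_start = re.compile(r"^" + re.escape(SECTION_MARKER), re.MULTILINE)
    starts = [m.start() for m in marker_at_line_start.finditer(text)]
    bounds = starts + [len(text)]
    # Anything before the first marker is ignored in this sketch.
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


def extract_rows(text):
    """Run the row regex on each section separately and tag rows with the section number."""
    rows = []
    for index, section in enumerate(split_sections(text), start=1):
        for m in ROW_PATTERN.finditer(section):
            rows.append(
                {"section": index, "code": m.group("code"), "amount": m.group("amount")}
            )
    return rows


# Made-up sample: the first section has two rows on consecutive lines.
sample = (
    "1ABCD123 HEADER A\n"
    "1001   250.00\n"
    "1002   75.50\n"
    "1ABCD123 HEADER B\n"
    "2001   10.00\n"
)
rows = extract_rows(sample)
for row in rows:
    print(row)
```

Because the regex only ever sees one section at a time, values captured in one section cannot leak into the rows of another, which is the behaviour the earlier versions were missing.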

I’m sorry @supermanPunch , I’m getting the same error, “Document is Invalid”.

@Ice ,

Apologies, I forgot about the compatibility type used. The workflow below should work:
TEST1.xaml (60.5 KB)

Hi @supermanPunch ,

This is working! Thanks!
