Regex extracting date in a complex data structure

Data to which the regex is applied:

"Filtered Data: −−DATE−−       ECOA   KOB     MEMBER−NO
06/12/20       B      QM      02004946        TEST DATA MORTGAGE                                               TRU
04/22/20       B      BC      2844550        COMPANY XCYS                                                        XPN
                      BC      03575459        CAPITAL ONE                                                       TRU
                      BB      484BB05812      CAPITALTWO                                                        EFX
04/16/20       B      FR      3996926         CRET                                                           XPN
                              181ZB14416      CREDOS                                                           EFX
                      FM      01207005         BANK  R                                                TRU"

I want to get the date together with the company name. My current regex, using the multiline option, already gets the company name: “(?<=\d{3,8}\s+)[A-Z].*?(?=\s+[A-Z]{3}\s*$)”

The catch is that I also want to extract the date. If the date is empty on a line, the output should use the parent date (the most recent date above it).

#Sample Output

04/22/20  CAPITAL ONE
04/22/20  CAPITALTWO
04/16/20  CRET
04/16/20 CREDOS
04/16/20  BANK  R

Hi @RajivKumar12

Your data looks very structured. Do you get it from a file? In that case it may be easier to use, e.g., the “Read CSV” activity.

No, that data is already the output. I have already extracted the company name; I just want to include the date as well.

I was talking about this data:

Yes, that is the data I am applying the regex to. It is the data I am filtering, and it is already the output I got from a file.

Where is it coming from? From a file?

It came from a PDF file. I extract the whole text before I apply the filter.

Sorry, I thought you could easily create a data table out of it but it seems that the number of spaces between the columns is not constant.

I think that doesn't matter. I have already extracted the company name using the regex above. All I need to know is how to also get the date, to produce the desired output above.

Guess this should help you for the date:
^(\d{2}\/\d{2}\/\d{2} | \s)

Can we combine the date extraction with my regex above, so that I get a single result, for example 06/12/20 TEST DATA MORTGAGE?

It would be possible.
But I guess it is easier to combine these matches with string functions.
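For illustration, a combined pattern could look like the sketch below. It is written in Python rather than as a UiPath activity, and because Python's `re` module does not accept variable-length lookbehind, it uses named capture groups instead of the lookarounds from the question. `LINE_RE` and `parse_line` are hypothetical names, and the pattern assumes every record line ends with a three-letter code (TRU, XPN, EFX) and that the member number ends in 3 to 8 digits.

```python
import re

# Hypothetical combined pattern: optional leading date, then the company name
# located between the member number and the trailing three-letter code.
LINE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2})?"  # date, absent on continuation lines
    r".*?\d{3,8}\s+"                  # skip up to the end of the member number
    r"(?P<name>[A-Z].*?)"             # company name, matched lazily
    r"\s+[A-Z]{3}\s*$"                # trailing code such as TRU, XPN, EFX
)

def parse_line(line):
    """Return (date or None, company name), or None if the line is not a record."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return m.group("date"), m.group("name")

print(parse_line("06/12/20       B      QM      02004946        TEST DATA MORTGAGE       TRU"))
print(parse_line("                      BC      03575459        CAPITAL ONE              TRU"))
```

The first call should yield the date and the name together; the second yields `None` for the date, which is the case that still needs the parent date.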

Can you provide a sample answer?


But what if the date is empty, as in the example above? How would I still achieve the output? For example, BANK R has an empty date and its parent date is 04/16/20.

04/22/20 CAPITAL ONE
04/16/20 CRET
04/16/20 CREDOS
04/16/20 BANK R

The regex I gave you produces a whitespace match if a line does not start with a date.
You can replace it with the “parent” date in the loop where you put the matches together, by adding an IF condition.
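A minimal Python sketch of that IF condition, assuming the text contains no empty lines. The pattern is a variant of the date regex from this thread: the spaces around `|` are dropped and `[ \t]` replaces `\s` so that a newline is never matched. `DATE_OR_BLANK` and `fill_parent_dates` are hypothetical names.

```python
import re

# One match per line: either a real date or a single blank character.
DATE_OR_BLANK = re.compile(r"^(\d{2}/\d{2}/\d{2}|[ \t])", re.MULTILINE)

def fill_parent_dates(text):
    """Return one date per matched line, reusing the last seen date for blanks."""
    filled = []
    parent = None  # most recent real date; None until the first one appears
    for m in DATE_OR_BLANK.finditer(text):
        value = m.group(1)
        if value.strip():      # IF the match is a real date, remember it
            parent = value
        filled.append(parent)  # otherwise the parent date is used instead
    return filled

sample = (
    "04/16/20       B      FR      3996926         CRET      XPN\n"
    "                              181ZB14416      CREDOS    EFX\n"
    "                      FM      01207005         BANK  R  TRU"
)
print(fill_parent_dates(sample))
```

Lines that start with neither a date nor a blank, such as the header row, produce no match at all and are skipped.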

I have issues looping over the two regex match collections and combining the results.

What is the right way to loop over two sets of matches, for example match1 and match2, and then combine the results?

There are a few options. One could be:


Instead of writing a log you can store this string in an array of strings or a list, etc.
Also approaches with DataTables are possible.
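As a rough illustration of pairing two match collections by index and keeping the results in a list instead of logging them, here is a Python sketch. It assumes both collections have exactly one entry per record line and that blank dates have already been replaced by their parent date; `combine_matches` and the sample lists are hypothetical.

```python
def combine_matches(dates, names):
    """Pair the i-th date with the i-th name and collect the strings in a list."""
    if len(dates) != len(names):
        # Misaligned collections would silently pair the wrong rows.
        raise ValueError("both match lists must have one entry per line")
    results = []
    for date, name in zip(dates, names):
        results.append(f"{date} {name.strip()}")
    return results

dates = ["04/16/20", "04/16/20", "04/16/20"]  # blanks already filled
names = ["CRET", "CREDOS", "BANK  R"]
for row in combine_matches(dates, names):
    print(row)
```

The length check matters because the two patterns run independently: if one of them skips a line the other matches, every later pair would be shifted by one.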

Can you post that code here?