Ignore and remove all the data that does not match the pattern

I have a regex that extracts data from a PDF. The result is below, but there is dynamic data mixed in that I don't want, such as "Co-Applicant", "No inquiry records found." and "Applicant USTIN CORDARYL Co-Applicant LONG INFILE REPORT : Page 3 of 7".

If data like that exists, I want to remove it and just return the data that matches the pattern.

With my current progress I have already extracted the data I need, which is NOVUS HOME MORTGAGE, FACTUAL DATA and all the other names.

But I manually removed the text that I don't need, like "Co-Applicant", "No inquiry records found." etc., using regexes like

System.Text.RegularExpressions.Regex.Replace(text,"Applicant.*INFILE REPORT : Page \d+ of \d+.*","")

and

System.Text.RegularExpressions.Regex.Replace(text,"Co-Applicant\s*","")

But I don't want to do it manually, because the data that shows up is dynamic. Is there a way in regex to remove all the data that does not match the pattern, without manually deleting each piece of text with its own regex, like I did for the Co-Applicant text?

Help would be much appreciated. Thank you

Result data :

" 08/03/2020        NOVUS HOME                  Mortgage Company                                                     TRU
                     MORTGAGE
   07/08/2020        FACTUAL DATA                Mortgage Reporter                                                    XPN
   07/08/2020        FCTUALDATA                                                                                       EFX
   07/08/2020        NOVUS HOME                  Mortgage Company                                                     TRU
                     MORTGAGE
   07/07/2020        CROSSCOUNTRY                Mortgage Loan                                                        TRU
                     MORTGAG
   07/07/2020        FACTUAL DATA                Mortgage Reporter                                                    XPN
   07/07/2020        FCTUALDATA                                                                                       EFX
   05/21/2020        CAP ONE NA                  Bank Credit Card                                                     XPN
   05/21/2020        CAPITAL ONE                 Credit Card                                                          TRU
   05/21/2020        CAPITALONE                  Bank                                                                 EFX
   05/20/2020        CROSSCOUNTRY                Mortgage Loan                                                        TRU
                     MORTGAG
   05/20/2020        FACTUAL DATA                Mortgage Reporter                                                    XPN
   05/20/2020        FCTUALDATA                                                                                       EFX
   05/20/2020        FINGERHUT/WEBBANK           Finance Company                                                      XPN
   05/07/2020        EMS                                                                                              EFX
   05/07/2020        GROW FINANCIAL CREDI        Credit Bureau/Mortgage                                               TRU
                                                 Processing
   Co-Applicant
   No inquiry records found.

Applicant USTIN CORDARYL Co-Applicant  LONG                                               INFILE REPORT : Page 3 of 7"

Current progress :

Main.xaml (9.7 KB) project.json (1023 Bytes)

@Yoichi Hi Sir, maybe you have an idea regarding this one? I would really appreciate it. Thank you.

@Yoichi

Hello

From the string below, what did you want as the output?
Can you tell us more about the pattern of the text?

08/03/2020 NOVUS HOME Mortgage Company TRU
MORTGAGE
07/08/2020 FACTUAL DATA Mortgage Reporter XPN
07/08/2020 FCTUALDATA EFX
07/08/2020 NOVUS HOME Mortgage Company TRU
MORTGAGE
07/07/2020 CROSSCOUNTRY Mortgage Loan TRU
MORTGAG
07/07/2020 FACTUAL DATA Mortgage Reporter XPN
07/07/2020 FCTUALDATA EFX
05/21/2020 CAP ONE NA Bank Credit Card XPN
05/21/2020 CAPITAL ONE Credit Card TRU
05/21/2020 CAPITALONE Bank EFX
05/20/2020 CROSSCOUNTRY Mortgage Loan TRU
MORTGAG
05/20/2020 FACTUAL DATA Mortgage Reporter XPN
05/20/2020 FCTUALDATA EFX
05/20/2020 FINGERHUT/WEBBANK Finance Company XPN
05/07/2020 EMS EFX
05/07/2020 GROW FINANCIAL CREDI Credit Bureau/Mortgage TRU
Processing
Co-Applicant
No inquiry records found.

Applicant USTIN CORDARYL Co-Applicant LONG

Hey @Jelrey

Check out my progress.

Please let me know if there is more you want removed.

Forum Post 18 Sep.xaml (9.9 KB)

I have got it down to this:

“08/03/2020 NOVUS HOME Mortgage Company TRU
MORTGAGE
07/08/2020 FACTUAL DATA Mortgage Reporter XPN
07/08/2020 FCTUALDATA EFX
07/08/2020 NOVUS HOME Mortgage Company TRU
MORTGAGE
07/07/2020 CROSSCOUNTRY Mortgage Loan TRU
MORTGAG
07/07/2020 FACTUAL DATA Mortgage Reporter XPN
07/07/2020 FCTUALDATA EFX
05/21/2020 CAP ONE NA Bank Credit Card XPN
05/21/2020 CAPITAL ONE Credit Card TRU
05/21/2020 CAPITALONE Bank EFX
05/20/2020 CROSSCOUNTRY Mortgage Loan TRU
MORTGAG
05/20/2020 FACTUAL DATA Mortgage Reporter XPN
05/20/2020 FCTUALDATA EFX
05/20/2020 FINGERHUT/WEBBANK Finance Company XPN
05/07/2020 EMS EFX
05/07/2020 GROW FINANCIAL CREDI Credit Bureau/Mortgage TRU”

Cheers

Steve

But what if this data is dynamic?

Processing
Co-Applicant
No inquiry records found.

Applicant USTIN CORDARYL Co-Applicant LONG

For example, in the next document it could be:

ProcessingTest
Test-Applicant
inquiry records found.

Applicant1 USTIN3 CORDARYL2 Co-Applicant LONG2

Your solution removes it manually based on a literal string, like (?=\sProcessing)[\s\S]+
What if the text is ProcessingTest? It would not be a good idea for me to create another (?=\sProcessingTest)[\s\S]+ for every variation.

Hello

As long as the dynamic text starts with the word “Processing”, it should still work (I haven’t double-checked, though), fingers crossed.
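
A quick way to check that claim is to run the pattern quoted in the thread against both variants. The sketch below uses Python for brevity; the thread itself uses .NET's Regex class, and the pattern only relies on syntax the two engines share, but the .NET behaviour is not verified here. The sample strings are shortened copies of the thread's data, and `strip_trailer` is a hypothetical helper name.

```python
import re

# Pattern quoted in the thread: a lookahead for whitespace + "Processing",
# then everything up to the end of the string.
TRAILER = re.compile(r"(?=\sProcessing)[\s\S]+")

def strip_trailer(text: str) -> str:
    """Remove everything from the first whitespace-preceded 'Processing' onward."""
    return TRAILER.sub("", text)

sample_a = (
    "05/07/2020 EMS EFX\n"
    "05/07/2020 GROW FINANCIAL CREDI Credit Bureau/Mortgage TRU\n"
    "Processing\n"
    "Co-Applicant\n"
    "No inquiry records found.\n"
)
# Hypothetical variant from the follow-up question.
sample_b = sample_a.replace("Processing", "ProcessingTest")

kept = "05/07/2020 EMS EFX\n05/07/2020 GROW FINANCIAL CREDI Credit Bureau/Mortgage TRU"
print(strip_trailer(sample_a) == kept)
print(strip_trailer(sample_b) == kept)
```

Because the lookahead has no word boundary after "Processing", it also fires on "ProcessingTest". The same looseness is a risk: if a row in the middle of the table ever contained a type such as "Mortgage Processing", everything after that point would be cut as well.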

Cheers

Steve
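
The original question asks for the opposite of deleting known strings: keep only what matches and drop the rest, whatever it happens to say. A minimal sketch of that idea follows, again in Python rather than the thread's .NET, and resting on two assumptions that are not confirmed anywhere in the thread: every wanted row starts with an MM/DD/YYYY date, and a wrapped cell (such as "MORTGAGE") is indented by at least 10 spaces in the layout-preserving extraction. The function name and the threshold are illustrative only.

```python
import re

# Assumption: a wanted row starts (after optional indentation) with MM/DD/YYYY.
DATE_ROW = re.compile(r"^\s*\d{2}/\d{2}/\d{4}\b")
# Assumption: a wrapped cell is indented by at least 10 spaces.
CONTINUATION = re.compile(r"^\s{10,}\S")

def keep_matching_lines(text: str) -> str:
    """Keep date rows and their wrapped continuation lines; drop everything else."""
    kept = []
    in_row = False  # True while the previous kept line belongs to a date row
    for line in text.splitlines():
        if DATE_ROW.match(line):
            kept.append(" ".join(line.split()))  # collapse column padding
            in_row = True
        elif in_row and CONTINUATION.match(line):
            kept.append(" ".join(line.split()))
        else:
            in_row = False  # any other line ends the current row
    return "\n".join(kept)

sample = (
    "   05/07/2020        EMS                          EFX\n"
    "   05/07/2020        GROW FINANCIAL CREDI        Credit Bureau/Mortgage      TRU\n"
    "                                                 Processing\n"
    "   Co-Applicant\n"
    "   No inquiry records found.\n"
    "\n"
    "Applicant USTIN CORDARYL Co-Applicant  LONG     INFILE REPORT : Page 3 of 7\n"
)
print(keep_matching_lines(sample))
```

In this sketch "Processing" survives, because its indentation marks it as the wrapped end of "Credit Bureau/Mortgage", while "Co-Applicant", "No inquiry records found." and the page footer are dropped without being named. It only works on text that still carries the original column indentation; the flattened copies pasted later in the thread would need a different signal for continuation lines.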
