Regex grouping when there are multiple patterns?

Jay_Chacko · April 7, 2022, 8:05pm

I have this text, which I am getting from a PDF using UiPath PDF extraction.

Problem
Usually the pieces of each field come out together: the name parts are together and the review message is together. The text below comes from a tabular view in the PDF, and here they are not. The name “JASON DIANNA JOHN” is split across two lines, unlike “JAY, BAY”. The “Review Message(s):” text, instead of appearing in one piece like “Documentation does not support and billing error of inpatient could have been billed as outpatient”, is broken up by other data that I also need. The DOB also appears in two different locations.

Data I need
Ex: from the first row of data
Patient Id: 521178201
Name: JASON DIANNA JOHN
DOB: 10/5/2002
Review message: Documentation does not support medical necessity
Service From Date: 02/09/2017
Service Thru Date: 02/10/2020
Claim number: 100050010111901715802222

This is what I have so far. I am stuck at the name, since the name is broken up:

Patient ID \/ Name:\s+(?<id>\d*) (?<name1>.*)\s+.*DOB:\s(?<dob>\d+\/\d+\/\d{4})

So how do I get the broken-up data? Should I capture name 1, name 2, review msg 1 and review msg 2 and then join them? I do need this as groups rather than as individual patterns.

Patient ID / Name: 521178201 JASON, Sex: Female Patient Account # :
DIANNA JOHN DOB: 10/5/2002
Service From Service Thru  Claim Number: Review Message(s): Documentation does not support
Date: 02/09/2017 Date: 100050010111901715802222 medical necessity
02/10/2020


Patient ID / Name: 310976610 JAY, BAY Sex: Female Patient Account # :
DUAA DOB: 7/2/2007
Service From Service Thru  Claim Number: Review Message(s): Documentation does not support
Date: 02/10/2013 Date: 10006004125561531161888 medical necessity
05/30/2015


Patient ID / Name: 666310555 Anie, Baby DOB: 4/15/2016 Sex: Male Patient Account # :
Service From Service Thru  Claim Number: Review Message(s): Documentation does not support
Date: 03/10/2010 Date: 100055530201666962948521 medical necessity
03/22/2014

Patient ID / Name: 222333136 Anu, Json DOB: 1/15/2012 Sex: Female Patient Account # :
Service From Service Thru  Claim Number: Review Message(s): Documentation does not support
Date: 05/04/2011 Date: 100020030201504215275522 and billing error of inpatient could have
11/22/2012 been billed as outpatient.

All data is dummy data, not real data.
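A minimal sketch of the "name 1, name 2, then join" idea from the question, written in Python rather than UiPath's .NET regex (Python spells named groups `(?P<name>...)`, where .NET uses `(?<name>...)`). It assumes every record has `Sex:` on its first line and that the name continuation and DOB, when they wrap, land on the very next line, as in the dummy data above. The names `HEADER`, `parse_headers`, `name1`, `name2`, `dob1` and `dob2` are made up for illustration.

```python
import re

# Two records copied from the dummy data above.
SAMPLE = """\
Patient ID / Name: 521178201 JASON, Sex: Female Patient Account # :
DIANNA JOHN DOB: 10/5/2002
Service From Service Thru  Claim Number: Review Message(s): Documentation does not support
Date: 02/09/2017 Date: 100050010111901715802222 medical necessity
02/10/2020

Patient ID / Name: 666310555 Anie, Baby DOB: 4/15/2016 Sex: Male Patient Account # :
Service From Service Thru  Claim Number: Review Message(s): Documentation does not support
Date: 03/10/2010 Date: 100055530201666962948521 medical necessity
03/22/2014
"""

DATE = r"\d{1,2}/\d{1,2}/\d{4}"

# Assumption: "Sex:" always closes the name/DOB part of the first line.
HEADER = re.compile(
    r"Patient ID / Name:[ \t]+(?P<id>\d+)[ \t]+"
    r"(?P<name1>.*?)[ \t]*"                           # name text on line 1
    rf"(?:DOB:[ \t]*(?P<dob1>{DATE})[ \t]*)?"         # DOB on line 1, if present
    r"Sex:.*\n"                                       # rest of line 1
    rf"(?:(?P<name2>.*?)[ \t]*DOB:[ \t]*(?P<dob2>{DATE}))?"  # wrapped name + DOB
)


def parse_headers(text):
    """Return one dict per record with id, joined name, and DOB."""
    rows = []
    for m in HEADER.finditer(text):
        # Groups that did not take part in the match come back as None.
        parts = [p.strip() for p in (m["name1"], m["name2"]) if p and p.strip()]
        rows.append({
            "id": m["id"],
            "name": " ".join(parts),          # commas are kept as extracted
            "dob": m["dob1"] or m["dob2"],    # whichever location matched
        })
    return rows


for row in parse_headers(SAMPLE):
    print(row)
```

Both optional groups are tried on every record, so one pattern covers the layout where the DOB sits on the first line and the layout where it wraps. The join happens outside the regex, since a single group cannot capture two separated pieces of text.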

Nithinkrishna · April 8, 2022, 6:38am

Hey @Jay_Chacko

Sorry, I’m not following the requirement here.

The name is already split, as I can see above, but I don’t understand what you want to do with it. Please explain.

Thanks
#nK

supermanPunch · April 8, 2022, 6:51am

Hi @Jay_Chacko,

Have you checked the extraction with PreserveFormat set to True?

ushu · April 8, 2022, 7:08am

@Jay_Chacko Which OCR engine are you using? Did you try Tesseract OCR?

Jay_Chacko · April 8, 2022, 1:01pm

I just need to extract all the data via regex grouping; there is nothing wrong with the text returned from the PDF read.

Jay_Chacko · April 8, 2022, 1:02pm

I just need to extract all the data via regex grouping; there is nothing wrong with the text returned from the PDF read or with its settings.
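For the service dates, claim number and review message in the text from the first post, one possible grouping is sketched below in Python (named groups are written `(?P<name>...)` here, `(?<name>...)` in .NET). It assumes the three-line layout seen in the dummy data: the message starts after `Review Message(s):`, continues after the claim number on the next line, and may continue after the thru date on the third line. `BODY`, `parse_bodies` and the group names are illustrative only.

```python
import re

# Two service blocks copied from the dummy data in the first post.
SAMPLE = """\
Service From Service Thru  Claim Number: Review Message(s): Documentation does not support
Date: 02/09/2017 Date: 100050010111901715802222 medical necessity
02/10/2020

Service From Service Thru  Claim Number: Review Message(s): Documentation does not support
Date: 05/04/2011 Date: 100020030201504215275522 and billing error of inpatient could have
11/22/2012 been billed as outpatient.
"""

DATE = r"\d{1,2}/\d{1,2}/\d{4}"

# Assumption: the block always spans exactly these three lines.
BODY = re.compile(
    r"Review Message\(s\):[ \t]*(?P<msg1>.*)\n"               # message, part 1
    rf"Date:[ \t]*(?P<from_date>{DATE})[ \t]+"                # service from date
    r"Date:[ \t]*(?P<claim>\d+)[ \t]*(?P<msg2>.*)\n"          # claim + message, part 2
    rf"(?P<thru_date>{DATE})[ \t]*(?P<msg3>.*)"               # thru date + part 3
)


def parse_bodies(text):
    """Return one dict per service block, with the message pieces joined."""
    rows = []
    for m in BODY.finditer(text):
        pieces = (m["msg1"], m["msg2"], m["msg3"])
        rows.append({
            "from_date": m["from_date"],
            "thru_date": m["thru_date"],
            "claim": m["claim"],
            # Skip empty pieces so short messages do not gain stray spaces.
            "message": " ".join(p.strip() for p in pieces if p.strip()),
        })
    return rows


for row in parse_bodies(SAMPLE):
    print(row)
```

A message that wraps onto a fourth line would be cut off by this pattern, so it is worth checking against more of the real PDF output before relying on it.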

supermanPunch · April 8, 2022, 2:35pm

@Jay_Chacko ,

We wanted to know whether you have checked both formats of text retrieval from the Read PDF Text activity: with PreserveFormat set to True and with PreserveFormat set to False.

If you haven’t checked yet, please do; one of them may return the text in a format that is easier to extract with regex.

Jay_Chacko · April 8, 2022, 2:45pm

I see. I never did that because it introduces weird extra spaces and characters, but I just tested it and the text seems even more broken up: easier to read visually, but harder for regex, I assume.

Patient ID / Name:       222333136 Anu, Json              DOB: 1/15/2012         Sex: Female            Patient Account # :
Service From             Service Thru       Claim Number:                               Review Message(s): The documentation does not support
Date: 05/04/2011         Date:             100020030201504215275522                     medical necessity and billing error of inpatient could have
                         11/22/2012                                                     been billed as outpatient.
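If the PreserveFormat output really keeps the table columns aligned as it appears to above, the review message could be rebuilt by column position instead of by regex. The Python sketch below is only an illustration under that assumption: it finds the column where `Review Message(s):` starts and takes everything at or right of that column on the following lines. The sample is rebuilt with `ljust` so the alignment is exact; the column number 88 and the function name `message_from_columns` are hypothetical.

```python
LABEL = "Review Message(s):"
COL = 88  # hypothetical column where the message text starts in this sample

# One record rebuilt from the PreserveFormat output above, padded so the
# message column lines up exactly (real output may not be this regular).
LINES = [
    "Patient ID / Name:       222333136 Anu, Json              DOB: 1/15/2012",
    "Service From             Service Thru       Claim Number:".ljust(COL)
    + "Review Message(s): The documentation does not support",
    "Date: 05/04/2011         Date:             100020030201504215275522".ljust(COL)
    + "medical necessity and billing error of inpatient could have",
    "                         11/22/2012".ljust(COL)
    + "been billed as outpatient.",
]


def message_from_columns(lines):
    """Join the text at or right of the label's column, from the label line down."""
    for i, line in enumerate(lines):
        col = line.find(LABEL)
        if col != -1:
            break
    else:
        return ""  # label not found in this record

    # First piece: the text after the label on the label's own line.
    parts = [lines[i][col + len(LABEL):]]
    # Remaining pieces: the same column range on the continuation lines.
    parts += [ln[col:] for ln in lines[i + 1:]]
    return " ".join(p.strip() for p in parts if p.strip())


print(message_from_columns(LINES))
```

This only works on one record at a time, so the text would first need to be split into records, for example at each `Patient ID / Name:` line. If the extraction shifts columns between lines, slicing by position will pick up the wrong text.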

supermanPunch · April 8, 2022, 3:14pm

Jay_Chacko:

Patient ID / Name: 310976610 JAY, BAY Sex: Female Patient Account # :
DUAA DOB: 7/2/2007
Service From Service Thru  Claim Number: Review Message(s): Documentation does not support
Date: 02/10/2013 Date: 10006004125561531161888 medical necessity
05/30/2015


Patient ID / Name: 666310555 Anie, Baby DOB: 4/15/2016 Sex: Male Patient Account # :
Service From Service Thru  Claim Number: Review Message(s): Documentation does not support
Date: 03/10/2010 Date: 100055530201666962948521 medical necessity
03/22/2014

@Jay_Chacko, do we see different formats like the ones above, or is all the data in the same format?
