Regex Tabular data issues

I am having trouble extracting the “line item”: how do I capture a single number that follows a longer run of digits? Also, in some rows part of the name (for example “MARGARITA” or “ALRAMIDRAMI”) shows up on the next line, after the thru date. How do I get that bundled in with the name?
I am getting the text from the PDF reading activity.

For example, I am trying to get this from the first row:

100040078787872410829041 - ICN
1 - line item
JANCE FANCY, YRAMIV - patient name
5/30/1921 - medical request date
11/6/2015 - DOB
715666525 - patient ID
11/10/2011 - service from
10/20/2015 - service thru

My regex so far

(?\d{24})\s+(?<line_item>\d)\s+(?[\D\s,]+)\s+(?<med_req_date>\d+/\d+/\d{4})\s+(?\d+/\d+/\d{4})\s+(?\d+)\s+(?\d+/\d+/\d{4})\s+(?\d+/\d+/\d{4})

ICN Line Item Patient Name Medical Record Request Patient DOB Patient ID Number Service From Date Service Thru Date
Date

100040078787872410829041 1 JANCE FANCY, YRAMIV 5/30/1921 11/6/2015 715666525 11/10/2011 10/20/2015

100040030201714347878787 1 BABYREZ, FANCY SRAMIY 3/30/1921 10/9/2012 123473187 09/20/2015 02/09/2020

100040078787875951521738 1 YAZIZAR, FANCY-ARAMITO 8/30/1921 11/6/2015 700044555 11/10/2011 10/20/2015

100040078787875399291634 1 CATIOLEVICH, FANCY 4/30/2019 11/6/2015 708852514 09/20/2015 02/09/2020

100040030201801778787873 1 ORMERO, FANCY 5/30/1921 11/6/2015 754889178 11/10/2011 10/20/2015

100040030201803787878793 1 JILILO, RAMI 1/30/2019 11/6/2015 512585701 09/20/2015 02/09/2020

100040037878787141655665 1 IIIIILLA RAMI, SARAMI 10/9/2012 9/16/2017 857968202 11/10/2011 10/20/2015


NMRCO-LetterRef# 555666-1
Page 1 of 5Records Requested

ICN Line Item Patient Name Medical Record Request Patient DOB Patient ID Number Service From Date Service Thru Date
Date

100040037878787888326154 1 BENXIQUEZ RAMI, KERAMI 7/30/1921 10/9/2012 458781960 11/10/2011 10/20/2015

100040078787876663515484 1 QUENCCHER, RAMIEY LRAMI 9/30/2019 10/9/2012 710425415 09/20/2015 02/09/2020

100040030201824368577775 1 JANCY   SARAMI, CRAMIAN 3/30/1921 10/9/2012 754879428 11/10/2011 10/20/2015

100040030201905228353164 1 YAMU RODRRAMI, 5/20/2018 11/6/2015 754879428 11/10/2011 10/20/2015
MARGARITA

100040037878787027330688 1 CAMU, DARAMIIS 4/30/2019 11/6/2015 754879428 09/20/2015 02/09/2020

NMRCO-LetterRef# 555666-1
Page 2 of 5Records Requested

ICN Line Item Patient Name Medical Record Request Patient DOB Patient ID Number Service From Date Service Thru Date
Date

100040030555666888014963 1 MORRES, RAMICRAMIS 2/30/2019 10/9/2012 754879428 11/10/2011 10/20/2015

100040030555666888885446 1 CASIOOO JR, IRAMIL 1/30/2019 10/9/2012 754879428 09/20/2015 02/09/2020

100040030555666222133857 1 JENAGANA GRAMIO, 6/30/2019 10/9/2012 754879428 11/10/2011 10/20/2015
ALRAMIDRAMI

100040035556662222225596 1 VARANYA VERAMIEZ, YARAMIRY 10/9/2012 10/4/2019 754879428 09/20/2015 02/09/2020

This is dummy data: it is not real, or it has been scrubbed.

@Charbel1

Hi

I think we need more information to assist.

Can you please:

After reading this PDF into a string, attach that text as a file.

Then tell us the exact text you are looking to obtain.
Also tell us what is consistent before and after the required text.

Cheers

I have updated the question. The data has been changed because of PHI, so I am unable to add the PDF; the text above is exactly how the PDF reader returns the string.


Hey @Jay_Chacko!! Are you reading tables from .pdf files? If so, how are you doing it? Can you share?

Is every field in the table always populated?

Yes indeed


I am just using the UiPath PDF reader activity; the text above is exactly how the activity outputs the string. The original PDF contains real data, so I am unable to share it.

Hi @Jay_Chacko

You have a couple of options here.

See attached sample workflow.
Post 9 Feb 2022.zip (5.2 KB)

If the names “MARGARITA” or “ALRAMIDRAMI” appear consistently, you could add them back into the name with a “concat” function after extracting, but I wouldn’t recommend this.

If it were my process and names appeared on the next line, I would throw an exception and send an email for manual processing.

So at the start, check to see if ‘Group 9’ is null or whitespace (If yes throw exception).
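
For anyone who wants to try the manual-processing route outside UiPath, here is a rough sketch in Python (used only as a stand-in for the .NET engine). It does not reproduce the Group 9 check from the attached workflow; it simply raises when a data row is followed by a line that looks like a wrapped name. The names `ROW_LINE`, `NAME_FRAGMENT`, `ManualProcessingRequired` and `ensure_no_wrapped_names` are made up for this example, and the "capital letters only" test for a wrapped name is an assumption based on the sample data. Sending the email is left out.

```python
import re

# A data row: 24-digit ICN, one-digit line item, then anything ending in a date.
ROW_LINE = re.compile(r"^\d{24}\s\d\s.*\d{1,2}/\d{1,2}/\d{4}$")

# Assumption: a wrapped name fragment is a line made of capital letters
# (plus spaces, dots, apostrophes and hyphens), as in the sample data.
NAME_FRAGMENT = re.compile(r"^[A-Z][A-Z .'-]*$")


class ManualProcessingRequired(Exception):
    """Raised when a patient name wraps onto the line after its row."""


def ensure_no_wrapped_names(text: str) -> None:
    """Raise if any data row is directly followed by a name-only line."""
    lines = [line.strip() for line in text.splitlines()]
    for previous, current in zip(lines, lines[1:]):
        if ROW_LINE.match(previous) and NAME_FRAGMENT.match(current):
            raise ManualProcessingRequired(
                f"ICN {previous[:24]}: name continues on next line as {current!r}"
            )
```

Calling `ensure_no_wrapped_names` on the PDF text before extraction would stop the run on the “MARGARITA” and “ALRAMIDRAMI” rows and let everything else through.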

Here is your Regex Pattern:
(\d{24})\s(\d)\s([^\d]+)\s(\d{1,2}/\d{1,2}/\d{4})\s(\d{1,2}/\d{1,2}/\d{4})\s(\d{9})\s(\d{1,2}/\d{1,2}/\d{4})\s(\d{1,2}/\d{1,2}/\d{4})

Result (here are the groups):
[image]

Group 9 example:
[image]
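
Since the screenshots do not carry over, here is a small sketch of what the eight-group pattern above returns for the sample rows. It uses Python's `re` module as a stand-in; as far as I know this particular pattern uses only syntax that behaves the same in .NET, but verify in a .NET tester before relying on it. The field labels in `FIELD_NAMES` and the function name `parse_rows` are my own and are not part of the attached workflow.

```python
import re

# The eight-group pattern from the post, split over lines for readability.
ROW_PATTERN = re.compile(
    r"(\d{24})\s(\d)\s([^\d]+)\s"
    r"(\d{1,2}/\d{1,2}/\d{4})\s(\d{1,2}/\d{1,2}/\d{4})\s"
    r"(\d{9})\s"
    r"(\d{1,2}/\d{1,2}/\d{4})\s(\d{1,2}/\d{1,2}/\d{4})"
)

# Hypothetical labels for groups 1 to 8, in order.
FIELD_NAMES = (
    "icn", "line_item", "patient_name", "med_req_date",
    "dob", "patient_id", "service_from", "service_thru",
)


def parse_rows(text: str) -> list[dict[str, str]]:
    """Return one dict per matched table row."""
    return [
        dict(zip(FIELD_NAMES, match.groups()))
        for match in ROW_PATTERN.finditer(text)
    ]


sample = (
    "ICN Line Item Patient Name Medical Record Request Patient DOB "
    "Patient ID Number Service From Date Service Thru Date\n"
    "Date\n\n"
    "100040078787872410829041 1 JANCE FANCY, YRAMIV 5/30/1921 11/6/2015 "
    "715666525 11/10/2011 10/20/2015\n\n"
    "100040030201714347878787 1 BABYREZ, FANCY SRAMIY 3/30/1921 10/9/2012 "
    "123473187 09/20/2015 02/09/2020\n"
)

rows = parse_rows(sample)
for row in rows:
    print(row["icn"], row["line_item"], row["patient_name"])
```

The header lines are skipped because a match can only start at a run of 24 digits, and the name group stops at the first digit of the medical request date.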

Hopefully this helps.


Thanks, this works great: simple and elegant.

Group 9 was giving me some issues, most likely due to VB.NET: it was skipping “MARGARITA” and “ALRAMIDRAMI”. I use this site to verify: .NET Regex Tester - Regex Storm. It is more accurate for UiPath but has fewer tools compared to something like regex101.

Adding a \s before the next line seems to work:

(\d{24})\s(\d)\s([^\d]+)\s(\d{1,2}\/\d{1,2}\/\d{4})\s(\d{1,2}\/\d{1,2}\/\d{4})\s(\d{9})\s(\d{1,2}\/\d{1,2}\/\d{4})\s(\d{1,2}\/\d{1,2}\/\d{4}\s)([\n\r].*)
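
In case it helps someone later, here is a sketch of the "bundle the wrapped name back in" step, again in Python as a stand-in for .NET. Note that this is a variation and not the exact pattern above: instead of `\s` followed by `([\n\r].*)`, which depends on the line endings in the text, group 9 is an optional capital-letters-only line right after the row. That restriction is an assumption taken from the sample data. `extract_rows` and the dictionary keys are names I made up.

```python
import re

DATE = r"(\d{1,2}/\d{1,2}/\d{4})"

# Groups 1 to 8 as in the thread; group 9 (optional) is a following line
# that contains only capital letters, spaces, dots, apostrophes or hyphens.
ROW_WITH_OVERFLOW = re.compile(
    r"(\d{24})\s(\d)\s([^\d]+)\s"
    + DATE + r"\s" + DATE + r"\s(\d{9})\s" + DATE + r"\s" + DATE
    + r"[ \t]*(?:\r?\n([A-Z][A-Z .'-]*)[ \t]*(?=\r?\n|\Z))?"
)


def extract_rows(text: str) -> list[dict[str, str]]:
    """Parse rows and append a wrapped name fragment to the patient name."""
    rows = []
    for match in ROW_WITH_OVERFLOW.finditer(text):
        name = match.group(3).strip()
        overflow = match.group(9)  # None when the name did not wrap
        if overflow:
            name = f"{name} {overflow.strip()}"
        rows.append({
            "icn": match.group(1),
            "line_item": match.group(2),
            "patient_name": name,
            "service_thru": match.group(8),
        })
    return rows


sample = (
    "100040030201905228353164 1 YAMU RODRRAMI, 5/20/2018 11/6/2015 "
    "754879428 11/10/2011 10/20/2015\n"
    "MARGARITA\n"
    "\n"
    "100040037878787027330688 1 CAMU, DARAMIIS 4/30/2019 11/6/2015 "
    "754879428 09/20/2015 02/09/2020\n"
    "\n"
    "NMRCO-LetterRef# 555666-1\n"
)

rows = extract_rows(sample)
for row in rows:
    print(row["icn"], "|", row["patient_name"])
```

The footer line is not picked up as a name because it contains lowercase letters and digits, so group 9 stays empty for the second row.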
