I am using regex grouping to extract data from text that I got from the UiPath PDF activity. My regex skills are still under development, and I am having issues extracting some of the data.
I am using groups because sometimes there are 10 patient records instead of 1.
Below is the string (dummy data):
FCO-LetterRef# 489900-22PLEASE FAX TO 725-248-3602
CONFIDENTIAL
REVIEW DETAIL
2/25/2020 Provider Number/Name: 139635109/DALAS CHLLLLENS HOSPITAL Review ID: 66626
Review Type : LRG
Patient 625878043 TOPEZ, BIANNEY DOB:6/1/2009 SEX: Male Patient Account #:
ID/Name:
Service From Date: Service Thru Date: Est Overpayment
08/06/2016 08/11/2016 Claim Number: 200040040501923891740652 (Underpayment):$20,555.12
Original: Revised: Original: Revised: Original: Revised:
Data needed:
Review ID = 66626
Review Type = LRG
Patient ID = 625878043
Patient Name = TOPEZ, BIANNEY
DOB = 6/1/2009
Service From = 08/06/2016
Service Thru = 08/11/2016
Claim Number = 200040040501923891740652
Est Overpayment = $20,555.12
So far I have only been able to extract the fields up to DOB successfully, if I have it right.
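For illustration, here is a minimal sketch of how all nine fields could be captured with named groups in a single pattern. It is written in Python only so it can be run standalone and is not the regex used in the workflow. The names `SAMPLE`, `RECORD` and `extract_record` are made up for this example. In .NET (and so in UiPath) named groups are written `(?<name>...)` instead of `(?P<name>...)`, and the equivalent of `re.DOTALL` is `RegexOptions.Singleline`. The pattern assumes the line layout of the dummy data above and may need adjusting for real PDFs.

```python
import re

# Dummy data copied from the post above.
SAMPLE = """FCO-LetterRef# 489900-22PLEASE FAX TO 725-248-3602
CONFIDENTIAL
REVIEW DETAIL
2/25/2020 Provider Number/Name: 139635109/DALAS CHLLLLENS HOSPITAL Review ID: 66626
Review Type : LRG
Patient 625878043 TOPEZ, BIANNEY DOB:6/1/2009 SEX: Male Patient Account #:
ID/Name:
Service From Date: Service Thru Date: Est Overpayment
08/06/2016 08/11/2016 Claim Number: 200040040501923891740652 (Underpayment):$20,555.12
Original: Revised: Original: Revised: Original: Revised:
"""

# One named group per wanted field. The lazy ".*?" skips the labels
# between fields; re.DOTALL lets "." cross line breaks.
RECORD = re.compile(
    r"Review ID:\s*(?P<review_id>\d+)"
    r".*?Review Type\s*:\s*(?P<review_type>\w+)"
    r".*?Patient\s+(?P<patient_id>\d+)\s+(?P<patient_name>.+?)"
    r"\s+DOB:\s*(?P<dob>\d{1,2}/\d{1,2}/\d{4})"
    r".*?(?P<service_from>\d{2}/\d{2}/\d{4})\s+(?P<service_thru>\d{2}/\d{2}/\d{4})"
    r"\s+Claim Number:\s*(?P<claim_number>\d+)"
    r".*?\(Underpayment\):\s*(?P<est_overpayment>\$[\d,]+\.\d{2})",
    re.DOTALL,
)


def extract_record(text):
    """Return a dict of the named groups, or None if the text does not match."""
    match = RECORD.search(text)
    return match.groupdict() if match else None


record = extract_record(SAMPLE)
for key, value in record.items():
    print(f"{key} = {value}")
```

The two service dates are matched by position (two dates in a row just before `Claim Number:`) because the PDF extraction puts their labels on the previous line.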
Let's try to stay in sync in this discussion. Preprocessing with a block extraction is a good option to reduce such issues: isolate each record block first, then process the details within each block.
So would this approach be an option for you or not?
Will that work if I get multiple sets of the data? The PDF can contain 1 set of data or 10 or more, and it is hard to know where each one ends, but each set will follow the same text pattern as the dummy data in my first post, starting with the FCO-LetterRef# line and ending with the Original:/Revised: line.
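A rough sketch of the block-extraction idea for a PDF with several records, again in Python purely for illustration. It assumes every record begins with an `FCO-LetterRef#` line, as described in the post; if that header can be missing, another anchor such as `REVIEW DETAIL` would be needed. The names `BLOCK`, `split_blocks` and `summarize` are made up for this example, the sample is abbreviated, and the second record in it is invented dummy data.

```python
import re

# Abbreviated two-record sample; the second record is invented for the demo.
TEXT = """FCO-LetterRef# 489900-22PLEASE FAX TO 725-248-3602
REVIEW DETAIL
2/25/2020 Provider Number/Name: 139635109/DALAS CHLLLLENS HOSPITAL Review ID: 66626
08/06/2016 08/11/2016 Claim Number: 200040040501923891740652 (Underpayment):$20,555.12
FCO-LetterRef# 489900-22PLEASE FAX TO 725-248-3602
REVIEW DETAIL
2/25/2020 Provider Number/Name: 139635109/DALAS CHLLLLENS HOSPITAL Review ID: 66627
09/01/2016 09/03/2016 Claim Number: 200040040501923891740653 (Underpayment):$1,200.00
"""

# A block runs from one header up to (not including) the next header,
# or to the end of the text for the last record.
BLOCK = re.compile(r"FCO-LetterRef#.*?(?=FCO-LetterRef#|\Z)", re.DOTALL)
REVIEW_ID = re.compile(r"Review ID:\s*(\d+)")
CLAIM = re.compile(r"Claim Number:\s*(\d+)")


def split_blocks(text):
    """Return one string per record block."""
    return [m.group(0).strip() for m in BLOCK.finditer(text)]


def summarize(text):
    """Extract a couple of fields from each block independently."""
    rows = []
    for block in split_blocks(text):
        rid = REVIEW_ID.search(block)
        claim = CLAIM.search(block)
        rows.append({
            "review_id": rid.group(1) if rid else None,
            "claim_number": claim.group(1) if claim else None,
        })
    return rows


rows = summarize(TEXT)
for row in rows:
    print(row)
```

Because each block is processed on its own, a field missing from one record yields `None` for that record only and cannot shift values between patients.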
It seems the VB regex flavor is slightly different from most other flavors. It is about 99% the same, but in some cases a pattern does fail. The working regex for the above is this: