Help with a regex so the target results are pulled

I need help identifying and fixing what I did wrong in my regex.

THE TASK I NEED TO COMPLETE.
Pull this table (pages 28-31) from the attached PDF and write it to an Excel sheet.

THE RESULT I’M GETTING

MY REGEX

NOTE:
I also want to extract these tables from the PDF and write each table as a separate tab in the Excel spreadsheet.

EXPECTED RESULT
The automation should pull the content of the table on pages 28-31 and write it to an Excel file.

Attached is my workflow. I need some help; I can't see what I did wrong.
extract_modified.zip (1.3 MB)

The approach I usually use when parsing data from a PDF that requires regex:
1/ Read text from PDF (like you do now)
2/ SAVE retrieved text to a text file
3/ Analyse the retrieved text
4/ Prepare and test regex in some online regex tool like https://regex101.com/
5/ Apply the regex in the UiPath workflow
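
Steps 2/ to 4/ can also be tried offline with a small script. The sketch below is in Python rather than UiPath, and the sample text, the pattern, and the helper names `save_text` and `find_rows` are all made up for illustration; in practice the text would come from the PDF read step and the pattern would be the one under test.

```python
import re
import tempfile
from pathlib import Path


def save_text(text: str, path: Path) -> None:
    """Step 2: dump the extracted text so it can be inspected later."""
    path.write_text(text, encoding="utf-8")


def find_rows(path: Path, pattern: str) -> list[dict[str, str]]:
    """Steps 3-4: read the dump back and test a pattern against it."""
    text = path.read_text(encoding="utf-8")
    return [m.groupdict() for m in re.finditer(pattern, text, flags=re.MULTILINE)]


# Hypothetical text standing in for what the PDF read step returns.
sample_text = "A0001  01-01-2020  Example item\nnot a table row\n"
# Hypothetical pattern, only here to show the test loop.
sample_pattern = r"^(?P<Code>\w+)\s+(?P<Date>\d+-\d+-\d+)\s+(?P<Text>.+)$"

with tempfile.TemporaryDirectory() as tmp:
    dump = Path(tmp) / "extracted.txt"
    save_text(sample_text, dump)
    rows = find_rows(dump, sample_pattern)

print(rows)
```

Keeping the dump file around makes it easy to see whether the wrong pages were read before spending time on the pattern itself.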

Cheers


Hi,

Which pages do you want to extract? The sample above reads pages 12-16, not 29-31.
If you want pages 12-16, the following will work.

regex pattern

"(?m)^(?<HCPCS>\w+)\s+(?<EffDate>\d+-\d+-\d+)\s+(?<Description>.+?)\s\s+(?<StatusIndicator>\w+)\s+(?<APC>\w+)( +(?<Edit>\w+))?"


dt_forExtractedMatches = matchesFromExtractedText.Cast(Of System.Text.RegularExpressions.Match).Select(Function(m) dt_forExtractedMatches.Clone.LoadDataRow(dt_forExtractedMatches.Columns.Cast(of DataColumn).Select(Function(c) m.Groups(c.ColumnName).Value).ToArray,False)).CopyToDataTable
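
To see what the pattern and the expression above do together, here is a rough Python sketch. It is not the UiPath code: the .NET named groups `(?<name>...)` are rewritten as Python's `(?P<name>...)`, the two sample lines are invented, and `COLUMNS` is assumed to match the column names of `dt_forExtractedMatches`. Each match becomes one row, with the values picked by column name.

```python
import re

# The pattern from above, with .NET (?<name>...) groups rewritten for Python.
PATTERN = re.compile(
    r"(?m)^(?P<HCPCS>\w+)\s+(?P<EffDate>\d+-\d+-\d+)\s+(?P<Description>.+?)\s\s+"
    r"(?P<StatusIndicator>\w+)\s+(?P<APC>\w+)( +(?P<Edit>\w+))?"
)

# Assumed column names; they must match the group names in the pattern.
COLUMNS = ["HCPCS", "EffDate", "Description", "StatusIndicator", "APC", "Edit"]


def matches_to_rows(text: str, columns: list[str]) -> list[list[str]]:
    """Build one row per match, reading each value by column name."""
    # An optional group that did not match becomes an empty string,
    # which mirrors what Group.Value returns in .NET.
    return [[m.group(c) or "" for c in columns] for m in PATTERN.finditer(text)]


# Invented lines in the shape the pattern expects; not taken from the PDF.
sample_text = (
    "A1234  01-01-2020  Example description text   N  0001  12\n"
    "B5678  02-15-2021  Another example   T  0002\n"
)

rows = matches_to_rows(sample_text, COLUMNS)
for row in rows:
    print(row)
```

Note that the description is only separated from the status indicator when at least two whitespace characters follow it, so lines where the PDF text has a single space there will not match.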

Sample
extract_modified.zip (1.3 MB)

Regards,

@Yoichi
Thank you. I realized I passed in the wrong page numbers.
