I need help identifying and fixing what I did wrong in my regular expression.
THE TASK I NEED TO COMPLETE.
Pull this table (pages 28-31) from the attached PDF and write it to an Excel sheet.
THE RESULT I’M GETTING
MY REGEX
NOTE:
I also want to extract these tables from the PDF and write each table to its own tab in the Excel spreadsheet.
EXPECTED RESULT
The automation should pull the content of the table on pages 28-31 and write it to an Excel file.
Attached is my workflow. I need some help; I can't see what I did wrong.
extract_modified.zip (1.3 MB)
J0ska
December 22, 2024, 5:52pm
2
The approach I usually use when parsing data from PDF that require RegEx:
1/ Read text from PDF (like you do now)
2/ SAVE retrieved text to a text file
3/ Analyse the retrieved text
4/ Prepare and test regex in some online regex tool like https://regex101.com/
5/ Apply the regex to UiPath workflow
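The steps above can also be tried offline before touching the workflow. Below is a minimal Python sketch, not the UiPath implementation: it saves some text to a file, reads it back, and tests a pattern against it. The helper names (`dump_text`, `try_pattern`), the sample text, and the sample pattern are all made up for illustration. Note that Python writes named groups as `(?P<name>...)`, while .NET (and so UiPath) uses `(?<name>...)`.

```python
import re
import tempfile
from pathlib import Path


def dump_text(text: str, path: Path) -> None:
    """Step 2: save the text read from the PDF so it can be inspected."""
    path.write_text(text, encoding="utf-8")


def try_pattern(path: Path, pattern: str) -> list:
    """Steps 3-4: run a candidate pattern over the saved text.

    Returns one dict of named groups per match.
    """
    text = path.read_text(encoding="utf-8")
    return [m.groupdict() for m in re.finditer(pattern, text, flags=re.MULTILINE)]


# Hypothetical stand-in for the text returned by the PDF read step.
SAMPLE_TEXT = "Code   Date\nA1234  01-01-2025\nB5678  04-01-2025\n"

# Hypothetical pattern: a code followed by a date, one record per line.
SAMPLE_PATTERN = r"^(?P<Code>\w+)\s+(?P<Date>\d+-\d+-\d+)$"

with tempfile.TemporaryDirectory() as tmp:
    dump = Path(tmp) / "extracted.txt"
    dump_text(SAMPLE_TEXT, dump)
    rows = try_pattern(dump, SAMPLE_PATTERN)

for row in rows:
    print(row)
```

Looking at the saved file first matters because the text read from a PDF often has different spacing and line breaks than the page shows, and the pattern has to match that raw text.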
Cheers
Yoichi
December 23, 2024, 12:42am
3
Hi,
Which pages do you want to extract? The sample above reads pages 12-16, not 29-31.
If you want pages 12-16, the following will work.
regex pattern
"(?m)^(?<HCPCS>\w+)\s+(?<EffDate>\d+-\d+-\d+)\s+(?<Description>.+?)\s\s+(?<StatusIndicator>\w+)\s+(?<APC>\w+)( +(?<Edit>\w+))?"
dt_forExtractedMatches = matchesFromExtractedText.Cast(Of System.Text.RegularExpressions.Match).Select(Function(m) dt_forExtractedMatches.Clone.LoadDataRow(dt_forExtractedMatches.Columns.Cast(of DataColumn).Select(Function(c) m.Groups(c.ColumnName).Value).ToArray,False)).CopyToDataTable
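To see what the pattern above captures without running the workflow, here is a rough Python port. It is only a sketch: the two sample lines are invented (the real PDF text may be laid out differently), and the named groups are rewritten from .NET's `(?<name>...)` to Python's `(?P<name>...)`. Each match becomes one row with the values in column order, which is roughly what the LoadDataRow expression does with the DataTable columns.

```python
import re

# Column names, assumed to match the DataTable columns used in the workflow.
COLUMNS = ["HCPCS", "EffDate", "Description", "StatusIndicator", "APC", "Edit"]

# Python port of the .NET pattern; re.MULTILINE replaces the inline (?m).
PATTERN = re.compile(
    r"^(?P<HCPCS>\w+)\s+(?P<EffDate>\d+-\d+-\d+)\s+(?P<Description>.+?)\s\s+"
    r"(?P<StatusIndicator>\w+)\s+(?P<APC>\w+)( +(?P<Edit>\w+))?",
    re.MULTILINE,
)

# Invented sample lines: the description is followed by at least two spaces,
# and the last column (Edit) is optional.
sample_text = (
    "A1234 01-01-2025 Sample description text  N 1234 Y\n"
    "B5678 04-01-2025 Another item  G 5678"
)

# One list per match, values in column order. An unmatched optional group is
# None in Python, so it is mapped to "" to mirror .NET's empty Group.Value.
rows = [[m.group(c) or "" for c in COLUMNS] for m in PATTERN.finditer(sample_text)]

for row in rows:
    print(row)
```

The `\s\s+` after the Description group is what ends the lazy `.+?`: the description is allowed to contain single spaces, and only a run of two or more whitespace characters closes it.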
Sample
extract_modified.zip (1.3 MB)
Regards,
@Yoichi
Thank you, I realized I passed in the wrong page numbers.