Regex issue: how to extract the same column from multiple tables in a PDF document

I have a PDF file from which I need to extract the same information, but from different tables.
Final_Summary_Of_Data_Changes_v24.2.pdf (715.1 KB)

In this PDF file, for each table that has an “HCPCS” column, I want to:
• Extract the content (they are codes) under that “HCPCS” column and write it to a csv file.
• Write each table to a different Excel spreadsheet or, if possible, to a different tab on the same Excel spreadsheet.
• Write the data to column J of the Excel spreadsheet (if possible).

Examples of what I intend to achieve:

  1. I want to extract all the content of “HCPCS” column from the “Added HCPCS Codes” table to a separate spreadsheet or a tab on the same spreadsheet
  2. I want to extract all the content of “HCPCS” column from the “Deleted HCPCS Codes” table to a separate spreadsheet or a tab on the same spreadsheet
  3. I want to extract all the content of “HCPCS” column from the “Modified HCPCS Codes” table to a separate spreadsheet or a tab on the same spreadsheet
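
The column J requirement can be sketched in Python as follows. This is only an illustration, not part of the workflow discussed in this thread: it assumes the codes for one table are already available as a list, writes CSV rather than a native Excel file, and uses hypothetical names (`codes_to_rows`, `write_codes_csv`) and made-up sample codes.

```python
import csv
import io

COLUMN_J_INDEX = 9  # zero-based index of spreadsheet column J


def codes_to_rows(codes, header="HCPCS"):
    """Return rows that place the header and each code in column J."""
    padding = [""] * COLUMN_J_INDEX  # empty cells for columns A to I
    return [padding + [value] for value in [header, *codes]]


def write_codes_csv(stream, codes):
    """Write one table's codes to an open text stream as CSV."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerows(codes_to_rows(codes))


# Hypothetical sample codes, not taken from the PDF.
buffer = io.StringIO()
write_codes_csv(buffer, ["A1234", "B5678"])
csv_text = buffer.getvalue()
```

Calling `write_codes_csv` once per table, each time with a different output file, gives one file per table. When such a file is opened in Excel, the codes appear in column J because nine empty cells precede each value.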

My current workflow pulls all the content under the HCPCS column from ALL tables that have an HCPCS column and writes it as a single column in the spreadsheet.

[Screenshot: column I want to extract from the different tables]

[Screenshot: table 3]

Hi @yomi.oluwadara ,

Could you check the workflow below:
Extract_MultipleTables_Regex_Modified.zip (673.1 KB)

The workflow that was shared privately has been modified so that it also captures the line preceding each table, which indicates the table heading (Modified, Added, etc.), and uses that heading as the sheet name in the Excel file.
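
The idea of using the heading line to decide which sheet a code belongs to can be sketched in Python. This is not the regex from the attached workflow, which is not shown in this thread. It assumes the PDF text has already been extracted as plain text, that headings look like “Added HCPCS Codes”, and that each table row starts with a five-character code. The function name `group_codes_by_heading` and the sample text are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a table heading is a line such as "Added HCPCS Codes".
HEADING_RE = re.compile(r"^(\w+) HCPCS Codes\s*$")
# Assumption: a row starts with a five-character code such as "A1234" or "0001T".
CODE_RE = re.compile(r"^([A-Z0-9]\d{3}[A-Z0-9])\b")


def group_codes_by_heading(text):
    """Map each heading keyword (Added, Deleted, ...) to the codes listed under it."""
    groups = defaultdict(list)
    current = None  # heading of the table currently being read
    for raw_line in text.splitlines():
        line = raw_line.strip()
        heading = HEADING_RE.match(line)
        if heading:
            current = heading.group(1)
            continue
        code = CODE_RE.match(line)
        if current and code:
            groups[current].append(code.group(1))
    return dict(groups)


# Hypothetical extracted text, not copied from the PDF.
sample_text = """Added HCPCS Codes
HCPCS Description
A1234 Example item one
B5678 Example item two
Deleted HCPCS Codes
HCPCS Description
C9012 Example item three
"""

grouped = group_codes_by_heading(sample_text)
```

Each key of `grouped` can then serve as a sheet name, with its list of codes as that sheet's content. The column header line “HCPCS Description” is skipped because it does not match the code pattern.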

For Regex Learnings :


Have you tried Table Extraction?

@supermanPunch Thank you, the modification looks great. I do have some more tables in that PDF document that have the “HCPCS” column and that I need to pull (see attached screenshot).

I have looked at the regex expressions to see how I could tweak them so that the extraction covers all the tables containing the HCPCS column, but I was unsuccessful because of my limited regex knowledge.


No, I have not tried that before.

You should, it’s designed to extract formatted data like what’s in your PDF. Specifically, you could use Computer Vision to recognize where each table is and extract it.


@yomi.oluwadara ,

Could you point out the tables that were not extracted? I believe the Added, Modified and Deleted tables are written to the Excel sheet. Could you re-check and let us know if there is any data that was not extracted?


@supermanPunch
Thank you for your reply.
I just re-checked and it seems the tables I was concerned with are nested under “reason”. I will validate the data and will very likely mark this as the solution. Thanks!


@supermanPunch It does work!

