Brainstorming Solutions for Editing Data in PDFs

Hi @ashwin.ashok,

Assuming the tables in your PDF follow a consistent pattern once the text is extracted, there are two possible approaches. Both rely on CSV as the intermediate format:

Approach 1: Using only PDF activities

Suggested workflow: Main.xaml (12.4 KB)
Results first saved to temp.csv
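
To illustrate the idea outside of the workflow, here is a minimal Python sketch of pattern-based extraction. It is not what Main.xaml does internally. It assumes the extracted text puts each table row on its own line with columns separated by two or more spaces; the column names (item, qty, price), the pattern, and the sample text are all hypothetical.

```python
import csv
import io
import re

# Hypothetical row pattern: an item name, a quantity and a price,
# separated by runs of two or more spaces (an assumption about the PDF text).
ROW_PATTERN = re.compile(
    r"^(?P<item>.+?)\s{2,}(?P<qty>\d+)\s{2,}(?P<price>\d+\.\d{2})$"
)

def text_to_csv(extracted_text: str) -> str:
    """Keep only the lines that look like table rows and return CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["item", "qty", "price"])
    for line in extracted_text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:
            writer.writerow([match["item"], match["qty"], match["price"]])
    return buffer.getvalue()

# Made-up sample standing in for the text read from the PDF.
sample = """Invoice 2024
Widget A    3    19.99
Some footer text
Gadget, large    12    5.50
"""
csv_text = text_to_csv(sample)
print(csv_text)
```

The resulting string can then be written to temp.csv and read back as a table. Using a CSV writer rather than joining with commas keeps values that contain commas intact.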

Approach 2: Open the PDF in Word and extract the specific tables from Word
Yes, you can open PDF files in Word. Some PDFs won't convert well and will lose formatting in Word, but most structured ones will work.

  1. Open the PDF in Word (word.exe)
  2. Manipulate / convert the read text to CSV format (Hurdle! Multi-level headers and multiple values in a single row will lose formatting)
  3. Handle the formatting
  4. Write the resulting CSV text string to temp.csv
  5. Read the CSV and
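
For the formatting hurdle in steps 2 to 4, one possible way to handle multi-level headers is to flatten them into a single header row before writing the CSV. The Python sketch below is only an illustration, not the logic of the attached workflow. It assumes a two-level header where a blank cell in the top row continues the previous group; the function names and sample data are hypothetical.

```python
import csv
import io

def flatten_headers(top, sub):
    """Merge a two-level header into one row, e.g. 'Sales' + 'Q1' -> 'Sales_Q1'.

    Assumption: a blank cell in the top row belongs to the previous group.
    """
    flat = []
    current = ""
    for group, name in zip(top, sub):
        if group.strip():
            current = group.strip()
        name = name.strip()
        flat.append(f"{current}_{name}" if current and name else current or name)
    return flat

def rows_to_csv(header, rows):
    """Build the CSV text string from a header row and data rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

# Made-up table standing in for what was read from Word.
top = ["Region", "Sales", "", "Costs", ""]
sub = ["", "Q1", "Q2", "Q1", "Q2"]
rows = [["North", "10", "12", "4", "5"]]

header = flatten_headers(top, sub)
csv_text = rows_to_csv(header, rows)

# Reading the CSV text back gives one record per row, keyed by the flat header.
records = list(csv.DictReader(io.StringIO(csv_text)))
print(records[0]["Sales_Q2"])
```

With a single flat header, every column has a unique name, so the CSV can be read back without losing track of which sub-column belongs to which group.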

A major part of this solution comes from: How to read table in a Word document - #16 by Puransse, thanks to @vvaidya

Workflow from the above link (with slight modifications): wordTables.xaml (9.6 KB)
Results first saved to wordCsv.csv

Hope this helps!
