I have been building a series of data scraping processes that take data from a Word document and output it to Excel. The files I have now been given are PDFs, and for consistency I am trying to keep all the inputs as Word docs.
So far, I have tried reading the PDF and then writing the text into a Word document using a Word Application Scope with Append Text, but this removed the tables and formatting that were in the PDF. The tables are necessary for scraping the data. I have also tried this:
Replace .pdf with .docx in the file name and open the files
This appears to convert the PDF to a Word doc, but when the file opens, an error message comes up saying there is an issue with the data.
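A possible explanation for that error, sketched in Python purely for illustration: changing the extension only changes the name, not the bytes inside the file. A PDF starts with the signature `%PDF-`, while a .docx is a ZIP container whose first bytes are normally `PK\x03\x04`, so Word would be handed PDF bytes under a .docx name. The helper names `sniff_format` and `sniff_file` below are hypothetical and are not part of any automation tool; this only checks what a file really is, it does not convert anything.

```python
import io
import zipfile

PDF_MAGIC = b"%PDF-"        # signature at the start of a PDF file
ZIP_MAGIC = b"PK\x03\x04"   # local file header of a ZIP container (.docx, .xlsx, ...)


def sniff_format(data: bytes) -> str:
    """Guess the real format from the leading bytes, ignoring the file name."""
    if data.startswith(PDF_MAGIC):
        return "pdf"
    if data.startswith(ZIP_MAGIC):
        return "zip"  # a .docx is a ZIP archive, so it lands here
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first few bytes of a file on disk and classify them."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(8))


# Bytes that begin as a PDF are still a PDF, whatever the file is called.
renamed_pdf = b"%PDF-1.7\n% rest of the file"
print(sniff_format(renamed_pdf))  # prints "pdf"
```

If a renamed file still reports `pdf`, it has not been converted, and a real conversion step that preserves tables would be needed before the Word-based scraping can run.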
Does anybody have any suggestions on how to overcome this issue? Any help is appreciated.