I have a number of PDFs containing parts lists laid out as tables (see the attached image).
I need to extract each table to Excel exactly as it appears in the PDF.
The text in the PDFs is selectable, but there are several problems:
- Messy selection order: the last line may be selected right after the second line, and so on. I tried changing the reading order in the Adobe Reader settings without success.
- The data format of each column is unknown, except for the first two, which are always numeric. Using the “environment.newline” method, I managed to extract the values of the first two columns, and of the third on the assumption that it starts with a number.
- Any cell may contain a multi-line entry or be empty, especially in the last column of the table.
- OCR fails because most cells are expected to contain special characters or symbols.
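One possible approach, sketched here under stated assumptions, is to ignore the selection order entirely and rebuild the table from word coordinates. Libraries such as pdfplumber can report each word's position (its `Page.extract_words()` returns dicts with `text`, `x0`, `top` and other keys), so the sketch below takes such words as input. The column left edges (`col_edges`), the line tolerance, and the rule that a new row starts whenever the first (always numeric) column has a value are assumptions to adapt to the actual layout; all function names are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Word:
    """One word and its position, e.g. as reported by a PDF text extractor."""
    text: str
    x0: float   # left edge, in PDF points
    top: float  # distance from the top of the page


def column_of(x0: float, col_edges: list[float]) -> int:
    """Index of the last column whose (assumed) left edge is at or left of x0."""
    idx = 0
    for i, edge in enumerate(col_edges):
        if x0 >= edge:
            idx = i
    return idx


def group_lines(words: list[Word], line_tol: float = 2.0) -> list[list[Word]]:
    """Group words into visual lines by vertical position, ignoring selection order."""
    lines: list[list[Word]] = []
    for w in sorted(words, key=lambda w: (w.top, w.x0)):
        if lines and abs(w.top - lines[-1][0].top) <= line_tol:
            lines[-1].append(w)
        else:
            lines.append([w])
    # Re-sort each line left to right, since tops within a line can differ slightly.
    return [sorted(line, key=lambda w: w.x0) for line in lines]


def build_rows(words: list[Word], col_edges: list[float],
               line_tol: float = 2.0) -> list[list[str]]:
    """Rebuild table rows. Assumption: a line with text in column 0 starts a new
    row; any other line continues the previous row (multi-line cells)."""
    rows: list[list[str]] = []
    for line in group_lines(words, line_tol):
        cells = [""] * len(col_edges)  # empty cells stay ""
        for w in line:
            c = column_of(w.x0, col_edges)
            cells[c] = f"{cells[c]} {w.text}".strip()
        if cells[0] or not rows:
            rows.append(cells)
        else:
            prev = rows[-1]
            for c, text in enumerate(cells):
                if text:
                    prev[c] = f"{prev[c]}\n{text}" if prev[c] else text
    return rows


def to_csv(rows: list[list[str]]) -> str:
    """Serialize rows as CSV; multi-line cells are quoted automatically."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def save_csv(rows: list[list[str]], path: str) -> None:
    """Write rows to a CSV file; the utf-8-sig BOM helps Excel detect UTF-8 symbols."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerows(rows)


if __name__ == "__main__":
    # Hypothetical words, deliberately out of order; real ones would come from the PDF.
    sample = [
        Word("Ø12", 200, 100.4), Word("1", 20, 100), Word("10", 80, 100),
        Word("Bolt", 300, 100), Word("M6", 330, 100.2),
        Word("2", 20, 130), Word("5", 80, 130), Word("Nut", 300, 130),
        Word("(zinc", 300, 112), Word("plated)", 340, 112),
    ]
    print(to_csv(build_rows(sample, col_edges=[0, 60, 180, 280])))
```

With pdfplumber, each extracted word dict could be converted with something like `Word(w["text"], w["x0"], w["top"])`. Because this works on the PDF's text layer rather than OCR, special characters should come through as they are encoded in the PDF. If a `.xlsx` file is needed instead of CSV, a library such as openpyxl could write the same rows directly.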
To select the values of a single column in order, I tried a rectangular selection (Alt + mouse button down, drag across the table image, mouse button up), but without any luck.
Excel in Office 365 has an Image to table feature that works quite well, but I found out it is only available on Android, not on the desktop.
Can anyone guide me on how to solve this problem? Thanks very much.