Extract structured data from a PDF


#1

I have a PDF file and I want to extract the data from it. I can use the Read PDF activity.
But I need some technical help from you guys.
How can I extract the data and keep it structured?

I hope you can help me.
Maersk Release.pdf (40.0 KB)


#2

Hi,

You can use ABBYY FlexiCapture to extract the structured data; it has to be integrated with UiPath.
Ref: https://www.uipath.com/technology-partners
https://www.abbyy.com/en-eu/flexicapture/

Or use string manipulation methods to extract the data.
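
As a rough illustration of the string manipulation route, here is a minimal Python sketch (not UiPath code). It assumes the extracted text has one table row per line; the row layout, the regular expression, and the names `ROW_PATTERN`, `parse_row` and `parse_text` are made up for this example and would have to be adapted to the real document.

```python
import re

# Hypothetical row layout: container id, size, type, weight, unit.
# This pattern is an assumption and must be adapted to the real PDF text.
ROW_PATTERN = re.compile(
    r"^(?P<container>[A-Z]{4}\d{7})\s+"
    r"(?P<size>\d{2})\s+"
    r"(?P<type>[A-Z]+)\s+"
    r"(?P<weight>\d+\.\d+)\s+"
    r"(?P<unit>[A-Z]+)$"
)


def parse_row(line):
    """Return the row's fields as a dict, or None if the line does not match."""
    match = ROW_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def parse_text(text):
    """Parse every line of the extracted text and keep only the matching rows."""
    rows = (parse_row(line) for line in text.splitlines())
    return [row for row in rows if row is not None]
```

Lines that do not look like a table row (headers, footers, free text) are simply skipped, so the same function can be run over the whole page text.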

Thanks,
Vikas Reddy


#3

Ok, I don’t have FlexiCapture in my toolkit, so I have to do it differently.
The problem I’m facing now is the following.
If I read the data in the equipment table, the data is not complete and strange newlines are inserted.
For example:
the line
CLHU3748106 20 DRY 8’6 2200.000 KGS 1949.100 KGS 439121
becomes
CLHU374810
6
20 DRY 8’
6
2200.000 KG
S
1949.100 KG
S
43912
1

Who can help me?
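
One possible workaround for the broken lines above, as a minimal Python sketch (not UiPath code): treat every very short line as the wrapped tail of the cell before it and glue it back on. The one-character threshold and the function name `rejoin_fragments` are assumptions based only on this single example; a table that really contains one-character cells would be merged incorrectly.

```python
def rejoin_fragments(lines, max_tail=1):
    """Glue short continuation lines back onto the preceding line.

    Assumption: a line of at most `max_tail` characters is the wrapped
    tail of the cell that came before it.
    """
    cells = []
    for raw in lines:
        piece = raw.strip()
        if not piece:
            continue  # ignore blank lines
        if cells and len(piece) <= max_tail:
            cells[-1] += piece  # continuation of the previous cell
        else:
            cells.append(piece)  # start of a new cell
    return cells


# The fragments exactly as they come out of the PDF in the example above.
fragments = [
    "CLHU374810", "6", "20 DRY 8’", "6", "2200.000 KG", "S",
    "1949.100 KG", "S", "43912", "1",
]
row = " ".join(rejoin_fragments(fragments))
print(row)  # CLHU3748106 20 DRY 8’6 2200.000 KGS 1949.100 KGS 439121
```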


#4

Here’s a trick that I’ve used to get structured PDF tables (text, not image):

  1. Open PDF in Microsoft Word (wait for it to convert)
  2. Press F5 (Go To dialog) to jump to the first/next table
  3. Select and copy the table
  4. Open a blank Excel document
  5. Paste the table into Excel
  6. Use Read Range to get the sheet as a DataTable
  7. Repeat and merge tables as necessary
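
The last step (merging the tables) can be sketched in Python as below. This is only an illustration of the idea, not UiPath code: it assumes each table has already been read into a list of rows with the header as the first row, and the name `merge_tables` is made up for this example.

```python
def merge_tables(tables):
    """Concatenate tables that share the same header row.

    Each table is a list of rows and its first row is the header.
    Empty tables are skipped; a different header raises ValueError.
    """
    tables = [table for table in tables if table]
    if not tables:
        return []
    header = tables[0][0]
    merged = [header]
    for table in tables:
        if table[0] != header:
            raise ValueError(f"header mismatch: {table[0]!r} != {header!r}")
        merged.extend(table[1:])  # keep data rows, drop the repeated header
    return merged
```

Checking the header before merging catches the case where Word splits one table differently on different pages, so the columns no longer line up.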