How to extract a table from a PDF to Excel

I have tried the data scraping and screen scraping methods, but neither works. Is there any way to copy a table from a PDF into an Excel file?

Hi @christine.tzenghy,

Use the Read PDF Text or Read PDF With OCR activity to read the PDF.
Apply string manipulation to keep only the required data.
Use the Generate Data Table activity to convert the string to a data table, specifying valid column and newline separators.
Use the Write Range activity to write the data table to an Excel file.
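
For readers who want to see the idea outside of Studio, here is a minimal Python sketch of the same flow: split the text into rows and cells, then write it out in a form Excel can open. It is only an illustration, not the UiPath activities themselves. The sample text, the `|` column separator, and the function names `text_to_rows` and `rows_to_csv` are assumptions made up for this example.

```python
import csv
import io

def text_to_rows(text, column_sep="|", row_sep="\n"):
    """Split raw text into rows and cells, dropping blank lines."""
    rows = []
    for line in text.split(row_sep):
        if line.strip():
            rows.append([cell.strip() for cell in line.split(column_sep)])
    return rows

def rows_to_csv(rows):
    """Serialize rows as CSV text, a format Excel can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

# Hypothetical text standing in for the output of the PDF-reading step.
sample = "Item|Qty|Price\nWidget|2|9.50\n\nGadget|1|20.00\n"
table = text_to_rows(sample)
print(rows_to_csv(table))
```

The hard part in practice is the string manipulation step, that is, getting the raw PDF text into a shape where one separator reliably marks each column boundary.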

Hi @shivagowdavarad

Thanks for the response. Do you have an example? For instance, how do I perform the string manipulation to retain only the required data?

The manipulation depends on the input data. Could you share a sample?

@shivagowdavarad
Unfortunately I can't, for confidentiality reasons.
However, I found this PDF with a table online. Could you use it as an example, please? Thanks a lot.
Auftragsbestaetigung-Kramer-3pos.pdf (195.2 KB)

Hello @christine.tzenghy,

Please check this activity:

Hi @raj.parsana

Thanks for the suggestion. May I ask how to use this activity? What should I enter for the input PDF and Tabular properties?

Hello @christine.tzenghy,

Check the description of the custom activity.
The Tabular property takes a True or False value.

Thank you for the response @raj.parsana, but I would prefer a method that does not need extension packages. 🙂

Hello @christine.tzenghy,

Try Document Understanding:

@shivagowdavarad
Hi, could you kindly help me with this, please?
Thank you

Hi @christine.tzenghy,

No luck so far.
I tried converting it, but because the columns are not separated in a structured way, some columns get merged.
I am trying to add multiple separators and will let you know if that works.
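
As a rough illustration of the multiple-separators idea, here is a Python sketch that splits a line on any one of several separators. The separator set (a tab, a semicolon, or a run of two or more spaces) and the sample line are assumptions for this example, not taken from the attached PDF.

```python
import re

# Assumed separators: a tab, a semicolon, or a run of two or more spaces.
SEPARATORS = re.compile(r"\t|;| {2,}")

def split_cells(line):
    """Split one line of text on any of the assumed separators."""
    return [cell.strip() for cell in SEPARATORS.split(line.strip())]

# Hypothetical line; the single space inside "Steel bracket" is not a separator.
cells = split_cells("Steel bracket   12\t4.50;EUR")
print(cells)
```

Because a single space is not treated as a separator, multi-word cells stay in one piece. This only helps when the column gaps in the extracted text really are wider than the gaps between words.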

Hi @shivagowdavarad,

I am facing the same issue. Did you find a solution?
I tried Document Understanding, but some columns are merged and I cannot get the correct result.

Hi @snehal23,

It is possible with regex, but only if the data follows some pattern from which extraction rules can be devised.
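
To make the regex point concrete, here is a small Python sketch. It assumes, purely for illustration, that every table row has the shape "position number, free-text description, integer quantity, decimal price"; the pattern, the sample text, and the name `extract_rows` are hypothetical and would have to be adapted to the real document.

```python
import re

# Assumed row shape: position, description, integer quantity, price with two decimals.
ROW_PATTERN = re.compile(r"^(\d+)\s+(.+?)\s+(\d+)\s+(\d+[.,]\d{2})$")

def extract_rows(text):
    """Return the captured columns for every line that matches the assumed pattern."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:
            rows.append(list(match.groups()))
    return rows

# Hypothetical extracted text; header and total lines do not match and are skipped.
sample = """Order confirmation
1 Steel bracket 12 4,50
2 Hex bolt M8 100 0,12
Total 16,50"""
rows = extract_rows(sample)
print(rows)
```

The numeric columns at the end of the line anchor the match, which is what lets the description contain spaces. Without some anchor like that, no rule can tell where one column ends and the next begins.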

Hello Christine,
In this video, I extract tables from PDFs and write the data to Excel:

0:25 Install PDF Activities
1:10 READ PDF text, Get PDF page count, Extract PDF
5:40 Read PDF with OCR
6:55 Join PDF and Manage PDF passwords
9:30 Extract Images From PDF and Export PDF as Image
12:00 Extract table from PDF, use case 1: replace some spaces with | (one column has multiple words)
24:00 Run the robot to see the result
25:40 Extract table from another PDF, use case 2: the delimiter is two spaces, so a simple split works
31:50 Extract table from a complex PDF, use case 3: unstructured data, with logic based on IsUpper and IsLower
40:25 Extract the price value from PDF
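
One possible reading of use case 1, sketched in Python rather than in Studio: when only the first column can contain several words, the whitespace before each of the trailing single-word columns can be turned into `|`. This is an assumption about the approach, not a transcript of the video; the function name `mark_columns` and the sample line are made up for the example.

```python
def mark_columns(line, trailing_columns=2, sep="|"):
    """Put `sep` before each of the last `trailing_columns` single-word columns.

    Spaces earlier in the line are left alone, so they stay inside the first column.
    """
    return sep.join(line.strip().rsplit(maxsplit=trailing_columns))

# Hypothetical row: a multi-word description followed by quantity and price.
marked = mark_columns("Blue cotton shirt 3 19.99")
print(marked)
```

Splitting from the right works only when the number of trailing columns is known and each of them is a single word.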

Thanks,
Cristian Negulescu

Hi, I watched the video and followed similar steps, but the strOutput variable is reinitialized every time inside the For Each loop. Can you help me with this?

Make the variable global: set its scope to the main Sequence, and that avoids the issue.
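
The same pitfall can be shown in plain Python, as an analogy only, since variable scopes in Studio are set in the Variables panel rather than in code. The list of lines and the names `wrong` and `right` are hypothetical.

```python
lines = ["a|1", "b|2", "c|3"]  # hypothetical extracted lines

# Pitfall: the accumulator is reset on every iteration, so only the last line survives.
for line in lines:
    str_output = ""
    str_output += line + "\n"
wrong = str_output

# Fix: initialise the accumulator once, outside the loop, then append inside it.
str_output = ""
for line in lines:
    str_output += line + "\n"
right = str_output

print(repr(wrong))
print(repr(right))
```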

Hi,
A very useful video, but unfortunately all the scenarios cover cases where the data is predictable. My problem is that I have duplicate data across columns, with single spaces everywhere. What is the best approach to handle this?

The data table looks something like this:
Name Description Size
Zahid Rahim Zahid Rahim 19
First Last First Last 20

How can I distinguish the columns? The Name is Zahid Rahim, the Description is also Zahid Rahim, and only a single space separates the two columns.

Regards,
Zahid Rahim

Hey @christine.tzenghy ,
Have you tried using CV Extract Table?
Below is a snapshot of it.

Hope it helps.