I want to extract a table from a PDF to Excel. Does anyone have an idea how to do this?

How to get a datatable from a PDF with multiple pages

I would try reading in the PDF, organizing the data into a datatable for each page, putting those datatables into a dataset, and then writing each datatable in the dataset to its own sheet.

Dataset.Tables(0): sheet one
Dataset.Tables(1): sheet two
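
To make the table-to-sheet mapping concrete, here is a minimal sketch in Python rather than UiPath, only to show the logic. The sample tables, the `assign_sheets` helper, and the "Sheet1", "Sheet2" names are all made up for illustration; in a workflow you would do the same thing with a loop over the dataset and a Write Range per table.

```python
# Hypothetical per-page tables; each table is a list of rows.
page_tables = [
    [["Item", "Qty"], ["Bolt", "4"]],  # table read from page 1
    [["Item", "Qty"], ["Nut", "7"]],   # table read from page 2
]


def assign_sheets(tables):
    """Pair each table with a sheet name: index 0 -> 'Sheet1', index 1 -> 'Sheet2', ..."""
    return {f"Sheet{i + 1}": table for i, table in enumerate(tables)}


sheets = assign_sheets(page_tables)
for name, rows in sheets.items():
    print(name, rows)
```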

I didn’t get you, @Zach_Dobson. The PDF contains 43 pages, but I want to extract the table from pages 6 to 9 and write it to Excel. How is that possible?

Hi @Aruna1,
First, use the range in Read PDF to read only the pages that contain your table data.

After that, try printing your tables to a text file to determine the format of the table, i.e. how each element is separated. Then use string manipulation to get each element and store it in a datatable. Later you can use Write Range to write them to the Excel file.

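Some useful formulas, based on your screenshot. To get all the lines of the text:
YourVariable.Split(Environment.NewLine.ToCharArray, StringSplitOptions.RemoveEmptyEntries)
To get all the elements in a line (Split returns an array, so trimming each element afterwards is not needed if empty entries are removed):
YourLine.Split({" "c}, StringSplitOptions.RemoveEmptyEntries)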
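
The same two-step split can be sketched outside UiPath. This is a rough Python illustration, not the workflow itself: the `text_to_rows` name and the sample text are invented, and it assumes the cells in a line are separated by spaces and contain no spaces themselves.

```python
def text_to_rows(page_text):
    """Split extracted page text into rows, and each row into cells."""
    rows = []
    for line in page_text.splitlines():
        # split() with no argument splits on runs of whitespace
        # and never returns empty strings.
        cells = line.split()
        if cells:  # skip blank lines
            rows.append(cells)
    return rows


# Made-up sample of what the extracted text might look like.
sample = "Item  Qty Price\n\nBolt 4 0.25\nNut   7 0.10\n"
rows = text_to_rows(sample)
for row in rows:
    print(row)
```

If a cell can contain spaces (for example a product description), splitting on single spaces will break it apart, so you would need a different separator or fixed column positions.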

Thanks

The Read PDF activity doesn’t recognise the values in the table, @kunalj.

Hey @Aruna1, can you attach your PDF file so we can take a closer look?

Right now I only have this page, but the PDF goes up to 43 pages. I want to extract only pages 6 to 9, @Edupraz.

“I didn’t get you, @Zach_Dobson. The PDF contains 43 pages, but I want to extract the table from pages 6 to 9 and write it to Excel. How is that possible?”

This could be a little unconventional, but a possible solution is to read in each page as a data table. You can put data tables into a dataset; it’s like how an array holds strings, except it holds data tables.

So you would have 43 data tables in your dataset.

In a loop, define that you only want to cycle through pages 6 to 9. The dataset index starts at 0, so that is indexes 5 to 8:

For (x = 5; x < 9; x++)
<“Whatever action you want to take place”>
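
As a rough sketch of that index arithmetic (in Python, only to show the logic; the `select_pages` helper and the placeholder tables are made up), picking the pages the question asks for, 6 to 9, out of 43 per-page tables could look like this:

```python
def select_pages(tables, first_page, last_page):
    """Return the tables for 1-based pages first_page..last_page, inclusive."""
    # Page 6 lives at index 5, so subtract 1 from the start;
    # the slice end is exclusive, so last_page needs no adjustment.
    return tables[first_page - 1:last_page]


# Placeholder strings standing in for the 43 per-page data tables.
all_tables = [f"table for page {n}" for n in range(1, 44)]

wanted = select_pages(all_tables, 6, 9)
for table in wanted:
    print(table)
```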

There’s possibly a better solution, but I would look into how you can use datasets in your situation.

@Aruna1 Try the Read PDF with OCR activity.

It doesn’t work, @indra.

Hi @Aruna1

Use the “Read PDF” activity if the file contains text, or use “Read PDF with OCR” with an OCR engine placed inside it if the file contains images. Then declare a variable and give the page range as the input range for the pages you want to read. Then use the Write Text activity to write your text to a text file, or use any other activity you need for saving the target data.

Hope this helps :slight_smile:

@Aruna1, is it possible for you to share that PDF?