Scrape Data from a table in the given PDF

SANJAI_M · November 15, 2019, 10:15am

Can anybody tell me how to extract a specific table from a given PDF or a scanned PDF without actually opening the document? With the Read PDF with OCR activity, I can scrape all the data from the PDF. But I need to scrape data from a dynamic table and some other data present in specific regions. I tried “Read text from specific region”, but I don’t know what values to give for Height, Width, X and Y in the input properties window. Since the document is dynamic and should not be opened in a window, the OCR and screen scraping activities cannot be used. I have been trying to complete this project for more than a week. I would be grateful if someone could help me through it.

joseph.yoon · November 15, 2019, 8:47pm

Hi @SANJAI_M,

Have you tried using the Read PDF Text activity? Sometimes the tables come out as a logically formatted String, and then you can use string manipulation to extract the necessary values.
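
To illustrate the string manipulation idea, here is a minimal sketch in Python rather than a UiPath workflow. It assumes the extracted text keeps each table row on one line with columns separated by two or more spaces. The sample text, the column names, and the `parse_table` helper are all made up for illustration, so real output may need a different splitting rule.

```python
import re

# Hypothetical text, standing in for what a PDF text-extraction step might return.
SAMPLE_TEXT = """Invoice No: 12345
Item          Qty    Unit Price
Blue Widget   2      10.50
Red Gadget    10     3.25
Total: 53.50"""


def parse_table(text, header_first_column):
    """Return table rows as dicts, splitting columns on runs of 2+ whitespace."""
    lines = text.splitlines()
    # Locate the header line by the name of its first column.
    # Raises StopIteration if no line starts with that name.
    start = next(i for i, line in enumerate(lines)
                 if line.strip().startswith(header_first_column))
    headers = re.split(r"\s{2,}", lines[start].strip())
    rows = []
    for line in lines[start + 1:]:
        cells = re.split(r"\s{2,}", line.strip())
        if len(cells) != len(headers):
            break  # the first line that does not fit the table ends it
        rows.append(dict(zip(headers, cells)))
    return rows


rows = parse_table(SAMPLE_TEXT, "Item")
for row in rows:
    print(row)
```

Splitting on two or more spaces, instead of a single space, keeps values such as "Blue Widget" or "Unit Price" together. It will break if a cell is empty or if columns are separated by only one space, so it is worth checking against a few real files first.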

-Joseph

SANJAI_M · November 18, 2019, 7:02am

Yes @joseph.yoon, I tried both the Read PDF Text and the Read PDF with OCR activities. The Read PDF Text activity returns the whole text present in the PDF as its output. The Read PDF with OCR activity works the same way; only page numbers can be given as input in the Range property. But I want to get text from a specified region. Also, if the table had lines separating the values into rows and columns, I would have used Data Scraping. But the table doesn’t have any lines; there are simply some spaces between the values, which makes it difficult for me to complete the project.

joseph.yoon · November 20, 2019, 9:12pm

Would you have a sample pdf file?

SANJAI_M · November 21, 2019, 10:45am

Sorry @joseph.yoon. It’s an official file that I can’t share.
