Extract DataTable with OCR

ubermench · March 12, 2018, 9:49pm

Hello, I am trying to extract a DataTable from a PDF using Citrix Scrape Relative with Google OCR, but so far I have not been able to extract the data from the table in the PDF. The data comes out unorganized when I try to get it with Data Scraping. If somebody knows another way to do it, it would be very helpful. Thank you so much.

ClaytonM · March 12, 2018, 10:09pm

Here is my suggestion. Only use OCR if the PDF is an image. If the PDF is text (where you can highlight each word), then use Read text from PDF or a similar method.

After you read the text, you will need to use string manipulation to get it into the correct format, with comma delimiters and such. How to do that depends on the text, so there’s no good general answer I can give, other than to learn lambda or LINQ expressions in vb.net so you can go through each line of the text and make edits.
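As a rough illustration of that line-by-line clean-up, here is a minimal sketch in Python rather than vb.net. It assumes the extracted text keeps table columns apart with tabs or runs of two or more spaces, which will not hold for every PDF. The function name `text_to_csv` and the `min_gap` parameter are made up for this example and are not UiPath activities.

```python
import csv
import io
import re

def text_to_csv(text, min_gap=2):
    """Convert whitespace-aligned table text to comma-delimited text.

    Assumption: columns are separated by tabs or by at least
    `min_gap` consecutive spaces; single spaces stay inside a cell.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between rows
        # Split on tabs or on wide gaps of spaces.
        cells = re.split(r"\t+| {%d,}" % min_gap, line)
        # csv.writer quotes any cell that itself contains a comma.
        writer.writerow(cell.strip() for cell in cells)
    return buf.getvalue()

sample = "Item Name   Qty   Price\nRed Widget    2    3.50\n"
print(text_to_csv(sample))
```

Real PDF text often needs extra rules on top of this (merged cells, wrapped lines, page headers), so treat it as a starting point only.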

If you have an example PDF, then someone might be able to assist.

Once you have the format right for a comma-delimited CSV file, just use Write Text File with a .csv extension, then Read CSV to load it back into a datatable.
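The same write-then-read round trip can be sketched outside UiPath. The snippet below is a Python stand-in for the Write Text File and Read CSV steps, not the activities themselves. The function name `write_and_read_csv` and the file name are hypothetical, and the first line of the text is assumed to be the header row.

```python
import csv

def write_and_read_csv(csv_text, path):
    """Write comma-delimited text to `path`, then read it back as rows.

    Assumption: the first line of `csv_text` holds the column names.
    Each returned row is a dict keyed by column name, roughly like a
    DataTable row; all values come back as strings.
    """
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(csv_text)
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

rows = write_and_read_csv("Item,Qty\nBolt,10\nNut,5\n", "table.csv")
for row in rows:
    print(row["Item"], row["Qty"])
```

Numeric columns would still need converting from strings afterwards.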

If the PDF is an image and you are required to use OCR, then it will take some playing around with the zoom and scale to get the most accurate info. I’m not entirely sure I can help much with that.

Hope this helps in some way…

Regards.

ClaytonM · March 12, 2018, 10:14pm

Additionally, you can try Extract Structured Data, but in my experience it’s more difficult to get working (depending on the PDF, though), since it uses elements and you might need to loop through each page. Also, this method sometimes misses some of the columns.

ubermench · March 13, 2018, 1:46pm

Hi, this is the PDF file from which I need to extract information into a DataTable.

Can you give me an email address to send the file to? I am a new user, so I am not allowed to upload files.
