Bad data scraping when there is multiline in table cell

AndRewDev · October 19, 2021, 1:16pm

Hello People,

I hope you can help me with this problem; I have tried everything I know and I'm out of ideas on how to solve it. I'm scraping a lot of PDF files that each contain one table, and that table is a nightmare. Generally I use data scraping for the table, and it works fine until I hit a table that has multiple lines in a single cell, like in the image below (the black strips are lines of text).

In that situation I could just use “Read PDF Text” and do string manipulation, but that doesn't work either: if any number in the table is 1000 or greater, the PDF shows it as 1 010,20, so splitting the string on spaces breaks it into 1 and 010,20. I don't know how else to split it and still tell which number is which.

I'm not really sure what I can do in that situation.

TL;DR: I don't know how to do string manipulation on the raw string from the PDF.
I'm using data scraping for the table I need, and it gets messed up when it hits a table with multiline cells.

Thomas_Mitchell · October 19, 2021, 1:25pm

Hey AndRewDev,

There is an option under the File menu in Adobe to save the PDF as text. String manipulation would still be required, but it could give you a different result than the Read PDF Text activity. Maybe try that option.

AndRewDev · October 19, 2021, 1:29pm

Thank you, however I still don't know how to extract four numbers from output like this: 1 496,38 23 344,17 1 840,55 into
var1: 1 496,38
var2: 23
var3: 344,17
var4: 1 840,55
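
One possible way to approach this split, sketched in Python rather than in the workflow tool used in this thread: try every grouping of the space-separated tokens into valid numbers, then keep only the groupings that fit a known column layout. The layout "didd" (decimal, integer, decimal, decimal) is an assumption read off the expected output above, and `segmentations`, `split_row`, and `to_float` are hypothetical helper names, not part of any library. The sketch also assumes every decimal value has exactly two digits after the comma and that thousands groups are separated by a single space.

```python
import re

# One field: 1-3 leading digits, optional space-separated groups of exactly
# three digits, and an optional two-digit decimal part after a comma.
NUMBER = re.compile(r"\d{1,3}(?: \d{3})*(?:,\d{2})?")


def segmentations(tokens):
    """Yield every way to group consecutive tokens into valid numbers."""
    if not tokens:
        yield []
        return
    for end in range(1, len(tokens) + 1):
        field = " ".join(tokens[:end])
        if NUMBER.fullmatch(field):
            for rest in segmentations(tokens[end:]):
                yield [field] + rest


def split_row(line, layout):
    """Return all groupings of `line` whose fields match `layout`.

    `layout` has one character per column: 'd' for a value with decimals,
    'i' for an integer. The layout itself is an assumption about the table.
    """
    matches = []
    for fields in segmentations(line.split()):
        shape = "".join("d" if "," in f else "i" for f in fields)
        if shape == layout:
            matches.append(fields)
    return matches


def to_float(field):
    """Convert '1 496,38' style text to a float (1496.38)."""
    return float(field.replace(" ", "").replace(",", "."))


row = "1 496,38 23 344,17 1 840,55"
candidates = split_row(row, "didd")
values = [to_float(f) for f in candidates[0]] if len(candidates) == 1 else None
```

`split_row` returns a list so the caller can tell apart the three cases: exactly one match (safe to use), no match (the layout assumption is wrong for that row), or several matches (the row is still ambiguous and needs another rule). In the sample above the first and third values happen to add up to the fourth, so a sum check might serve as an extra tie-breaker if that relationship holds for the whole table.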
