Hello People,
I hope you can help me with this problem, as I have tried everything I know and I'm out of ideas for how to solve it. I'm scraping a lot of PDF files that each contain one table, and that table is a nightmare. Generally I use data scraping for the table, and it works fine until I hit a table that has multiple lines in a single cell, like in the image below (the black strips are lines of text).
In that situation I could just use “read PDF text” and do string manipulation, but that doesn't work either: any number in the table of 1,000 or more is written in the PDF like 1 010,20, so splitting the string on spaces breaks it into 1 and 010,20. I don't know another way to split it and still tell which number is which.
I'm not really sure what I can do in that situation.
TL;DR: I do not know how to do string manipulation on the raw string from the PDF.
I'm using data scraping for the table I need, but it gets messed up when it hits a table with multiline cells.
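To make the splitting problem concrete, here is a rough Python sketch of the kind of split I am after. It is only an idea under assumptions I have not verified on the real files: that every amount ends with a comma and exactly two decimals, that thousands are grouped in threes with a single plain space, and that the names `AMOUNT` and `split_amounts` are just placeholders I made up.

```python
import re

# Assumption: amounts look like 15,00 or 1 010,20 or 2 345 678,90
# (space-grouped thousands, comma, exactly two decimals).
AMOUNT = re.compile(r"(?<![\d,])\d{1,3}(?: \d{3})*,\d{2}(?!\d)")

def split_amounts(line):
    """Return the amounts found in one raw text line, as floats."""
    amounts = []
    for match in AMOUNT.findall(line):
        # Drop the thousands spaces and switch to a decimal point.
        amounts.append(float(match.replace(" ", "").replace(",", ".")))
    return amounts

print(split_amounts("Item A 1 010,20 15,00 2 345 678,90"))
```

This would still be ambiguous when a small whole number sits right before a three-digit amount (for example a quantity 3 followed by 100,00 would be read as 3 100,00), so it does not fully solve telling which number is which.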