Data Scraping linebreak

datatable
pdf
word
datascraping

#1

Hi,

I ran into a problem when using data scraping to extract data from a table in a PDF file and move it to a Word file.
It works, but a line break shows up in the Word file.

For example,
in the PDF file: This is Monday.
I scraped “This is Monday” and placed it into the Word file, but it comes out as “This is
Monday”. How can I remove these line breaks?
Has anyone had the same issue before?

Thank you for your time.


#2

Hi,

I’m not sure why that particular file causes a line break to be detected, but have you tried other methods of scraping the data, like Read PDF Text or OCR? Do you get the same line break with those?


#3

Data extraction depends on the internal structure of the PDF document. I have seen PDF documents where multiple rows in a table were actually one internal row, even though they looked fine on screen.

You can use the UiExplorer tool to get an idea of the internals of your PDF document.
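
If the line break turns out to be part of the extracted text itself, one option is to strip it from the value before writing it to Word. Below is a minimal sketch of that idea in plain Python, not a UiPath activity; the function name remove_line_breaks and the sample string are made up for illustration. In a UiPath workflow the same thing would presumably be a string replace on the scraped value, but I have not tested that against this file.

```python
import re


def remove_line_breaks(text: str) -> str:
    """Replace each line break (plus any whitespace around it) with one space."""
    # [\r\n]+ covers Windows (\r\n), Unix (\n) and old Mac (\r) line endings.
    return re.sub(r"\s*[\r\n]+\s*", " ", text).strip()


# Hypothetical scraped value containing an unwanted Windows line break.
scraped = "This is\r\nMonday."
print(remove_line_breaks(scraped))
```

For a whole scraped table, the same function can be applied to every cell before the rows are written out.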