Data Scraping linebreak

veveve · March 5, 2018, 2:30am

Hi,

I have an issue when using data scraping to extract data from a datatable in a PDF file and move it into a Word file.
It works, but there is one problem: a line break appears in the Word file.

For example,
In the PDF file: This is Monday.
When I scrape “This is Monday” and place it into the Word file, it comes out as “This is
Monday”. I wonder how I can remove those line breaks.
Has anyone had the same issue before?

Thank you for your time.

mwsupra · March 5, 2018, 3:00am

Hi,

I’m not sure why the particular file causes it to detect a line break, but have you tried some of the other methods of scraping the data like Read PDF Text or OCR? Do you get the same result with the line break?

adrian · March 5, 2018, 4:30pm

Data extraction depends on the internal structure of the PDF document. I have seen PDF documents where multiple rows in a table were actually one internal row, even though they appeared fine on screen.

You can use the UiExplorer tool to get an idea of the internals of your PDF document.
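
If the line break turns out to be part of the scraped text itself, one generic workaround is to clean each string before writing it to the Word file. The snippet below is only a sketch of that idea in Python, not a UiPath activity or API; the function name remove_line_breaks and the sample string are made up for illustration, and it assumes the unwanted breaks are ordinary carriage-return or newline characters.

```python
import re


def remove_line_breaks(text: str) -> str:
    """Collapse each run of line breaks (plus adjacent spaces/tabs) into one space."""
    # Matches \r\n, \n or \r, together with any spaces or tabs around them,
    # so "This is \r\n Monday" does not end up with doubled spaces.
    return re.sub(r"[ \t]*[\r\n]+[ \t]*", " ", text).strip()


# Hypothetical scraped value containing an unwanted line break.
scraped = "This is\r\nMonday"
print(remove_line_breaks(scraped))
```

The same replacement can be applied to every cell of the extracted table before the values are typed or pasted into the document. Note that this also removes line breaks that were intentional, so it only suits cells that should hold a single line.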
