Extract Structured data - Unwanted Linebreaks

Hello,

I’m trying to scrape a web table. I’ve gotten the resulting data table into Excel as a txt file, but it has unexpected/unwanted line breaks.

These create new rows outside the table pattern, breaking the data into pieces that I can’t use. I’ve tried importing it as CSV or text, configured to ignore line breaks between " ", but the structured data doesn’t output some of the column cells with " ", or rather the web table doesn’t have them like that.

The only thing I’ve noticed by opening the table in Notepad++ is that the full rows are split correctly by CR LF, while the broken rows are split by only an LF.

Is there a way to replace these lone LFs with a char like “;” or " - " in metadata (as in inside UiPath)? Those outputs are needed to create variables for other activities.

The image below shows the result in an example:

This is the XML currently in the “Extract Structured Data” activity:

@Rafael_Tan
find some starter help below:
image

Hi Rafael,

This works in Notepad++

image

I wasn’t sure how to get that dodgy data into UiPath, as cutting it from Notepad++ and pasting it into a UiPath variable didn’t seem to work, but it would be something like this.

image

It’s supposed to look for line feeds that are not preceded by a carriage return and replace each one with group 1 ($1), which is the original text, plus a semicolon.

You might have to play around in UiPath.
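
For anyone who wants to try the idea outside Notepad++ first, here is a minimal Python sketch of the same replacement. It is not UiPath code: the function name `replace_lone_lf` and the sample string are made up for illustration. It also uses a negative lookbehind rather than the capture group described above, so the match covers only the lone LF and no `$1` is needed. The same pattern should in principle carry over to .NET regular expressions, which support lookbehind, but that is an untested assumption.

```python
import re

# Matches an LF that is NOT preceded by a CR, i.e. a "lone" line feed.
# CR LF pairs (the real row endings) are left untouched.
LONE_LF = re.compile(r"(?<!\r)\n")


def replace_lone_lf(text: str, sep: str = ";") -> str:
    """Replace every lone LF in `text` with `sep` (hypothetical helper)."""
    return LONE_LF.sub(sep, text)


# Made-up sample: two rows ending in CR LF, each broken by a stray LF.
raw = "a;b\nc;d\r\ne;f\ng\r\n"
fixed = replace_lone_lf(raw)
print(repr(fixed))  # → 'a;b;c;d\r\ne;f;g\r\n'

# When reading from a file, pass newline="" so Python does not translate
# CR LF into LF before the regex sees it, for example:
#   with open(path, newline="") as f:
#       fixed = replace_lone_lf(f.read())
```

Passing `" - "` as `sep` gives the other separator mentioned in the question.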
