To format and remove unnecessary extra spacing

Hi,

I am trying to tidy a string variable so that it is output without any extra spacing and is formatted properly when written to Excel.

For example, the code reads this text into one string variable:

15x 4. Individual Items
Of Products - requirements
1x 2. Service Requests
5x 1. Categories of
Machines, Products etc…

The problem is the extra line break after the first line and after the second-to-last line: each of those items should be on a single line. I am looking for the output to appear this way:
15x 4. Individual Items Of Products - requirements
1x 2. Service Requests
5x 1. Categories of Machines, Products etc…

I have tried the following:
String.Join(" ", VariableExample.Split({Environment.NewLine, vbCrLf, vbLf, " ", vbTab, vbCr, vbNewLine}, StringSplitOptions.RemoveEmptyEntries))

Regex.Replace(VariableExample, "\t|\n|\r", " ")

Both attempts concatenate everything into a single line instead of keeping one line per item, which is the output I want. Any advice?

Hi @ciaramkm

There is a string method, Trim, which removes extra whitespace from both ends of a string.

yourString.Trim

But it will not work in your case, because the unwanted line breaks are inside the string, not at its ends.
May I know the source of your text? Are you extracting it from a PDF?

Hi, yes, that option won't work. And yes, you are correct: the text is extracted from a PDF.

@ciaramkm

Are you using the Read PDF Text activity?
If yes, try writing the output to a text file using the Write Text File activity and, if possible, share it with us.
Analyse the text file to build a proper regex that extracts the required information in the correct format.
There is an option in the Read PDF Text activity called Preserve Format. Try it both ways, set to True and set to False, and analyse the text each time.

Try one thing

Regex.Split(inputString, "\r\n(?=\d)")

Hi,

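If each item line always starts with a number, the following might work. It replaces a line break with a space only when the next line starts with a non-digit character.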

System.Text.RegularExpressions.Regex.Replace(yourString,"\r?\n(?=\D)"," ")
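
To see how the lookahead behaves, here is a minimal sketch of the same pattern in Python rather than VB.NET. It assumes every item line starts with a digit and that continuation lines never do. The function name `join_continuation_lines` and the sample text are only for illustration.

```python
import re

def join_continuation_lines(text: str) -> str:
    """Replace a line break with a space when the next line starts with a non-digit."""
    # \r?\n matches a Windows or Unix line break;
    # (?=\D) only allows the match when the following character is not a digit.
    return re.sub(r"\r?\n(?=\D)", " ", text)

# Sample text modelled on the question (assumed to use Windows line breaks).
sample = (
    "15x 4. Individual Items\r\n"
    "Of Products - requirements\r\n"
    "1x 2. Service Requests\r\n"
    "5x 1. Categories of\r\n"
    "Machines, Products etc…"
)

result = join_continuation_lines(sample)
print(result)
```

Line breaks that are followed by a digit are left untouched, so each item stays on its own line. If a continuation line can itself start with a digit, this approach would not join it and a stricter pattern would be needed.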

Regards,

The Split expression does not validate, because my variable is of type String and not String[].
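
The type mismatch is expected: Regex.Split returns an array of strings, so its result cannot be assigned to a single String variable. One possible way around it is to clean each piece and join the pieces back together with a line break (in VB.NET that would be String.Join with Environment.NewLine). Below is a rough sketch of that idea in Python, assuming each item starts with a digit. The name `split_items` is hypothetical.

```python
import re

def split_items(text: str) -> list:
    """Split the text into items and collapse the whitespace inside each item."""
    # Split only at line breaks that are followed by a digit (the start of a new item).
    parts = re.split(r"\r?\n(?=\d)", text)
    # str.split() with no arguments splits on any whitespace run, including \r\n,
    # so joining with a single space removes the leftover line breaks.
    return [" ".join(part.split()) for part in parts]

sample = (
    "15x 4. Individual Items\r\n"
    "Of Products - requirements\r\n"
    "1x 2. Service Requests\r\n"
    "5x 1. Categories of\r\n"
    "Machines, Products etc…"
)

items = split_items(sample)
# Join the list back into one string, one item per line.
joined = "\n".join(items)
print(joined)
```

Keeping the items as a list can also be handy if each one needs to go into its own Excel row.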

@ciaramkm

Can you share a screenshot of your workflow, or the PDF?

Hi @ciaramkm ,

Use the expression below.
image

Output:

Hi, thanks for the suggestion. I see it is all concatenated into one line, but I am actually looking for each item to be on its own line, in this format:

15x 4. Individual Items Of Products - requirements
1x 2. Service Requests
5x 1. Categories of Machines, Products etc…

Hi, yes. Included is a screenshot of an example where I read the text with the PDF activity into a string variable. The box bullets are not included in the output because I replace them.