Extract only BOLD characters

Hello,

I need to extract only the bold characters from a Word document. Could anyone help me out?

Thanks in advance.

-Shriharsha H N

@Palaniyappan @Raghavendraprasad

Hi
Use a READ TEXT FILE activity, pass the path of the text file as input, and store the output in a String variable named str_input.
Now use a MATCHES activity, pass the string variable as input, and set the expression to "[B\][a-zA-Z0-9._/ ]+[/B]"

Store the output in a variable of type System.Collections.Generic.IEnumerable(System.Text.RegularExpressions.Match)

Now use a FOR EACH activity and pass the match output variable above as input.
Change the type argument to System.Text.RegularExpressions.Match.
Inside the loop, use a WRITE LINE activity with the expression
item.ToString, which will display only the bold words.

Cheers @Shriharsha_H_N

It's not working, @Palaniyappan

May I know what error you were facing?
Cheers @Shriharsha_H_N

Each letter is coming out on a separate line.

@Palaniyappan

How will it identify BOLD text here? Where are you checking for it?

This won’t
Kindly check the updated Regex expression
@lakshman

@Palaniyappan: The new regex is also not working. Please find the screenshot.

@Shriharsha_H_N

Step 1: Save the Word document in .htm format.
Use a Read Text File activity to read the .htm file.

Step 2: Use the regex call below to get the required values:

System.Text.RegularExpressions.Regex.Matches(htmOutputVariable, "(?<=<b>).*(?=</b)")
Then loop over the result in a For Each activity.
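
For anyone trying the .htm route, here is a minimal sketch of the same idea in Python (not UiPath code). One thing to watch with the pattern as posted: `.*` is greedy and does not cross line breaks by default, so two bold words on the same line come back as one long match, and bold text that wraps onto the next line is missed. The sketch uses a lazy match instead. The sample markup and the `bold_fragments` helper are made up for illustration, and real Word .htm output is much noisier and may also apply bold through CSS styles, which this does not handle.

```python
import re
from html import unescape

# Hypothetical fragment of a Word document saved as .htm.
sample = "<p>1. <b>Paris</b> 2. London 3. <b>New\nDelhi</b></p>"

# <b ...> or <strong ...>, then the shortest run of text up to the matching close tag.
# \b after the tag name keeps <body> and <br> from being mistaken for <b>.
BOLD = re.compile(r"<(b|strong)\b[^>]*>(.*?)</\1\s*>", re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")

def bold_fragments(markup):
    """Return the text of each bold element, inner tags removed, whitespace collapsed."""
    results = []
    for match in BOLD.finditer(markup):
        text = unescape(TAG.sub("", match.group(2)))
        text = " ".join(text.split())
        if text:
            results.append(text)
    return results

print(bold_fragments(sample))  # ['Paris', 'New Delhi']
```

In UiPath the equivalent change would be a lazy quantifier plus RegexOptions.Singleline, but treat that as untested.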

Check and let me know if you are still facing any issues.

Thanks
Amar.

The issue still exists.
@amaresan

@Shriharsha_H_N

Can I have your converted .htm file? Then I will share the code for it.

@amaresan

I have uploaded a sample doc file. From it I need only the bold characters, which mark the correct answers.

Sample.zip (14.3 KB)

Thanks in advance

@Shriharsha_H_N

Do you have bold letters in the text file? Read Text File might not preserve the formatting, so try writing custom code. I'm not sure, because I have never worked on this use case before. I'm replying here because I was tagged :slight_smile:

Regards
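
As a rough illustration of the custom-code idea: if the file is a .docx (an assumption; an older binary .doc would need converting first), it is a zip archive whose word/document.xml stores each run of text together with its formatting, and a run with direct bold formatting carries a `<w:b/>` element in its properties. Below is a minimal Python sketch using only the standard library. The helper names, the demo XML and the file name in the final comment are hypothetical, and bold that comes from a paragraph or character style rather than direct formatting is not detected.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

def _is_bold(run):
    """True if the run has direct bold formatting (<w:b/> not switched off)."""
    b = run.find("w:rPr/w:b", NS)
    if b is None:
        return False
    return b.get(f"{{{W}}}val") not in ("0", "false", "off")

def bold_runs_from_xml(document_xml):
    """Return bold text pieces, merging adjacent bold runs within a paragraph."""
    root = ET.fromstring(document_xml)
    pieces = []
    for para in root.iter(f"{{{W}}}p"):
        current = []
        for run in para.iter(f"{{{W}}}r"):
            text = "".join(t.text or "" for t in run.findall("w:t", NS))
            if not text:
                continue
            if _is_bold(run):
                current.append(text)
            elif current:
                # A non-bold run ends the current bold piece.
                pieces.append("".join(current).strip())
                current = []
        if current:
            pieces.append("".join(current).strip())
    return [p for p in pieces if p]

def bold_text_from_docx(path):
    """Read word/document.xml out of the .docx archive and extract bold text."""
    with zipfile.ZipFile(path) as archive:
        return bold_runs_from_xml(archive.read("word/document.xml"))

# Tiny hand-written demo document: "a) " plain, "Paris" bold (split over two runs), " no" plain.
demo = (
    f'<w:document xmlns:w="{W}"><w:body><w:p>'
    '<w:r><w:t>a) </w:t></w:r>'
    '<w:r><w:rPr><w:b/></w:rPr><w:t>Par</w:t></w:r>'
    '<w:r><w:rPr><w:b/></w:rPr><w:t>is</w:t></w:r>'
    '<w:r><w:rPr><w:b w:val="0"/></w:rPr><w:t> no</w:t></w:r>'
    '</w:p></w:body></w:document>'
)
print(bold_runs_from_xml(demo))  # ['Paris']
# For a real file: bold_text_from_docx("answers.docx")  (hypothetical path)
```

Word often splits one visible word across several runs, which is why adjacent bold runs are joined before being returned.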

Hello @Shriharsha_H_N
You cannot do this using regex after extracting the text with a Read Text File activity.
It is also difficult if you convert the document to an .html or .htm file, because a complex Word document has a complex structure, and extracting elements from it would be hard.

Now you have two choices: either create a macro to get all the bold text, or do it using a combination of hotkeys and clicks.

I have done it using the second method. Check this workflow for a better understanding
(it will get all the bold text):

Bold_Letters.xaml (14.8 KB)

@Raghavendraprasad: Thanks for the reply.
In a text file I can't make the middle characters bold, but in a Word file I can. Because of that, I need to extract only the bold characters from the doc file.