Extract only BOLD characters

Hello,

I need to extract only the bold characters from a Word document. Could anyone help me out?


Thanks in advance.

-Shriharsha H N

@Palaniyappan @Raghavendraprasad


Hi
Use a READ TEXT FILE activity, pass the file path of the text file as input, and store the output in a variable of type String named str_input.
Now use a MATCHES activity, pass the string variable as input, and set the pattern to "\[B\][a-zA-Z0-9._/ ]+\[/B\]".

Store the result in a variable of type System.Collections.Generic.IEnumerable(System.Text.RegularExpressions.Match).

Now use a FOR EACH activity and pass the match variable above as input.
Change the TypeArgument to System.Text.RegularExpressions.Match.
Inside the loop, use a Write Line activity with
item.ToString
which displays each matched bold word alone, provided the text marks bold words with [B]...[/B] tags.

Cheers @Shriharsha_H_N
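The Matches approach can only find bold words if the text itself carries a marker for them, such as BBCode-style [B]...[/B] tags; a plain .txt file read with Read Text File has no bold formatting at all. Assuming such markers were present, here is a minimal Python sketch of the matching step with escaped brackets and a lazy capture group. The helper name is hypothetical; .NET's Regex.Matches treats this particular pattern the same way.

```python
import re

# Hypothetical helper: returns the text between literal [B] and [/B] markers.
# \[ and \] escape the brackets so they match literally instead of starting
# a character class; (.*?) is lazy, so every [B]...[/B] pair becomes its own
# match instead of one match running from the first [B] to the last [/B].
BOLD_MARKER = re.compile(r"\[B\](.*?)\[/B\]")


def extract_marked_bold(text: str) -> list[str]:
    return BOLD_MARKER.findall(text)


if __name__ == "__main__":
    sample = "Q1: The capital of France is [B]Paris[/B], not [B]Lyon[/B]."
    for word in extract_marked_bold(sample):
        print(word)  # one bold phrase per line, like Write Line inside For Each
```

Without the markers in the input, the function simply returns an empty list, which is why this route fails on an ordinary text export.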


It's not working, @Palaniyappan.

May I know what error you were facing?
Cheers @Shriharsha_H_N

Each letter is coming out on a separate line.


@Palaniyappan

How will it identify bold text here? Where are you checking for it?

This won't work.
Kindly check the updated regex expression.
@lakshman


@Palaniyappan: The new regex is also not working; please see the screenshot.

@Shriharsha_H_N

Step 1: Save the Word document in .htm format (Save As > Web Page).
Use a Read Text File activity to read the .htm file.

Step 2: Use the regex function below to get the required values:

System.Text.RegularExpressions.Regex.Matches(htmOutputVariable, "(?<=<b>).*?(?=</b>)")
and loop over the result in a For Each activity.

Check it and let me know if you are still facing any issues.

Thanks
Amar.
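Whether a pattern like this picks out each bold phrase depends on its quantifier and on how Word writes the HTML. The Python sketch below assumes Word emits bold text as `<b>...</b>`, sometimes with attributes or nested `<span>` tags inside (the exact markup varies by Word version, so the sample string is an assumption). It contrasts a greedy `.*` with a lazy, attribute-tolerant pattern and strips inner tags from each match; the names `GREEDY`, `LAZY`, and `bold_runs` are hypothetical.

```python
import html
import re

# Greedy: .* runs to the LAST "</b" on the line, so everything between the
# first and last bold runs ends up in a single match.
GREEDY = re.compile(r"(?<=<b>).*(?=</b)")

# Lazy: <b\b[^>]*> matches <b> or <b style=...> (but not <br> or <body>),
# and (.*?) stops at the nearest </b>. DOTALL lets a bold run continue
# across the line breaks Word inserts into its HTML.
LAZY = re.compile(r"<b\b[^>]*>(.*?)</b>", re.DOTALL | re.IGNORECASE)

INNER_TAG = re.compile(r"<[^>]+>")


def bold_runs(page: str) -> list[str]:
    """Return the visible text of each <b>...</b> element."""
    runs = []
    for inner in LAZY.findall(page):
        text = html.unescape(INNER_TAG.sub("", inner))  # drop <span>, decode &amp;
        text = " ".join(text.split())                    # collapse whitespace/newlines
        if text:
            runs.append(text)
    return runs


if __name__ == "__main__":
    page = '<p>Capital: <b>Paris</b> or <b><span lang=EN-US>Rome</span></b></p>'
    print(GREEDY.findall(page))  # one oversized match spanning both bold runs
    print(bold_runs(page))       # each bold run separately
```

On real Word output the nested markup can be deeper than this sample, which is part of why the thread later calls the HTML route difficult.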


The issue still exists.
@amaresan

@Shriharsha_H_N

Can I have your converted .htm file? Then I will share the code for it.

@amaresan

I have uploaded a sample doc file. In it I need only the bold characters, i.e., the correct answers.

Sample.zip (14.3 KB)

Thanks in advance

@Shriharsha_H_N

Do you have bold letters in the text file? A plain text file does not preserve formatting, so try writing custom code. I'm not sure, because I have never worked on this use case before; I'm replying here because I was tagged :slight_smile:

Regards
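Along the lines of the custom-code suggestion above, one option not tried in this thread is to read the formatting straight from the file instead of from extracted text. A .docx file is a ZIP archive whose body lives in `word/document.xml`; each run (`w:r`) may carry a `w:b` element in its run properties (`w:rPr`). The stdlib-only Python sketch below assumes a .docx file (a legacy .doc would have to be re-saved as .docx first) and only sees bold applied directly to runs, not bold inherited from a paragraph or character style; all function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
# Values of w:val that switch a toggle property such as <w:b> off
OFF_VALUES = {"0", "false", "off"}


def is_bold(run: ET.Element) -> bool:
    """True if the run carries direct bold formatting (<w:b/> in its w:rPr)."""
    b = run.find(f"{W}rPr/{W}b")
    return b is not None and b.get(f"{W}val", "true").lower() not in OFF_VALUES


def bold_runs_from_xml(xml_bytes: bytes) -> list[str]:
    """Group consecutive bold runs of the same paragraph into one string each."""
    root = ET.fromstring(xml_bytes)
    parent = {child: p for p in root.iter() for child in p}
    result, current, current_para = [], "", None
    for run in root.iter(f"{W}r"):
        para = parent.get(run)  # walk up to the paragraph that owns this run
        while para is not None and para.tag != f"{W}p":
            para = parent.get(para)
        bold = is_bold(run)
        # A non-bold run or a new paragraph ends the current bold stretch
        if current and (not bold or para is not current_para):
            result.append(current)
            current = ""
        if bold:
            # w:t elements are direct children of w:r and hold the run's text
            current += "".join(t.text or "" for t in run.findall(f"{W}t"))
            current_para = para
    if current:
        result.append(current)
    return result


def bold_runs_from_docx(path) -> list[str]:
    """path: a .docx file path (or binary file object); reads the main body only."""
    with zipfile.ZipFile(path) as docx:
        return bold_runs_from_xml(docx.read("word/document.xml"))
```

Usage would be something like `print(bold_runs_from_docx("Sample.docx"))`, where the file name is only an example. Headers, footers, and footnotes live in separate parts of the archive and are not read here.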

Hello @Shriharsha_H_N
You cannot do this with a regex after extracting the text with the Read Text File activity, because plain text carries no bold formatting.
Converting the document to .html or .htm is also difficult: a complex Word document produces a complex HTML structure, so extracting the right elements is tricky.

Now you have two choices: either create a macro that collects all the bold text, or do it with a combination of hotkeys and clicks.

I have done it using the second method; check this workflow for a better understanding.
(It'll get all the bold text.)

Bold_Letters.xaml (14.8 KB)


@Raghavendraprasad: Thanks for the reply.
In a text file I am unable to bold characters in the middle of a line, but in a Word file I can. Within that scope, I need to extract only the bold characters from the .doc file.

Use VBA code:

Open the VBA editor (Alt+F11).

Choose Insert > Module.

Paste the following:
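```vba
Sub CopyBoldText()
    Dim doc As Document
    Dim rng As Range
    Dim ch As Range            ' one character at a time
    Dim boldText As String
    Dim inBold As Boolean      ' True while inside a stretch of bold characters
    Dim tempDoc As Document

    Set doc = ActiveDocument
    boldText = ""

    ' Loop through each story range (main text, headers, footers, etc.)
    For Each rng In doc.StoryRanges
        Do
            inBold = False
            ' Check each character in the range
            For Each ch In rng.Characters
                If ch.Font.Bold = True Then
                    boldText = boldText & ch.Text
                    inBold = True
                ElseIf inBold Then
                    ' A bold stretch just ended: start a new line so separate
                    ' bold phrases (e.g. answers) do not run together
                    boldText = boldText & vbCr
                    inBold = False
                End If
            Next ch
            ' Close a bold stretch that reaches the end of the story
            If inBold Then boldText = boldText & vbCr
            ' Move to the next linked story range (e.g. headers in later sections)
            Set rng = rng.NextStoryRange
        Loop Until rng Is Nothing
    Next rng

    ' Check if there is any bold text to copy
    If Len(boldText) > 0 Then
        ' Put the text in a temporary document and copy it to the clipboard
        Set tempDoc = Documents.Add
        tempDoc.Content.Text = boldText
        tempDoc.Content.Copy
        tempDoc.Close SaveChanges:=False

        MsgBox "Bold text copied to clipboard!"
    Else
        MsgBox "No bold text found in the document."
    End If
End Sub
```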

Press F5 (with the cursor inside CopyBoldText) to run the macro.

Then paste the copied text into a Word document.
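The heart of the macro is a per-character scan that keeps characters whose Font.Bold is True. When each bold phrase should come out as its own item (one correct answer per line, say), the scan also has to notice where a bold stretch ends. A small Python model of that bookkeeping is sketched below; each character is represented as a hypothetical (text, is_bold) pair standing in for a Word Range and its Font.Bold value, so this models the logic rather than talking to Word.

```python
from typing import Iterable


def bold_stretches(chars: Iterable[tuple[str, bool]]) -> list[str]:
    """Split a character stream into its contiguous bold stretches.

    Each item models one Word character as (text, is_bold), standing in
    for a Range from rng.Characters and its Font.Bold value.
    """
    stretches: list[str] = []
    current = ""
    for text, is_bold in chars:
        if is_bold:
            current += text            # inside (or starting) a bold stretch
        elif current:
            stretches.append(current)  # first non-bold character after bold text
            current = ""
    if current:
        stretches.append(current)      # bold text running to the end of the story
    return stretches


if __name__ == "__main__":
    flags = [False, False, False, True, True, False, True]
    print(bold_stretches(zip("A: 42 B", flags)))
```

Keeping a single "am I inside bold text?" flag is enough because a stretch can only end at the first non-bold character or at the end of the story.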