Word document: split into separate text files based on text formatting

I have a Word file that has several sections. Each section starts with text in bold, e.g. Verse 1.

I need to extract the text from the start of each section (including the bold heading) up to the start of the next section, and write it out to a new text file (formatting is not important). The file name should be the section heading (the bold text, e.g. Verse 1).

Then move on to the next section and write that out to a new file named Verse 2, and so on.

So for the attached example I would get 4 files, Verse 1 to Verse 4.

Long term I will be using a For Each File in Folder to check several files, but for now I just need to get this working for one file.

The number of sections will change from file to file, as will the number of lines in each section. The only way to spot a new section is the bold heading.

Hope that makes sense. Any help appreciated.

Wrod File.docx (16.1 KB)

Verse 1

I see trees of green
Red roses too
I see them bloom
For me and you
And I think to myself
What a wonderful world

Verse 2

I see skies of blue
And clouds of white
The bright blessed day
The dark sacred night
And I think to myself
What a wonderful world

Verse 3

The colors of the rainbow
So pretty in the sky
Are also on the faces
Of people going by
I see friends shaking hands
Saying, “How do you do?”
They’re really saying
I love you

Verse 4

I hear babies cry
I watch them grow
They’ll learn much more
Than I’ll ever know
And I think to myself
What a wonderful world
Yes, I think to myself
What a wonderful world
Ooh, yes

@adrian_sullivan

  1. Read all the text from the Word document into a string (outputString below)
  2. Then use System.Text.RegularExpressions.Regex.Split(outputString,"(?=Verse \d+)")

This gives you an array of strings, one per section, and you can write each one to its own file.
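For illustration, here is the same lookahead split as a minimal Python sketch (not UiPath code). The function name split_sections and the sample text are made up for the example, and it assumes every heading matches a known pattern such as "Verse" followed by a number. Splitting on a zero-width lookahead needs Python 3.7 or later.

```python
import re


def split_sections(text, heading_pattern=r"Verse \d+"):
    """Split text just before every heading match, keeping the heading in its section."""
    # (?=...) is a zero-width lookahead, so nothing is consumed by the split.
    parts = re.split(rf"(?={heading_pattern})", text)
    # The piece before the first heading is empty or whitespace; drop such pieces.
    return [part.strip() for part in parts if part.strip()]


sample = "Verse 1\nI see trees of green\nVerse 2\nI see skies of blue\n"
sections = split_sections(sample)
# The first line of each section is the heading, usable as the file name.
names = [section.splitlines()[0] for section in sections]
```

Each entry in sections still starts with its heading, so the heading doubles as the file name when writing the files out.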

cheers

It will never be "Verse", unfortunately. It is more likely to be product codes that I won't know in advance. I only used Verse for demo purposes.

@adrian_sullivan

Other than the bold formatting, do we know anything else about the headings? For example, does the product code always start with certain characters, or have a specific length or format?

cheers

Could be anything. The only thing that is certain is that it will be bold.

Beyond that it could be anything: there might be 1 section or there could be 10.

I had ChatGPT try it for me, but the code it produced kept failing to compile in an Invoke Code activity.

@adrian_sullivan

There is no activity available that can read the formatting directly and split on it.

Also, if you split on bold text and something else in the document happens to be bold, it might fail.

If bold really is the only way to tell the sections apart, you will need custom code.
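As a rough idea of what such custom code could look like, here is a minimal sketch in Python (standard library only, not an Invoke Code snippet). It reads word/document.xml straight out of the .docx archive and treats any paragraph whose visible text is entirely bold as a section heading. The function names are hypothetical, and it rests on two assumptions: each heading sits in its own paragraph, and the bold is applied directly to the text rather than inherited from a style. As noted above, any other fully bold paragraph would also be treated as a heading.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used for elements inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def run_text(run):
    """Text of one run; soft line breaks (<w:br/>) become newlines."""
    parts = []
    for node in run:
        if node.tag == f"{W}t":
            parts.append(node.text or "")
        elif node.tag == f"{W}br":
            parts.append("\n")
    return "".join(parts)


def run_is_bold(run):
    """True when the run has direct bold formatting (<w:b/> not switched off)."""
    bold = run.find(f"{W}rPr/{W}b")
    return bold is not None and bold.get(f"{W}val", "true") not in ("0", "false", "off")


def split_by_bold_headings(docx_file):
    """Return [(heading, section_text), ...] for a .docx path or file-like object."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    sections = []
    for para in root.iter(f"{W}p"):
        runs = list(para.iter(f"{W}r"))
        text = "".join(run_text(r) for r in runs).strip()
        if not text:
            continue  # skip empty paragraphs
        visible = [r for r in runs if run_text(r).strip()]
        if all(run_is_bold(r) for r in visible):
            sections.append((text, [text]))   # a fully bold paragraph starts a section
        elif sections:
            sections[-1][1].append(text)      # body line of the current section
        # text before the first bold heading is ignored
    return [(heading, "\n".join(lines)) for heading, lines in sections]


def write_sections(sections, out_dir):
    """Write each section to <out_dir>/<heading>.txt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for heading, body in sections:
        # Replace characters that are not allowed in Windows file names.
        safe = re.sub(r'[\\/:*?"<>|]', "_", heading)
        (out / f"{safe}.txt").write_text(body, encoding="utf-8")
```

Calling write_sections(split_by_bold_headings(path_to_docx), output_folder) would produce one text file per bold heading. If two headings have the same text, the later file overwrites the earlier one.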

cheers