Extract table of contents + Word document

I need to extract the table of contents from a Word document. The document is about 600 pages long.

Can anyone help me with this? Is there a way to do it?

Check out this thread. It might help you.


Hi @Sairam_RPA,
Can you share a sample file?
Where is the table of contents? Normally it is at the top of the file.
We can read the file and then split the text.

This is what the format looks like. The table can be anywhere in the first 50 pages.

The table always starts with “Table of contents” as heading.

I think a regex would work if it searches for “Table of contents” and then captures every entry after it that starts with a word, ends with a number, and has one or more spaces in between. Can someone help me with this regex? @Yoichi @ppr

Below is a sample format

Table of contents

Cover Page 1

Contents 3

Sites 4

Information 5

Description 6

Narrative 7

Cited 8

Resources 13

Equipment 17

Attachments 20

Animals 25
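For anyone reading later, here is a rough sketch of the rule described above (find the heading, then keep lines that start with text and end with a number). It is written in Python rather than the VB.NET used in UiPath, and `parse_toc_entries` is a made-up helper name, not something from the thread. It assumes the document text has already been read into a string and that the table ends at the first non-blank line that does not end in a number.

```python
import re

def parse_toc_entries(text):
    """Return (title, page) pairs found after the 'Table of contents' heading.

    Hypothetical helper: assumes the table ends at the first non-blank
    line that does not look like '<title> <page number>'.
    """
    heading = re.search(r"table of contents", text, flags=re.IGNORECASE)
    if heading is None:
        return []
    entries = []
    for line in text[heading.end():].splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between entries
        # Title (lazy), one or more spaces, then a trailing page number.
        m = re.fullmatch(r"(.+?)\s+(\d+)", line)
        if m is None:
            break  # first non-entry line ends the table
        entries.append((m.group(1), int(m.group(2))))
    return entries

sample = (
    "Intro text\nTable of contents\n\nCover Page 1\n\nContents 3\n\n"
    "Sites 4\n\nChapter one begins here.\n"
)
print(parse_toc_entries(sample))
```

One caveat: a body line that happens to end in a number right after the table would be picked up as an entry, so this is only a starting point.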


How about the following sample?

Sample20230913-2L.zip (15.5 KB)



Hi @Yoichi,

This is the regex I used; I made a small change.

System.Text.RegularExpressions.Regex.Match(strData,"(?<=Table Of Contents)[\s\S]+?(?=\r\r)").Value.Trim.ToLower

Works great now. Thanks a lot for your input. :smiley: :smiley:
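To show what that pattern does, here is a small Python sketch of the same expression (Python instead of VB.NET, and `extract_toc_block` is just an illustrative name). The lookbehind starts the match right after the heading, the lazy `[\s\S]+?` takes as little text as possible, and the lookahead stops at the first pair of carriage returns. It assumes the text uses bare `\r` as the paragraph break and that an empty paragraph follows the table. The pattern as posted is case-sensitive, so the `ignore_case` flag is an addition for headings written as “Table of contents”.

```python
import re

def extract_toc_block(text, ignore_case=False):
    """Return the text between the heading and the first empty paragraph.

    Illustrative helper: mirrors Match(...).Value.Trim.ToLower by
    returning an empty string when nothing matches.
    """
    flags = re.IGNORECASE if ignore_case else 0
    m = re.search(r"(?<=Table Of Contents)[\s\S]+?(?=\r\r)", text, flags)
    return m.group(0).strip().lower() if m else ""

sample = "Intro\rTable Of Contents\rCover Page 1\rContents 3\rSites 4\r\rChapter 1\r"
block = extract_toc_block(sample)
print(block.split("\r"))
```

Splitting the result on `\r` then gives one string per entry.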
