Regex Input?

Hi, I'm evaluating several options for handling numbered lists within a native PDF document. The placement of the lists within the document varies, and so do their titles, but once a list is found it always seems to follow the same format:

1- Request A - each request can contain multiple carriage returns/line breaks.
2- Other list items may have only one line.
3- The number of list items varies.
4- The list is separated from the rest of the document by two consecutive carriage returns, which leave a blank line between the list and the text that follows.
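A possible starting point for the format described above, sketched with Python's `re` module (the tool and regex engine in use aren't stated, so treat the syntax as something to adapt): instead of describing each item's content, anchor on a line that starts with digits and a hyphen, then keep consuming non-empty lines until the first blank line. `find_list_block` and `SAMPLE` are hypothetical names, and the sample text is a mock-up, not the real document.

```python
import re
from typing import Optional

# Hypothetical mock-up of the document text (not the real, confidential case).
SAMPLE = """Some long introductory paragraph about the project.

Requested Information
1- Request A - each request can have
multiple lines.
2- Whereas some list items may have only one line
3- The number of list items varies

Further document text that is not part of the list.
"""

# A line starting with "<digits>- ", followed by any number of non-empty
# lines; the first blank line (where ".+" cannot match) ends the block.
LIST_BLOCK = re.compile(r"^\d+- .*(?:\n.+)*", re.MULTILINE)


def find_list_block(text: str) -> Optional[str]:
    # Normalize Windows line endings so a blank line is truly empty.
    text = text.replace("\r\n", "\n")
    match = LIST_BLOCK.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(find_list_block(SAMPLE))
```

The `replace("\r\n", "\n")` step matters: without it, a blank line in a Windows-style file still contains a `\r`, so `.+` would treat it as content and the block would not end there.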

I've only just started digging into this, but does anyone have any thoughts?

We recommend sharing your cases with us as text samples. That helps us work out solution strategies.

@Chris_Bolin Please share input and expected output so we can look at your issue.

I can't share the actual case because it is confidential, but I will mock something up.

That said, the list follows the format shown in my original post (down to the "-" after each number), if that helps. Just imagine large paragraphs before and after the list I showed.

Hi

Okay, I worked out the following pattern, which captures the numbered list (each numbered item may contain more than one sentence):

\d-+(?:-\d+)[\s\S]?[a-zA-Z]+.\s*?[a-zA-Z].+(?=\s.*(\r\n|$))
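An alternative to describing the item text piece by piece, as the pattern above does, is to match each item lazily and let a lookahead decide where it ends: at the next item marker, at a blank line, or at the end of the text. This is a minimal Python sketch under that assumption; `split_items` is a hypothetical helper, and some constructs differ between engines (for example, Python's `\Z` matches only at the very end of the string, while .NET's `\Z` also allows a trailing newline).

```python
import re
from typing import List, Tuple

# One list item: "<digits>- " at the start of a line, then as little text as
# possible (DOTALL lets "." cross line breaks) until the lookahead sees the
# start of the next item, a blank line, or the end of the text.
ITEM = re.compile(
    r"^(\d+)- (.*?)(?=\n\d+- |\n[ \t]*\n|\Z)",
    re.MULTILINE | re.DOTALL,
)


def split_items(text: str) -> List[Tuple[int, str]]:
    """Return (number, text) pairs with each item's line breaks collapsed."""
    text = text.replace("\r\n", "\n")
    return [(int(num), " ".join(body.split())) for num, body in ITEM.findall(text)]
```

One limitation: if a non-list line follows the last item with no blank line in between, this sketch folds that line into the last item.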

But because I need to keep the last part of the regex, I can't get the match to stop at the next line that does not begin with \d-. Here is a sample:

10- List possible solutions and approaches that may currently exist in the marketplace based on the
document.
11- Provide comments, suggestions, and/or any insights you may want the Client to consider.
THIS INFORMATION IS BEING REQUESTED FOR MARKET RESEARCH

Any thoughts on how to end the match before that line?

Chris

Note: the line “THIS INFORMATION…” may not always be present.
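One way to end the match before a trailing line like “THIS INFORMATION…”, whether or not it is present, is a heuristic taken from the sample alone: a line that doesn't start with `\d+- ` counts as a continuation only if the previous line did not end with sentence punctuation (`.`, `!`, `?`, `:`). In the sample, “based on the” has no final punctuation, so “document.” is kept, while “consider.” does, so the next line ends the item. A minimal Python sketch, assuming that rule holds for the real documents; `extract_items` is a hypothetical name, and the lookbehind needs an engine that supports it (Python and .NET both do).

```python
import re
from typing import List

# Heuristic (an assumption based on the sample only): a line that does not
# start with "<digits>- " continues the current item only when the previous
# line did NOT end with . ! ? or :. A blank line, or a non-item line after a
# finished sentence, ends the item.
ITEM = re.compile(
    r"^\d+- .*"         # the item's first line
    r"(?:\n"            # then zero or more continuation lines, each of which
    r"(?<![.!?:]\n)"    #   follows a line that did not end a sentence,
    r"(?!\d+- )"        #   is not the start of the next item,
    r".+)*",            #   and is not blank
    re.MULTILINE,
)


def extract_items(text: str) -> List[str]:
    """Return each item as one line, with internal line breaks collapsed."""
    text = text.replace("\r\n", "\n")
    return [" ".join(m.group(0).split()) for m in ITEM.finditer(text)]
```

The rule breaks if a line break happens to fall right after a sentence inside an item. If the trailing line is always uppercase, checking for an all-caps line instead may be more reliable.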