Read PDF with OCR and pull out questions from text

Dear Community,

Does anybody have advice on how I can output all questions from a prepared text file (PDF) to a string variable (and finally write all questions to an Excel file)?

Example:
For each file in the folder path, read the text file (PDF with images), output all questions to a string variable, and then write them to an Excel file.

I am stuck on the “If” condition and the “Then” activity.

I thought about working through the string from each question mark “?” back to the preceding punctuation mark.

Like:
If text.Contains("?"), then Split(text, "?") and go back to the preceding punctuation mark (".", "!"),
and write the result to the variable “Question”.

→ Split((Split(VarText,"?")(0).ToString),".;!:")(1).ToString ???

But I am not sure how to write the conditions and actions correctly…
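
For reference, the idea described above might look roughly like the following VB.NET sketch. One detail worth noting: the VB `Split(text, ".;!:")` function treats its delimiter as one whole string, not as a set of characters, so the sketch uses `LastIndexOfAny` to find the preceding punctuation instead. The names `SplitApproach` and `QuestionsBySplit`, the sample text, and the punctuation set are assumptions for illustration only; in a workflow the same logic would be spread across Assign and For Each activities.

```vb
Imports System
Imports System.Collections.Generic

Module SplitApproach
    ' Hypothetical helper: collects every sentence that ends with "?".
    Function QuestionsBySplit(text As String) As List(Of String)
        Dim result As New List(Of String)
        Dim parts As String() = text.Split("?"c)
        ' The last piece comes after the final "?" and is not a question.
        For i As Integer = 0 To parts.Length - 2
            Dim piece As String = parts(i)
            ' Index of the last sentence punctuation before this "?" (-1 if none).
            Dim cut As Integer = piece.LastIndexOfAny(New Char() {"."c, "!"c, ";"c, ":"c})
            Dim question As String = piece.Substring(cut + 1).Trim()
            If question.Length > 0 Then result.Add(question & "?")
        Next
        Return result
    End Function

    Sub Main()
        ' Assumed sample text standing in for the OCR output (VarText).
        For Each q As String In QuestionsBySplit("It rains. Is it cold? Yes! Why?")
            Console.WriteLine(q)   ' prints "Is it cold?" then "Why?"
        Next
    End Sub
End Module
```

This only approximates sentence boundaries: abbreviations such as "e.g." or decimal numbers inside a question would cut it short.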

Any help is highly appreciated. Thank you!!

Here is my workflow:

Main.xaml (10.6 KB)

Hello @EdinZolj
Would it be possible for us to see your sample PDF?

Hi @Pradeep_Shiv,

Please see an extract below. Thank you!

testdoc.pdf (149.2 KB)

Hi, I tried to experiment with regex, but haven’t been successful…

Regex.Split(VarText,"[?.;!:]")(0).ToString

Any suggestions or help would be highly appreciated!

Thank you!
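
One possible direction, offered as a minimal sketch and not a tested workflow: `Regex.Split(...)(0)` returns only the first piece of the text, whereas `Regex.Matches` can return every run of text that ends in a question mark. The pattern `[^.!?;:]+\?`, the module name `QuestionExtractor`, the function `ExtractQuestions`, and the sample text are assumptions for illustration; `VarText` from the thread would take the place of `sample`.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module QuestionExtractor
    ' Hypothetical helper: returns every sentence in text that ends with "?".
    Function ExtractQuestions(text As String) As List(Of String)
        Dim questions As New List(Of String)
        ' One or more characters that are not sentence punctuation, followed by "?".
        For Each m As Match In Regex.Matches(text, "[^.!?;:]+\?")
            ' Collapse line breaks and repeated spaces left over from OCR.
            Dim q As String = Regex.Replace(m.Value, "\s+", " ").Trim()
            ' Skip matches that contain nothing but whitespace before the "?".
            If q.Length > 1 Then questions.Add(q)
        Next
        Return questions
    End Function

    Sub Main()
        ' Assumed sample text with a line break inside the second question.
        Dim sample As String = "This is a statement. Is this a question? Yes! And" &
                               Environment.NewLine & "this one?"
        For Each q As String In ExtractQuestions(sample)
            Console.WriteLine(q)   ' prints "Is this a question?" then "And this one?"
        Next
    End Sub
End Module
```

The resulting list could then be written row by row to the Excel file. As with any punctuation-based rule, abbreviations or numbers containing "." or ":" inside a question would truncate it, so the pattern may need adjusting for the real documents.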