Read PDF with OCR and pull out questions from text

EdinZolj · April 18, 2020, 12:33am

Dear Community,

Does anybody have advice on how I can output all questions in a prepared text file (PDF) to a string variable (and finally write all questions to an Excel file)?

Example:
For each file in the folder path, read the text file (PDF with images), output all questions to a string variable, and then write them to an Excel file.

I am stuck on the “If” condition and the “Then” activity.

I thought about working through the string backwards from each question mark “?” to the preceding punctuation mark.

Like:
If text.Contains("?"), then Split(text, "?") back to the nearest punctuation mark ("." or "!")
and write the result to the variable “Question”.

→ Split((Split(VarText,"?")(0).ToString),".;!:")(1).ToString ???

But I am not sure how to write the conditions and actions correctly…
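The idea described above can be sketched without nested Split calls. The following is only a minimal illustration in Python, not a UiPath expression: it scans the text once, remembers where the last punctuation mark was, and keeps a sentence whenever the closing mark is a “?”. The function name `extract_questions` and the set of stop characters are assumptions made for this example, and the approach will mis-split on abbreviations such as “e.g.” or on decimal numbers.

```python
def extract_questions(text, stops=".!?;:"):
    """Return every sentence in `text` that ends with a question mark.

    `stops` is the (assumed) set of characters that end a sentence.
    """
    questions = []
    start = 0  # index just after the most recent punctuation mark
    for i, ch in enumerate(text):
        if ch in stops:
            if ch == "?":
                # Collapse line breaks and repeated spaces left over from OCR.
                candidate = " ".join(text[start:i + 1].split())
                if candidate != "?":  # skip empty sentences such as "??"
                    questions.append(candidate)
            start = i + 1
    return questions


sample = "This is a statement. Is this a question? Yes! What about\nthis one? Done."
for question in extract_questions(sample):
    print(question)
```

Each returned string could then be written to its own row of the output sheet.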

Any help is highly appreciated. Thank you!!

Here is my workflow:

Main.xaml (10.6 KB)

Pradeep_Shiv · April 18, 2020, 4:08am

Hello @EdinZolj
Is it possible for us to see your sample PDF?

EdinZolj · April 18, 2020, 6:24pm

Hi @Pradeep_Shiv,

Please see the extract below. Thank you!!

testdoc.pdf (149.2 KB)

EdinZolj · April 21, 2020, 11:45pm

Hi, I tried to experiment with regex, but haven’t been successful…

Regex.Split(VarText,"[?.;!:]")(0).ToString

Any suggestions / help will be highly appreciated!

Thank you!
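A note on the Regex.Split attempt above: Split returns the pieces between the punctuation marks and drops the marks themselves, and the index (0) keeps only the first piece, so the result is just the text before the first punctuation mark. Matching instead of splitting avoids both problems. Below is a minimal sketch in Python, assuming the OCR text is already in a string; the names `QUESTION_PATTERN` and `find_questions` are made up for this example. The same pattern should carry over to .NET's Regex.Matches, though that has not been tested here.

```python
import re

# One or more characters that are not sentence punctuation, followed by "?".
QUESTION_PATTERN = re.compile(r"[^.?!;:]+\?")


def find_questions(text):
    """Return all question sentences found in `text`, whitespace-normalized."""
    # " ".join(m.split()) trims each match and collapses OCR line breaks.
    return [" ".join(m.split()) for m in QUESTION_PATTERN.findall(text)]


sample = "Intro line. Who is responsible? See page 2; when is the deadline? Thanks!"
for question in find_questions(sample):
    print(question)
```

Because the pattern treats every “.”, “;” and “:” as a sentence boundary, questions containing abbreviations or times such as “10:30” would be cut short; the character class would need adjusting for such documents.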
