Extract string from word file

pravin_bindage · March 17, 2023, 11:09am

I'm using this syntax to extract the first name: System.Text.RegularExpressions.Regex.Split(str,"\n")(9)
I get "Index was outside the bounds of the array". @supermanPunch @ppr

pravin_bindage · March 17, 2023, 11:10am

Patient Registration Form_Sample1.docx (4.8 MB)

pravin_bindage · March 17, 2023, 11:23am

@arjunshenoy

Shubham_Kinge · March 17, 2023, 11:24am

@pravin_bindage
This happens because there is no \n in the document text, so the split returns only one element (index 0), and you are trying to read index 9.
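A minimal sketch of why the index fails, written in Python only because it is easy to run outside Studio. The sample text is hypothetical (it is not the attached document) and assumes the fields are separated by carriage returns with no \n at all. `field_at` is a made-up helper, not a UiPath or .NET function.

```python
import re

# Hypothetical sample: fields separated by "\r" only, no "\n" anywhere.
text = "Patient Registration Form\rFirst Name\rJohn\rLast Name\rDoe"

# Splitting on a separator that never occurs returns the whole text as one item.
parts = re.split("\n", text)
print(len(parts))  # 1


def field_at(items, index):
    """Return items[index], or None when the index is out of range."""
    return items[index] if 0 <= index < len(items) else None


# Index 9 does not exist, so a guarded lookup returns None instead of failing.
print(field_at(parts, 9))  # None
```

Checking the length of the split result before indexing it is a cheap way to catch this in a workflow as well.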

pravin_bindage · March 17, 2023, 11:26am

Okay, then how can I extract those strings? In Write Line each string appears on a new line, so I think there are several lines.

ppr · March 17, 2023, 11:26am

First, take a look and inspect what Word readText is returning.

We got:

\n - marked with #
\r - marked with +
\a (bell character, Chr(7)) - marked with %

We do not see any #, but we do see the typical Word pattern \r\a.
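The inspection described above can be sketched roughly as follows, again in Python for easy experimentation. The sample string is an assumption that only mimics the \r\a pattern. `mark_control_chars` is a hypothetical helper name, and the marker symbols follow the legend in this post.

```python
import re

# Same legend as above: "\n" -> "#", "\r" -> "+", "\a" (bell, code 7) -> "%".
MARKERS = {"\n": "#", "\r": "+", "\a": "%"}


def mark_control_chars(text):
    """Replace the invisible separators with visible marker symbols."""
    return "".join(MARKERS.get(ch, ch) for ch in text)


# Hypothetical text imitating what Word returns: cells end with "\r\a".
sample = "First Name\r\aJohn\r\aGender\r\aMale\r\a"

print(mark_control_chars(sample))  # First Name+%John+%Gender+%Male+%

# Once the real separators are known, split on them instead of "\n"
# and drop the empty item left after the trailing separator.
fields = [f for f in re.split(r"[\r\a]+", sample) if f]
print(fields)  # ['First Name', 'John', 'Gender', 'Male']
```

The character class `[\r\a]+` is also valid in .NET regex, so the same pattern should be usable with Regex.Split, though that is untested against the attached document.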

pravin_bindage · March 17, 2023, 11:31am

pravin_bindage · March 17, 2023, 11:33am

I get this as the Write Line output

ppr · March 17, 2023, 11:36am

Kindly refer to the info shared above, which shows details that are not visible in the panel outputs.

Check and you will see that there is no \n (ASCII code 10).

Also cross-checked by this:

supermanPunch · March 17, 2023, 11:45am

@pravin_bindage ,

As a first suggestion, could you replace the Word Application Scope with the WordDocument Read Text activity? There were unwanted characters when the text was read with Word Application Scope.

Then, using the output text, we should be able to get the First Name value by anchoring between Patient Registration Form and Male|Female, as shown below:
Regex :

(?<=Patient Registration Form)[\S\s]*(?=Male|Female)

Expression :

Regex.Replace(Regex.Match(docText,"(?<=Patient Registration Form)[\S\s]*(?=Male|Female)").Value.Trim,"\r?\n"," ")

We also replace the carriage return and newline characters with a space, using Regex.Replace() with the pattern \r?\n.
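To try the anchoring idea outside Studio, here is a small Python sketch using the same pattern. The sample text is invented for illustration and assumes \r\n line endings, which may not match the real document. `extract_between` is a hypothetical helper name.

```python
import re

# Same pattern as in the post: everything after the form title
# and before the gender value.
PATTERN = r"(?<=Patient Registration Form)[\S\s]*(?=Male|Female)"


def extract_between(doc_text):
    """Return the text between the two anchors on one line, or "" if absent."""
    match = re.search(PATTERN, doc_text)
    if match is None:
        return ""
    # Trim the ends first, then turn the remaining line breaks into spaces.
    return re.sub(r"\r?\n", " ", match.group(0).strip())


# Hypothetical document text, not the attached file.
sample = "Patient Registration Form\r\nJohn\r\nDoe\r\nMale\r\n25"

print(extract_between(sample))  # John Doe
```

Note that `[\S\s]*` is greedy, so if Male or Female appears more than once in the text, the match runs up to the last occurrence.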

Shubham_Kinge · March 17, 2023, 11:56am

I got John as the output, you can check.
FormWord = System.Text.RegularExpressions.Regex.Split(Text," ")(2)
@pravin_bindage

