Extract text from Text file

Hi all,
I have a scenario. I receive a text file that contains questions, the options for each question, and the answers. I need to extract only the questions and answers and store them in another text file. How can I achieve this? The input can be either a text file or a Word file; either is fine for me. Any suggestions?

Sample text file attached below for reference

Sample questions Multiple choice.txt (560 Bytes)

I need output like this.

  1. How did you find out about our product?

Answer : b

  1. What industry are you in?

Answer : d

Hi,

How about the following sample?

mc = System.Text.RegularExpressions.Regex.matches(strData,"(?<Q>\d+\..+)\n[\s\S]+?\n(?<A>Answer.+)")
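To read the two named groups from each match, something like the following could be used. This is a minimal standalone VB.NET console sketch, not the contents of the attached sample: the option text in `strData` is made up, and it assumes the input uses `\n` line breaks only. In a workflow, the same `m.Groups("Q").Value` and `m.Groups("A").Value` expressions would go inside a For Each over `mc`.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module NamedGroupsSketch
    Sub Main()
        ' Hypothetical input; the real file contents may differ.
        Dim strData As String = String.Join(vbLf, {
            "1. How did you find out about our product?",
            "a. Friend",
            "b. Internet",
            "Answer : b",
            "2. What industry are you in?",
            "a. Banking",
            "d. Other",
            "Answer : d"})

        Dim mc As MatchCollection = Regex.Matches(strData,
            "(?<Q>\d+\..+)\n[\s\S]+?\n(?<A>Answer.+)")

        For Each m As Match In mc
            ' Q holds the question line, A holds the answer line.
            Console.WriteLine(m.Groups("Q").Value)
            Console.WriteLine(m.Groups("A").Value)
        Next
    End Sub
End Module
```

With this made-up input, the loop should print each question line followed by its answer line and skip the option lines.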

Sample20231109-5L.zip (3.2 KB)

Regards,

Hi @Yoichi, I will check this and let you know.

Hi @Beginner1234

You can use this syntax:

System.Text.RegularExpressions.Regex.Matches(strData,"\d+.[A-Z \d a-z]+?|Answer\s:.")

Then use a For Each activity to loop over the matches. Inside it, use an Append Line activity to append each match to the output file.
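The loop-and-append idea can be sketched in plain VB.NET as follows. This is only an illustration under assumptions: `File.AppendAllText` stands in for the Append Line activity, the pattern is a stand-in chosen for this sketch (not necessarily the one intended above), and the sample text and the output file name are made up.

```vbnet
Imports System
Imports System.IO
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module AppendMatchesSketch
    Sub Main()
        ' Hypothetical input; the real file contents may differ.
        Dim strData As String = String.Join(vbLf, {
            "1. How did you find out about our product?",
            "a. Friend",
            "b. Internet",
            "Answer : b"})

        ' Stand-in pattern: a numbered line, or a line part starting with "Answer :".
        Dim pattern As String = "\d+\.[^\r\n]+|Answer\s*:[^\r\n]*"

        ' Hypothetical output location in the temp folder.
        Dim outputPath As String =
            Path.Combine(Path.GetTempPath(), "questions_and_answers.txt")
        File.WriteAllText(outputPath, String.Empty) ' start with an empty file

        For Each m As Match In Regex.Matches(strData, pattern)
            ' Equivalent of Append Line: one match per line.
            File.AppendAllText(outputPath, m.Value & Environment.NewLine)
        Next

        Console.WriteLine(File.ReadAllText(outputPath))
    End Sub
End Module
```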

Hi @Beginner1234

Try this

"(?=\d+\.\s+).*|(?=Answer).*"

O/P:

Hope this helps!!

Hi @Yoichi, I tried this code with another set of questions, but it is not giving any output. Below are the questions.

Hi @lrtetala, if possible, can you share the code?

Hi @Dinesh_Guptil, I am getting the error below.

Hi,

Can you try the following sample?

mc = System.Text.RegularExpressions.Regex.matches(strData,"(?<Q>\d+\..+)\n[\s\S]+?\n(?<A>.*Answer\s*:.+)")

Sample20231109-5L (2).zip (3.7 KB)

Regards,

Thanks @Yoichi, it worked. One small doubt: can I use the same regular expression for Word documents as well? I receive both .txt files and Word documents.

@Beginner1234

Use Matches instead of Match.

Hi,

How are you planning to extract the text from the Word document? (With a ReadText activity?)
Line breaks differ between text read from a text file and text read from a docx. Can you share a sample docx file?

Regards,

@Yoichi, for a Word document we need to use the Word document activities, right?

Sample Word Document.docx (12.5 KB)

This is the sample file

Hi,

Which ReadText activity do you use: System-File-WordDocument-ReadText, or ReadText with WordApplicationScope?

If the former, the above regex will work as it is.
If the latter, the regex pattern needs to be modified because the line breaks are different.
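To see why the line break matters, here is a small standalone VB.NET sketch. It assumes, purely for illustration, that one source returns text whose lines are separated by `\n` and another returns a bare carriage return (`\r`); the sample lines are made up.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module LineBreakSketch
    Sub Main()
        ' Hypothetical input lines; the real documents may differ.
        Dim lines As String() = {
            "1. How did you find out about our product?",
            "a. Friend",
            "b. Internet",
            "Answer : b",
            "2. What industry are you in?",
            "a. Banking",
            "d. Other",
            "Answer : d"}

        ' The pattern from the earlier reply, which expects \n between lines.
        Dim pattern As String = "(?<Q>\d+\..+)\n[\s\S]+?\n(?<A>.*Answer\s*:.+)"

        Dim withLf As String = String.Join(vbLf, lines) ' lines separated by \n
        Dim withCr As String = String.Join(vbCr, lines) ' lines separated by \r only

        Console.WriteLine(Regex.Matches(withLf, pattern).Count) ' prints 2
        Console.WriteLine(Regex.Matches(withCr, pattern).Count) ' prints 0
    End Sub
End Module
```

The second count is 0 because the pattern requires a literal `\n`, and the `\r`-separated text contains none.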

Regards,

Sure @Yoichi, thanks. I will try this.

Hi,

FYI, the following pattern will work for text read from a text file and for text read with either of the Word activities.

mc = System.Text.RegularExpressions.Regex.matches(strData,"(?<Q>\d+\..+)[\n\r]+[\s\S]+?[\n\r]+(?<A>.*Answer\s*:.+)")

Sample20231109-5L (3).zip (13.9 KB)

Regards,

@Yoichi, this code works fine with the text file, but for the Word file it writes the input text to the output as it is.

Hi,

Sorry but the above pattern is not very good.
Can you try the following?

mc = System.Text.RegularExpressions.Regex.matches(strData,"(?<Q>\d+\.[^\n\r]+)[\n\r]+[\s\S]+?[\n\r]+(?<A>[^\n\r]*Answer\s*:[^\n\r]+)")

And show Q and A separately.
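As a quick check of this pattern against different line endings, a standalone VB.NET sketch along these lines could be used. The sample lines are made up, and the three separators (`\n`, `\r\n`, `\r`) are assumptions about what the different read activities might return.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module LineEndingRobustSketch
    Sub Main()
        ' Hypothetical input lines; the real documents may differ.
        Dim lines As String() = {
            "1. How did you find out about our product?",
            "a. Friend",
            "b. Internet",
            "Answer : b",
            "2. What industry are you in?",
            "a. Banking",
            "d. Other",
            "Answer : d"}

        Dim pattern As String =
            "(?<Q>\d+\.[^\n\r]+)[\n\r]+[\s\S]+?[\n\r]+(?<A>[^\n\r]*Answer\s*:[^\n\r]+)"

        ' Try the same text with \n, \r\n and \r as the line separator.
        For Each sep As String In {vbLf, vbCrLf, vbCr}
            Dim strData As String = String.Join(sep, lines)
            For Each m As Match In Regex.Matches(strData, pattern)
                ' Q and A are read separately, so no line break ends up in either.
                Console.WriteLine(m.Groups("Q").Value & " | " & m.Groups("A").Value)
            Next
        Next
    End Sub
End Module
```

Because `[^\n\r]` excludes both line-break characters, each separator should give the same two question and answer pairs.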

Sample20231109-5L (4).zip (14.0 KB)

Regards,

Thanks @Yoichi it’s working perfectly. Thanks for your help.