Split *.txt file into individual words

Dear all,

Is it possible to split a text file into individual words, ignoring all special characters, and put them into a data table (CSV, XLSX, etc.)? Thanks.

You mean like splitting text by space character?

What would be the desired structure of the datatable?


Hello @dokumentor, yes. I need to read a text file into a string, then clean the text of special characters, then put each word into a column of an Excel sheet.

Example:
-Saturn V rocket’s first stage carries 203,400 gallons (770,000 liters) of kerosene fuel and 318,000 gallons (1.2 million liters) of liquid oxygen needed for combustion, in a non-linear dissipation.
-At liftoff, the stage’s five F-1 rocket engines ignite and produce 7.5 million pounds of thrust.

@hsendel, you can get the words using regular expressions. Here is an example that uses the Matches activity and then puts the words into a datatable column.

GET_WORDS_LIST.xaml (8.4 KB)

Hope it helps!
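
For readers who cannot open the attachment, here is a minimal sketch of the same idea in Python rather than a UiPath workflow. It is not the content of GET_WORDS_LIST.xaml; the function names `extract_words` and `words_to_csv`, the `Word` column header, and the sample sentence are assumptions made for illustration. The pattern is the one quoted in the follow-up question.

```python
import csv
import io
import re

# Pattern quoted in the follow-up question: runs of 1 to 15 ASCII letters.
WORD_PATTERN = re.compile(r"\b[a-zA-Z]{1,15}\b")


def extract_words(text):
    """Return every match of WORD_PATTERN in text, in order."""
    return WORD_PATTERN.findall(text)


def words_to_csv(words, header="Word"):
    """Render the words as CSV text with a single column (hypothetical layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([header])
    writer.writerows([word] for word in words)
    return buffer.getvalue()


sample = "At liftoff, the five F-1 rocket engines ignite."
words = extract_words(sample)
csv_text = words_to_csv(words)
print(csv_text)
```

With this pattern, punctuation and digits are dropped, so "F-1" contributes only "F". To write a real file instead of a string, the same `csv.writer` can be given a file opened with `newline=""`.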


Thanks @dokumentor. Just some small tuning. First, how to skip words with a hyphen in the middle, like non-linear? Second, what do {1,15} and \b stand for in the regex \b[a-zA-Z]{1,15}\b? Thanks.
Note: text updated above.

Do you need to exclude words with a hyphen, or to remove the hyphen?

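\b is a word boundary, and {1,15} means the preceding character class repeats between 1 and 15 times. You may use \s (whitespace) instead of \b, but that way a word at the end of the text, with no whitespace after it, is not matched.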
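
A small sketch of the difference, written in Python for illustration (the thread itself is about a UiPath workflow, and the sample text is made up). It compares the \b version of the pattern with a \s version and also shows what the {1,15} limit does to a longer word.

```python
import re

text = "liquid oxygen\nneeded for an electromagnetically driven pump"

# \b...\b: each run of 1 to 15 letters that sits between word boundaries.
# The 19-letter word exceeds the {1,15} limit and is skipped entirely.
bounded = re.findall(r"\b[a-zA-Z]{1,15}\b", text)

# \s...\s: whitespace must exist on both sides and is consumed by the match,
# so the first word, the last word, and some neighbouring words are lost.
spaced = [m.strip() for m in re.findall(r"\s[a-zA-Z]{1,15}\s", text)]

print(bounded)  # ['liquid', 'oxygen', 'needed', 'for', 'an', 'driven', 'pump']
print(spaced)   # ['oxygen', 'for', 'driven']
```

Because \b matches a position and does not consume a character, neighbouring words never compete for the same space or line break.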


I want to exclude a hyphen at the beginning of a word, but keep one in the middle of a word.
Example: remove it in “- This is the First …” and keep it in "non-linear", as it's one word.

Try with this expression:

\b(\w|-){1,15}\b
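
A quick way to check the behaviour, sketched in Python as an assumption (the thread uses the UiPath Matches activity). The group is written as non-capturing, `(?:...)`, only because Python's `findall` would otherwise return the group's last character instead of the whole match; the helper name `find_words` and the sample strings are hypothetical.

```python
import re

# Same expression as above, with a non-capturing group so that
# findall returns whole matches rather than the last captured character.
HYPHEN_WORD = re.compile(r"\b(?:\w|-){1,15}\b")


def find_words(text):
    """Return words, keeping hyphens inside a word but not at its edges."""
    return HYPHEN_WORD.findall(text)


print(find_words("- This is a non-linear dissipation."))
# ['This', 'is', 'a', 'non-linear', 'dissipation']
print(find_words("five F-1 rocket engines"))
# ['five', 'F-1', 'rocket', 'engines']
```

The leading hyphen is dropped because \b cannot match between a space and a hyphen. Note that \w also accepts digits and underscores, so a number such as 203,400 would come out as the two matches 203 and 400.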

