How to get the 10 phrases that appear most often in a string

Hi All,

Is it possible to get the most frequently occurring group of words in a string? For example, I have the string “My name is Someone, His name is Phil, Her name is Alex, he is a boy, she is a girl, her name is glory”

Looking at the string above, the words that appear most are “is” and “name”, but the group of words that appears most is “name is”. How can I get this result? What I have now returns just single words and not groups of words, as shown in the attached screenshot

newList = ListOfWords.GroupBy(Function(w) w).OrderByDescending(Function(g) g.Count())

Thanks

Hi there @Olaoluwa, Please don’t repost your question multiple times on the forums. Thanks.

I don’t know the answer to this, but you can search online if you are looking for a VB.NET solution.

Either that, or just use some For Each loops and If/Decision activities to find your word groups. I haven’t thought through how this would be done, though, and it could be complicated.

Sorry for my lack of answers :wink:

Sure thanks. I won’t repost :slightly_smiling_face:

One approach: split the string on the commas, then run the resulting array through a For Each to look at each sentence. For each sentence, find all possible word groups and store them in another array, joined together by a separator character. Finally, count how often each word group occurs.

The word group array might look like this {“My name|name is|is Someone|My name is|name is Someone|My name is Someone”,“His name|name is|is Phil|His name is|name is Phil|His name is Phil”}

Then, split each entry of that array on the “|” character and use GroupBy on the resulting word groups to find the one that shows up the most.

So you can probably do something similar to that.
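Here is a rough sketch of that idea as a standalone VB.NET console program, in case it helps. The names `PhraseCounter` and `WordGroups` are made up for illustration, and it assumes a word group means two or more consecutive words within one comma-separated sentence, compared without regard to case. Instead of joining the groups with “|” and splitting them again, it flattens them directly with SelectMany, which should give the same counts.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module PhraseCounter

    ' Hypothetical helper: returns every run of two or more
    ' consecutive words found in a single sentence.
    Function WordGroups(sentence As String) As IEnumerable(Of String)
        Dim words As String() = sentence.Split({" "c}, StringSplitOptions.RemoveEmptyEntries)
        Dim groups As New List(Of String)
        For size As Integer = 2 To words.Length
            For start As Integer = 0 To words.Length - size
                ' Join "size" words beginning at index "start".
                groups.Add(String.Join(" ", words, start, size))
            Next
        Next
        Return groups
    End Function

    Sub Main()
        Dim input As String = "My name is Someone, His name is Phil, Her name is Alex, he is a boy, she is a girl, her name is glory"

        ' Split into sentences on the comma, collect the word groups of
        ' every sentence, then count identical groups (ignoring case)
        ' and keep the 10 most frequent ones.
        Dim topGroups = input.Split(","c).
            SelectMany(Function(s) WordGroups(s.Trim())).
            GroupBy(Function(phrase) phrase, StringComparer.OrdinalIgnoreCase).
            OrderByDescending(Function(grp) grp.Count()).
            Take(10)

        For Each entry In topGroups
            Console.WriteLine("{0}: {1}", entry.Key, entry.Count())
        Next
    End Sub

End Module
```

With the example string, “name is” comes out on top with a count of 4. In a workflow, the body of `Main` could probably be adapted into an Assign activity, but I have not tried that.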

Depending on how heavily you will be using this, have you tried the Stanford NLP activities?

Demo : http://nlp.stanford.edu:8080/corenlp/process

