I need to pick the 5 most frequently occurring words in a document (PDF or Word). Any suggestions on how to go about this?

I would probably solve this by first reading the file’s text into a string variable, then splitting the string on spaces into an array. From there, create a dictionary and loop through the array. If the key (the current word in the array) doesn’t exist, add it to the dictionary with a value of 1. If the key does exist, don’t add it again; just increment its value. Once the dictionary is filled, loop through it and pick the five keys with the highest counts.
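
Here is a rough sketch of that dictionary approach in Python, in case it helps. It assumes the text has already been extracted from the PDF or Word file into a plain string (that step is not shown), and the function name `top_words` is just something made up for illustration. It also splits on any whitespace rather than only spaces, and does not strip punctuation or normalize case.

```python
import heapq

def top_words(text, n=5):
    """Return the n most frequent words in text as (word, count) pairs."""
    counts = {}
    # str.split() with no arguments splits on any run of whitespace.
    for word in text.split():
        if word in counts:
            counts[word] += 1   # key exists: just increment
        else:
            counts[word] = 1    # first time we see this word
    # Pick the n entries with the highest counts.
    return heapq.nlargest(n, counts.items(), key=lambda kv: kv[1])

if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    print(top_words(sample, 2))
```

Each word is visited once, so counting is a single pass over the array. In practice you would probably lowercase the text and strip punctuation first, so that "The" and "the." are not counted separately.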

To add onto @JosephNehl’s answer, I would recommend sorting your array after you split the string, so that identical words end up next to each other. I would also recommend keeping an index counter. When counting a word, start at the index counter rather than at the beginning of the array each time. This is much faster than rescanning the array from the start for every word, because you never iterate over words you’ve already counted.
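
As a rough illustration of the sort-and-index idea, here is a minimal Python sketch. The function name `count_sorted_runs` is hypothetical, and it assumes the words are already split into a list. After sorting, equal words sit in one contiguous run, so the index only ever moves forward.

```python
def count_sorted_runs(words):
    """Count each distinct word by scanning a sorted copy of the list once."""
    words = sorted(words)          # identical words become adjacent
    counts = []
    i = 0                          # index counter: start of the current run
    while i < len(words):
        j = i
        # Advance j to the end of the run of words equal to words[i].
        while j < len(words) and words[j] == words[i]:
            j += 1
        counts.append((words[i], j - i))
        i = j                      # resume after the words already counted
    return counts

if __name__ == "__main__":
    runs = count_sorted_runs("b a b c a b".split())
    # Take the five largest counts from the (word, count) pairs.
    print(sorted(runs, key=lambda kv: kv[1], reverse=True)[:5])
```

The scan itself touches each word once, but the sort costs O(n log n), so on large inputs the sorting step is likely where most of the time goes.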

Note that if you have a very large document, it will still take a while to complete the process.
