Drag and drop one or more text files onto the drop zone, choose your options, and get the word frequencies.
Remove (punctuation) .,\/#!$%^ etc
Ignore the final x characters for words over y characters long.
Explanation
I wrote this word frequency counter to quickly find word frequencies for Latin texts, such as Winnie ille Pu (Winnie the Pooh).
Latin is an inflected language, meaning that the ends of words change. So, for example, hasta (spear) declines in the singular as follows: hasta, hasta, hastam, hastae, hastae, hasta.
This isn't always what you want in a frequency list, because a straight comparison counts hasta and hastam as separate words. I wanted a rough and ready way of grouping the forms of the same word together.
I'm interested in word frequency for language learning and reading in different languages, and really just want an approximate idea of the 500 to 1000 or so most common words in a text.
The simplest way of doing this is to ignore the last few characters when comparing words. It is not perfect, but it is good enough for this purpose and will significantly reduce the size of the list.
The frequency count ignores the final x characters of any word longer than y characters when deciding whether or not two words should be considered the same.
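As a rough illustration of the rule described above, here is a minimal Python sketch. It is not the code behind this page: the function name `word_frequencies` and the parameters `trim` (x) and `min_length` (y) are made up for the example, and it assumes that grouping is done by simply cutting the ending off and counting what is left.

```python
import re
from collections import Counter


def word_frequencies(text, trim=1, min_length=5, strip_punctuation=True):
    """Count words, ignoring the final `trim` characters of any word
    longer than `min_length` characters (hypothetical names for x and y)."""
    if strip_punctuation:
        # Drop anything that is neither a word character nor whitespace.
        text = re.sub(r"[^\w\s]", "", text)
    counts = Counter()
    for word in text.lower().split():
        # Only trim words that are over the length threshold.
        if trim > 0 and len(word) > min_length:
            key = word[:-trim]
        else:
            key = word
        counts[key] += 1
    return counts


counts = word_frequencies("Hasta, hastam, hastae et hasta.", trim=1, min_length=5)
print(counts.most_common())  # [('hasta', 4), ('et', 1)]
```

With x = 1 and y = 5, hastam and hastae are both cut down to hasta, while hasta itself (exactly five letters) is left alone, so all three forms are counted together. Other settings group less neatly: with x = 2 and y = 4, hasta becomes has but hastam becomes hast, which is why the result is only approximate.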