The top 500 words
There is no definitive list of the 500 most common words. There are only approximations. See below for the reasons.
The top 500 words according to Surface Languages
There is no definitive top five hundred word list for any language. The composition of a list of words depends on whether the source of the list is spoken or written. Analysis of speech and text will produce different results, and so at the very least a list should specify its source.
When ordering a list, various editorial decisions need to be made. For example, it must be decided whether singular and plural forms are considered to be the same word, or whether (in Italian) buon, buona, and buono are the same word or different words.
Word order. The exact order isn't important
The words in the lists are all important as they are extremely common :).
But the order will be affected by the editorial decisions made.
So whether a word appears as number ten or number fifty is not of huge importance. It is of much greater use to the language learner to know whether a word appears in the top two or three thousand spoken words.
Use the frequency lists as a guide
Anyone who is learning a language to a reasonable level will need to learn all words in these lists (and of course many more), and so the discussion of whether a word is number 100 or 210 is completely irrelevant.
Words and their associated meanings depend on context. A frequency list is useful as a starting point.
How the lists are constructed
The top five hundred most frequently used words on Surface Languages are loosely based on existing frequency lists, which are themselves based on analysis of speech.
But to make the lists more general, the following rules have been loosely followed in composing this list from the original:
Nouns. Plurals have been removed. So for example, in the Italian frequency list, uomo (man) is included but uomini (men) is not. However, the counts for 'uomo' and 'uomini' have not been added together.
Adjectives. Only the masculine singular has been included in the list. For example, questo (this) also has the forms questa, questi and queste, some of which appear in the original list.
Verbs are sometimes given as infinitive and sometimes as first person singular where it seemed most useful and appropriate.
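As a rough illustration of the noun and adjective rules above, here is a minimal Python sketch. The counts and the `base_form` mapping are invented for the example and are not taken from the original list; the only point is that inflected forms are dropped and their counts are not added to the base form.

```python
# Hypothetical (word, count) pairs; the counts are invented for illustration.
original = [
    ("questo", 900),
    ("uomo", 500),
    ("questa", 450),
    ("uomini", 200),
    ("questi", 150),
]

# Hypothetical mapping from an inflected form to the base form kept in the list.
base_form = {
    "uomini": "uomo",
    "questa": "questo",
    "questi": "questo",
    "queste": "questo",
}


def drop_inflected(entries, base_form):
    """Keep only base forms. Counts of dropped forms are NOT added to the base form."""
    return [(word, count) for word, count in entries if word not in base_form]


simplified = drop_inflected(original, base_form)
```

Because the counts are left untouched, 'questo' keeps its own count only, which is one reason the resulting order is approximate.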
The original list treats each group of characters as a distinct word. As an illustration, 'he' and 'he visto' would not be counted as 'he' and 'he visto' but as 'he', 'he' and 'visto'.
This is not how language works and this will have an effect on the word order.
Another example from the Spanish frequency list is 'por'. This appears at number fourteen in the original list, but many of those occurrences will have come from phrases such as 'por favor' or 'por qué', as well as from 'por' on its own meaning 'because of'.
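The effect described above can be sketched in a few lines of Python. This is only an assumption about how such a count might be produced (splitting on spaces), and the sample lines are invented rather than taken from the original corpus.

```python
from collections import Counter

# Invented sample lines standing in for subtitle text.
lines = [
    "he visto la película",
    "he comido",
    "por favor",
    "por qué",
    "lo hizo por ti",
]


def count_tokens(lines):
    """Count every space-separated group of characters as its own word."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


counts = count_tokens(lines)
```

Here 'por' is counted three times even though it stands alone only once, and phrases such as 'por favor' and 'he visto' never appear as entries in their own right.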
Some words are removed
Some words, for example personal pronouns, have been removed where learning them on their own seems unhelpful. Other words, such as numbers, have also been removed.
The original corpus was taken from film subtitles resulting in a number of words which don't fit into a general top five hundred such as 'captain'. These have also been removed. As have swear words/brutte parole/malas palabras ...
It appears that there are a disproportionate number of commands and orders issued in films. These have sometimes been changed to the infinitive or removed.
The aim is to make the lists as general and as useful to language learners as possible.
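The removals described in this section amount to filtering the list against a few sets of excluded words. A minimal sketch follows; the sets and the sample list are hypothetical and far smaller than anything used in practice.

```python
# Hypothetical exclusion sets, for illustration only.
PRONOUNS = {"yo", "tú", "él", "ella"}
NUMBERS = {"uno", "dos", "tres"}
FILM_SPECIFIC = {"capitán"}


def remove_excluded(words, *excluded_sets):
    """Return the words in their original order, skipping any excluded word."""
    excluded = set().union(*excluded_sets)
    return [word for word in words if word not in excluded]


# Invented sample of a ranked list.
ranked = ["yo", "casa", "dos", "capitán", "tiempo"]
general = remove_excluded(ranked, PRONOUNS, NUMBERS, FILM_SPECIFIC)
```

The remaining words keep their relative order, but each one moves up the list, which again shows why exact positions should not be taken too seriously.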
Cutting to the chase
These words are all extremely frequently used in the spoken language, but the 'exact' sequence is open to interpretation, and this is but one.
I hope you enjoy them.