Frequency lists
Unigram frequency lists extracted from the three corpora. For each corpus, one list of words and one list of lemmas are provided, each sorted by frequency. All lists are distributed as '.7z' compressed archives.
deWaC (German)
- deWaC unigrams (lemmas)
- deWaC unigrams (words)
itWaC (Italian)
- itWaC unigrams (lemmas)
- itWaC unigrams (words)
ukWaC (English)
- ukWaC unigrams (lemmas)
- ukWaC unigrams (words)
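
Once an archive has been extracted with a 7z-capable tool, a list can be read line by line. The sketch below is only an illustration: this page does not document the file layout, so the code assumes one entry per line, consisting of a count and a token separated by whitespace. The function names and the `count_first` switch are hypothetical; if the real files use a different column order or separator, adjust accordingly.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_frequency_lines(
    lines: Iterable[str], count_first: bool = True
) -> Iterator[Tuple[str, int]]:
    """Yield (token, count) pairs from the lines of an extracted list.

    ASSUMED layout (not confirmed by this page): one entry per line,
    a count and a token separated by whitespace.
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank lines and lines with an unexpected shape
        count_text, token = fields if count_first else fields[::-1]
        try:
            count = int(count_text)
        except ValueError:
            continue  # skip headers or lines whose count is not an integer
        yield token, count


def top_entries(
    lines: Iterable[str], n: int = 10, count_first: bool = True
) -> List[Tuple[str, int]]:
    """Return the n most frequent (token, count) pairs."""
    pairs = list(parse_frequency_lines(lines, count_first))
    # Sort explicitly rather than relying on the file order.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]


# Made-up sample lines, not taken from any of the corpora.
sample = ["1200 the", "800 of", "", "not-a-number x", "950 and"]
print(top_entries(sample, n=2))
```

To use it on a real file, pass an open text file instead of the sample list, for example `open(path, encoding="utf-8")`, where `path` points at the extracted list; the encoding is likewise an assumption and may need changing.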