Differences
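This shows the differences between two revisions of the page. Lines marked - come from the older revision, lines marked + come from the current one, and unmarked lines are unchanged context.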
Older revision: frequency_lists [2009/02/02 16:11] – eros
Current revision: frequency_lists [2014/03/27 10:45] – [Repubblica (Italian)] eros
Line 1:
  ===== Frequency lists =====
- Frequency lists of unigrams
+ Frequency lists extracted from the WaCky corpora. Lists of words and lemmas are provided, sorted by frequency. All the lists are in [[http://
  * Unigram lists. These are the complete lists, i.e. we did not perform any post-processing on them.
Line 9:
  ==== deWaC (German) ====
- * {{frequency_lists/sorted.de.lemma.unigrams.7z|deWaC unigrams}} (lemmas)
+ * {{:frequency_lists:sorted.de.lemma.unigrams.7z|deWaC unigrams}} (lemmas)
- * {{frequency_lists/sorted.de.word.unigrams.7z|deWaC unigrams}} (words)
+ * {{:frequency_lists:sorted.de.word.unigrams.7z|deWaC unigrams}} (words)
- * {{frequency_lists/de.lemma.bigrams.7z|deWaC bigrams}} (lemmas)
+ * {{:frequency_lists:de.lemma.bigrams.7z|deWaC bigrams}} (lemmas)
- * {{frequency_lists/de.word.bigrams.7z|deWaC bigrams}} (words)
+ * {{:frequency_lists:de.word.bigrams.7z|deWaC bigrams}} (words)
+
+ ==== frWaC (French) ====
+
+ * {{:
+ * {{:
  ==== itWaC (Italian) ====
- * {{frequency_lists/sorted.it.lemma.unigrams.7z|itWaC unigrams}}
+ * {{:frequency_lists:sorted.it.lemma.unigrams.7z|itWaC unigrams}}
- * {{frequency_lists/sorted.it.word.unigrams.7z|itWaC unigrams}} (words)
+ * {{:frequency_lists:sorted.it.word.unigrams.7z|itWaC unigrams}} (words)
- * {{frequency_lists/it.lemma.bigrams.7z|itWaC bigrams}} (lemmas)
+ * {{:frequency_lists:it.lemma.bigrams.7z|itWaC bigrams}} (lemmas)
- * {{frequency_lists/it.word.bigrams.7z|itWaC bigrams}} (words)
+ * {{:frequency_lists:it.word.bigrams.7z|itWaC bigrams}} (words)
  ==== ukWaC (English) ====
- * {{frequency_lists/sorted.uk.lemma.unigrams.7z|ukWaC unigrams}} (lemmas)
+ * {{:frequency_lists:sorted.uk.lemma.unigrams.7z|ukWaC unigrams}} (lemmas)
- * {{frequency_lists/sorted.uk.word.unigrams.7z|ukWaC unigrams}} (words)
+ * {{:frequency_lists:sorted.uk.word.unigrams.7z|ukWaC unigrams}} (words)
- * {{frequency_lists/uk.lemma.bigrams.7z|ukWaC bigrams}} (lemmas)
+ * {{:frequency_lists:uk.lemma.bigrams.7z|ukWaC bigrams}} (lemmas)
- * {{frequency_lists/uk.word.bigrams.7z|ukWaC bigrams}} (words)
+ * {{:frequency_lists:uk.word.bigrams.7z|ukWaC bigrams}} (words)
+
+ ==== Repubblica (Italian) ====
+
+ * {{:
+ * {{: