frequency_lists

Differences

This shows you the differences between two versions of the page.

frequency_lists [2009/02/02 16:12] eros
frequency_lists [2014/03/27 10:45] (current) – [Repubblica (Italian)] eros
Line 1:
 ===== Frequency lists =====

-Frequency lists of unigrams extracted from the three corpora. Lists of words and lemmas are provided, sorted by frequency. All the lists are in [[http://www.7-zip.org/|'.7z']] compressed format.
+Frequency lists extracted from the WaCky corpora. Lists of words and lemmas are provided, sorted by frequency. All the lists are in [[http://www.7-zip.org/|'.7z']] compressed format.

   * Unigram lists. These are the complete lists, i.e. we did not perform any post-processing on them.
Line 9:
 ==== deWaC (German) ====

-  * {{:frequency_lists/sorted.de.lemma.unigrams.7z|deWaC unigrams}} (lemmas)
-  * {{:frequency_lists/sorted.de.word.unigrams.7z|deWaC unigrams}} (words)
-  * {{:frequency_lists/de.lemma.bigrams.7z|deWaC bigrams}} (lemmas)
-  * {{:frequency_lists/de.word.bigrams.7z|deWaC bigrams}} (words)
+  * {{:frequency_lists:sorted.de.lemma.unigrams.7z|deWaC unigrams}} (lemmas)
+  * {{:frequency_lists:sorted.de.word.unigrams.7z|deWaC unigrams}} (words)
+  * {{:frequency_lists:de.lemma.bigrams.7z|deWaC bigrams}} (lemmas)
+  * {{:frequency_lists:de.word.bigrams.7z|deWaC bigrams}} (words)
+
+==== frWaC (French) ====
+
+  * {{:frequency_lists:sorted.fr.lemma.unigrams.7z|frWaC unigrams}} (lemmas)
+  * {{:frequency_lists:sorted.fr.word.unigrams.7z|frWaC unigrams}} (words)
  
 ==== itWaC (Italian) ====

-  * {{:frequency_lists/sorted.it.lemma.unigrams.7z|itWaC unigrams}}  (lemmas)
-  * {{:frequency_lists/sorted.it.word.unigrams.7z|itWaC unigrams}} (words)
-  * {{:frequency_lists/it.lemma.bigrams.7z|itWaC bigrams}} (lemmas)
-  * {{:frequency_lists/it.word.bigrams.7z|itWaC bigrams}} (words)
+  * {{:frequency_lists:sorted.it.lemma.unigrams.7z|itWaC unigrams}}  (lemmas)
+  * {{:frequency_lists:sorted.it.word.unigrams.7z|itWaC unigrams}} (words)
+  * {{:frequency_lists:it.lemma.bigrams.7z|itWaC bigrams}} (lemmas)
+  * {{:frequency_lists:it.word.bigrams.7z|itWaC bigrams}} (words)
  
 ==== ukWaC (English) ====

-  * {{:frequency_lists/sorted.uk.lemma.unigrams.7z|ukWaC unigrams}} (lemmas)
-  * {{:frequency_lists/sorted.uk.word.unigrams.7z|ukWaC unigrams}} (words)
-  * {{:frequency_lists/uk.lemma.bigrams.7z|ukWaC bigrams}} (lemmas)
-  * {{:frequency_lists/uk.word.bigrams.7z|ukWaC bigrams}} (words)
+  * {{:frequency_lists:sorted.uk.lemma.unigrams.7z|ukWaC unigrams}} (lemmas)
+  * {{:frequency_lists:sorted.uk.word.unigrams.7z|ukWaC unigrams}} (words)
+  * {{:frequency_lists:uk.lemma.bigrams.7z|ukWaC bigrams}} (lemmas)
+  * {{:frequency_lists:uk.word.bigrams.7z|ukWaC bigrams}} (words)
+
+==== Repubblica (Italian) ====
+
+   * {{:frequency_lists:it.repubblica.lemma.unigrams.7z|Repubblica unigrams}} (lemma)
+   * {{:frequency_lists:it.repubblica.word.unigrams.7z|Repubblica unigrams}} (words)
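
The link changes in this revision all follow one pattern: the `/` after `frequency_lists` in each media link becomes the namespace separator `:`. The following is a minimal Python sketch of that rewrite, assuming links of the form shown in the diff. The function name `to_namespace_link` and the regular expression are illustrative assumptions, not part of the wiki or of any tool used to make the edit.

```python
import re

# Matches the opening of an old-style media link, e.g. "{{:frequency_lists/".
# Assumption: only the "/" directly after the namespace needs replacing.
_OLD_LINK = re.compile(r"\{\{:frequency_lists/")


def to_namespace_link(line: str) -> str:
    """Rewrite '{{:frequency_lists/...' to '{{:frequency_lists:...'.

    Lines without an old-style link are returned unchanged.
    """
    return _OLD_LINK.sub("{{:frequency_lists:", line)


# Hypothetical usage on one of the removed lines from the diff.
old = "  * {{:frequency_lists/sorted.de.lemma.unigrams.7z|deWaC unigrams}} (lemmas)"
print(to_namespace_link(old))
```

The pattern is anchored on the `{{:frequency_lists` prefix so that slashes elsewhere on a line, such as the one in the 7-Zip URL, are left untouched.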
  • frequency_lists.1233587539.txt.gz
  • Last modified: 2009/02/02 16:12
  • by eros