frequency_lists

frequency_lists [2009/02/02 16:12] eros
frequency_lists [2014/03/27 10:45] – [Repubblica (Italian)] eros
===== Frequency lists =====

This page provides frequency lists extracted from the WaCky corpora. Lists of words and of lemmas are provided, sorted by frequency. All the lists are in [[http://www.7-zip.org/|'.7z']] compressed format.

  * Unigram lists. These are the complete lists, i.e. we did not perform any post-processing on them.
==== deWaC (German) ====

  * {{:frequency_lists:sorted.de.lemma.unigrams.7z|deWaC unigrams}} (lemmas)
  * {{:frequency_lists:sorted.de.word.unigrams.7z|deWaC unigrams}} (words)
  * {{:frequency_lists:de.lemma.bigrams.7z|deWaC bigrams}} (lemmas)
  * {{:frequency_lists:de.word.bigrams.7z|deWaC bigrams}} (words)

==== frWaC (French) ====

  * {{:frequency_lists:sorted.fr.lemma.unigrams.7z|frWaC unigrams}} (lemmas)
  * {{:frequency_lists:sorted.fr.word.unigrams.7z|frWaC unigrams}} (words)
  
==== itWaC (Italian) ====

  * {{:frequency_lists:sorted.it.lemma.unigrams.7z|itWaC unigrams}} (lemmas)
  * {{:frequency_lists:sorted.it.word.unigrams.7z|itWaC unigrams}} (words)
  * {{:frequency_lists:it.lemma.bigrams.7z|itWaC bigrams}} (lemmas)
  * {{:frequency_lists:it.word.bigrams.7z|itWaC bigrams}} (words)
  
==== ukWaC (English) ====

  * {{:frequency_lists:sorted.uk.lemma.unigrams.7z|ukWaC unigrams}} (lemmas)
  * {{:frequency_lists:sorted.uk.word.unigrams.7z|ukWaC unigrams}} (words)
  * {{:frequency_lists:uk.lemma.bigrams.7z|ukWaC bigrams}} (lemmas)
  * {{:frequency_lists:uk.word.bigrams.7z|ukWaC bigrams}} (words)

==== Repubblica (Italian) ====

  * {{:frequency_lists:it.repubblica.word.unigrams.7z|Repubblica unigrams}} (words)
  * {{:frequency_lists:it.repubblica.lemma.unigrams.7z|Repubblica unigrams}} (lemmas)

frequency_lists.txt · Last modified: 2014/03/27 10:45 by eros