download

Differences

This shows you the differences between two versions of the page.

download [2011/02/28 16:27] eros
download [2021/09/13 10:20] (current) – [Use the corpus directly (no download necessary)] eros
Line 1: Line 1:
-====== Download ======+===== Use the corpus directly (no download necessary) =====
  
-===== Corpora =====+  * The wacky corpora are available on our **official corpus repository** here: http://corpora.dipintra.it
  
-  * if you are interested in downloading and using the corpora described [[corpora|here]] (for free, we're not selling them) please [[people|contact us]].+Other free web interfaces:
  
-  * semantically and syntactically annotated Italian Wikipedia: +  * the Jožef Stefan Institute hosts a web interface where many of our corpora can be used directly for free: http://nl.ijs.si/noske/wacs.cgi/first_form 
-    * [[http://medialab.di.unipi.it/Project/QA/wikiCoNLL.bz2|CoNLL format]] +  * the University of Lancaster hosts (among other corpora) ItWaC and a 50% sample of UkWaC (registration is required but the service is free): http://cqpweb.lancs.ac.uk/ 
-    * [[http://medialab.di.unipi.it/Project/QA/wikiMT.bz2|MultiTag format]]+  * the Charles University in Prague also hosts DeWaC, FrWaC, ItWaC and UkWaC (here again registration is required but the service is free): http://korpus.cz/english/index.php
  
-===== Build your own corpora =====+===== Download =====
  
-  * [[http://bootcat.sslmit.unibo.it/|BootCaT toolkit]] -- bootstrap specialized corpora and terms from the web+**NB**: when you download the corpora, you need to use your own tools to consult them. If you don't know what this means, then you probably don't want to download them and should use an online tools instead (see the secion "Free Web Interfaces" above). 
 + 
 +  * if you are interested in **downloading** the corpora described [[corpora|here]] (for free, we're not selling them) please [[people|contact us]] 
 + 
 +  * the semantically and syntactically annotated Italian Wikipedia is available for direct download from here: 
 +    * [[http://medialab.di.unipi.it/Project/QA/wikiCoNLL.bz2|CoNLL format]] ([[http://medialab.di.unipi.it/wiki/Tanl_Tagsets|tagset]]) 
 +    * [[http://medialab.di.unipi.it/Project/QA/wikiMT.bz2|MultiTag format]]
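As an illustration of what "your own tools" can mean for the directly downloadable Wikipedia files, here is a minimal Python sketch that streams a bz2-compressed CoNLL-style file sentence by sentence without unpacking it to disk. It rests on assumptions not confirmed by this page: that the file is UTF-8, has one token per line with tab-separated columns, and separates sentences with blank lines. The function name `iter_conll_sentences` is hypothetical, and the meaning of each column should be checked against the linked tagset page.

```python
import bz2
from typing import Iterator, List


def iter_conll_sentences(path: str) -> Iterator[List[List[str]]]:
    """Yield sentences one at a time; each sentence is a list of token rows.

    Assumed layout (not confirmed by the source page): one token per line,
    tab-separated columns, blank line between sentences, UTF-8 text.
    """
    sentence: List[List[str]] = []
    # bz2.open in text mode decompresses lazily, so the whole corpus
    # never has to fit in memory.
    with bz2.open(path, mode="rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line.strip():
                # A blank line closes the current sentence, if there is one.
                if sentence:
                    yield sentence
                    sentence = []
                continue
            sentence.append(line.split("\t"))
    # Emit the last sentence when the file does not end with a blank line.
    if sentence:
        yield sentence
```

Assuming the CoNLL file was saved locally as `wikiCoNLL.bz2`, a loop such as `for sentence in iter_conll_sentences("wikiCoNLL.bz2"): ...` would then visit the sentences in order; if the real file contains extra markup lines, they would need to be filtered out first.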
  
-===== Other stuff =====+===== Lists =====
  
   * [[Seed words and tuples]]   * [[Seed words and tuples]]
Line 19: Line 25:
   * [[Frequency lists]]   * [[Frequency lists]]
   * [[Keyword lists: ukWaC vs. the BNC]]   * [[Keyword lists: ukWaC vs. the BNC]]
-  * [[http://dev.sslmit.unibo.it/wac/post_processing.php|Post processing tools]] 
  • download.1298906842.txt.gz
  • Last modified: 2011/02/28 16:27
  • by eros