start

Differences

This shows you the differences between two versions of the page.

start [2013/03/14 10:37] eros
start [2022/12/05 11:57] (current) eros
Line 1: Line 1:
-====== WaCky ======
+====== WaCky - The Web-As-Corpus Kool Yinitiative ======
  
-Welcome to WaCky! ((the acronym stands for //**W**eb-**a**s-**C**orpus **k**ool **y**nitiative//))
+Welcome to WaCky!
  
 We are a community of linguists and information technology specialists who got together to develop a set of tools (and interfaces to existing tools) that will allow linguists to crawl a section of the web, process the data, index and search them.
Line 7: Line 7:
 We try to keep everything very laid-back and flexible (minimal constraint on data representation, programming language, etc.) to make it easier for people with different backgrounds and goals to use our resources and/or contribute to the project.
  
-We built a few [[corpora]] you can [[download|download or use directly]], we
-described in great detail the procedure we followed to create our first corpora (DeWaC, UkWaC and ItWaC) in the paper:
+We built a few [[corpora]] you can [[download|download or use directly]], we described in great detail the procedure we followed to create our first corpora (DeWaC, UkWaC and ItWaC) in the paper:
  
 M. Baroni, S. Bernardini, A. Ferraresi and E. Zanchetta. 2009. The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora. //Language Resources and Evaluation// 43 (3): 209-226. ({{:papers:wacky_2008.pdf|PDF}}).
  • Last modified: 2013/03/14 10:37
  • by eros