  • M. Baroni, S. Bernardini, A. Ferraresi and E. Zanchetta. 2009. The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora. Language Resources and Evaluation 43(3): 209-226 (PDF).
  • M. Baroni and S. Bernardini (eds.). 2006. Wacky! Working papers on the Web as Corpus. Bologna: GEDIT. (Webpage)
  • G. Faaß, U. Heid and H. Schmid. 2010. Design and application of a Gold Standard for morphological analysis: SMOR in validation. In Proceedings of the Seventh LREC Conference, pages 803-810, Valletta, Malta, May 19-21, 2010. European Language Resources Association (ELRA). (PDF)
  • G. Faaß and K. Eckart. 2013. SdeWaC: A Corpus of Parsable Sentences from the Web. In Proceedings of the International Conference of the German Society for Computational Linguistics and Language Technology (GSCL) 2013, Darmstadt, Germany, September 25-27, 2013.
  • A. Ferraresi. 2007. Building a very large corpus of English obtained by Web crawling: ukWaC. Master's thesis, University of Bologna. (PDF)
  • A. Ferraresi, S. Bernardini, G. Picci and M. Baroni. 2010. Web Corpora for Bilingual Lexicography: A Pilot Study of English/French Collocation Extraction and Translation. In R. Xiao (ed.) Using Corpora in Contrastive and Translation Studies. Newcastle: Cambridge Scholars Publishing. (Pre-print version)
  • A. Ferraresi, E. Zanchetta, M. Baroni and S. Bernardini. 2008. Introducing and evaluating ukWaC, a very large web-derived corpus of English. In S. Evert, A. Kilgarriff and S. Sharoff (eds.) Proceedings of the 4th Web as Corpus Workshop (WAC-4) – Can we beat Google?, Marrakech, 1 June 2008. (PDF) (Webpage)
publications.txt · Last modified: 2013/09/13 12:18 by eros