Seed words and tuples

The first step in creating our corpora was to compile lists of basic words and mid-frequency words drawn from other corpora. We then randomly combined these words into pairs and sent each pair as a query to Google to obtain seed URLs.
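The random pairing step can be sketched roughly as follows. This is only an illustration, not the script actually used: the function names `random_pairs` and `to_query`, the sample words, and the choice to skip duplicate pairs are all assumptions, and the step of submitting each query to the search engine is left out.

```python
import random


def random_pairs(words, n_pairs, seed=None):
    """Draw n_pairs distinct unordered pairs of distinct words at random.

    Assumption: a pair is never repeated, regardless of word order.
    """
    rng = random.Random(seed)
    vocab = sorted(set(words))  # drop duplicate words, fix the order
    max_pairs = len(vocab) * (len(vocab) - 1) // 2
    if n_pairs > max_pairs:
        raise ValueError("not enough words for that many distinct pairs")

    seen = set()
    pairs = []
    while len(pairs) < n_pairs:
        a, b = rng.sample(vocab, 2)  # two different words
        key = frozenset((a, b))
        if key in seen:
            continue  # this pair was already drawn, try again
        seen.add(key)
        pairs.append((a, b))
    return pairs


def to_query(pair):
    """Join a word pair into a single query string."""
    return " ".join(pair)


if __name__ == "__main__":
    # Hypothetical seed words, for illustration only.
    seed_words = ["house", "water", "garden", "letter", "window"]
    for pair in random_pairs(seed_words, 4, seed=1):
        print(to_query(pair))
```

Passing a fixed `seed` makes the list of tuples reproducible, which is convenient if the same queries need to be regenerated later.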

deWaC (German)

itWaC (Italian)

ukWaC (English)

seed_words_and_tuples.txt · Last modified: 2008/02/20 16:02 by eros