seed_words_and_tuples

The first step in creating our corpora was to compile lists of basic words and mid-frequency words drawn from other corpora. We then randomly combined these words into pairs and sent each pair to Google as a query in order to obtain seed URLs.
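The pairing step described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for these corpora: it assumes the basic and mid-frequency lists are pooled and that each pair is two distinct words drawn uniformly at random. The function names `make_seed_pairs` and `pair_to_query` are hypothetical, and no search-engine call is made; the sketch only builds the query strings.

```python
import random


def make_seed_pairs(basic_words, mid_freq_words, n_pairs, seed=None):
    """Randomly combine words into distinct, unordered pairs.

    Assumption: both lists are pooled and each pair consists of two
    different words chosen uniformly at random. The original procedure
    may have paired the lists differently.
    """
    rng = random.Random(seed)
    # Deduplicate and sort so a fixed seed gives reproducible output.
    pool = sorted(set(basic_words) | set(mid_freq_words))
    max_pairs = len(pool) * (len(pool) - 1) // 2
    if n_pairs > max_pairs:
        raise ValueError(f"only {max_pairs} distinct pairs available")
    pairs = set()
    while len(pairs) < n_pairs:
        a, b = rng.sample(pool, 2)  # two different words
        pairs.add(tuple(sorted((a, b))))  # store unordered pairs once
    return sorted(pairs)


def pair_to_query(pair):
    """Join a word pair into a plain query string (hypothetical format)."""
    return " ".join(pair)


if __name__ == "__main__":
    basic = ["house", "water", "child"]
    mid = ["harbour", "ledger"]
    for p in make_seed_pairs(basic, mid, 5, seed=42):
        print(pair_to_query(p))
```

Storing each pair in sorted order means that "water harbour" and "harbour water" count as the same query, so no pair is sent twice.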
  
==== deWaC (German) ====
  
  * {{dewac_seed_words.zip|deWaC seed words}}
  * {{dewac_seed_pairs.zip|deWaC seed pairs}}
  
==== itWaC (Italian) ====
  
  * {{itwac_seed_words.zip|itWaC seed words}}
  * {{itwac_seed_words_pairs.zip|itWaC seed pairs}}
  
==== ukWaC (English) ====
  
  * {{ukwac_seed_words_bnc.zip|ukWaC seed words}} (collected from the British National Corpus, BNC)
  * {{ukwac_seed_word_pairs.zip|ukWaC seed pairs}}

Created: 2008/02/20 15:58 by eros; last modified: 2008/02/20 16:02 by eros