tools

Differences

This shows you the differences between two versions of the page.

tools [2013/03/14 10:34] eros
tools [2013/03/20 09:40] – [Tools] eros
Line 1: Line 1:
 ====== Tools ======

-This is an incomplete list of tools you can use to build corpora from the web.
+This is an **incomplete** list of tools you can use to build corpora from the web.

 ===== Complete pipelines =====

-  * [[http://bootcat.sslmit.unibo.it/|BootCaT toolkit]] -- bootstrap specialized corpora and terms from the web
+  * [[http://bootcat.sslmit.unibo.it/|BootCaT]] -- bootstrap specialized corpora and terms from the web

 ===== De-duplication =====
Line 15: Line 15:
  
   * [[http://code.google.com/p/justext/|jusText]] -- a tool for removing boilerplate content
-  * [[http://www.nljubesic.net/resources/tools/webcontentextractor/|WebContentExtractor]] -- a tool for content extraction from web pages for building web corpora
+  * [[http://www.nljubesic.net/resources/tools/webcontentextractor/|WebContentExtractor]] -- a tool for extracting content from web pages
   * the **PotaModule** (a Perl module that is intended to perform "boilerplate" stripping and other forms of HTML document filtering and extraction) is available in the BootCaT toolkit (see link above).
  
  • tools.txt
  • Last modified: 2016/02/25 15:20
  • by eros