tools

Differences

This shows you the differences between two versions of the page.

tools [2013/03/14 10:34] – [Complete pipelines] eros
tools [2013/03/15 09:14] – [Boilerplate removal] eros
Line 15:

   * [[http://code.google.com/p/justext/|jusText]] -- a tool for removing boilerplate content
-  * [[http://www.nljubesic.net/resources/tools/webcontentextractor/|WebContentExtractor]] -- a tool for content extraction from web pages for building web corpora
+  * [[http://www.nljubesic.net/resources/tools/webcontentextractor/|WebContentExtractor]] -- a tool for extracting content from web pages
   * the **PotaModule** (a Perl module that is intended to perform "boilerplate" stripping and other forms of HTML document filtering and extraction) is available in the BootCaT toolkit (see link above).
  
  • tools.txt
  • Last modified: 2016/02/25 15:20
  • by eros