Differences
This shows you the differences between two versions of the page.
tools [2013/03/14 10:34] – eros | tools [2016/02/25 15:20] (current) – [Boilerplate removal] eros | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Tools ====== | ====== Tools ====== | ||
- | This is an incomplete list of tools you can use to build corpora from the web. | + | This is an **incomplete** list of tools you can use to build corpora from the web. |
===== Complete pipelines ===== | ===== Complete pipelines ===== | ||
- | * [[http:// | + | * [[http:// |
===== De-duplication ===== | ===== De-duplication ===== | ||
Line 15: | Line 15: | ||
* [[http:// | * [[http:// | ||
- | * [[http://www.nljubesic.net/resources/tools/webcontentextractor/|WebContentExtractor]] -- a tool for content | + | * [[http://metashare.elda.org/repository/browse/web-content-extractor/ |
* the **PotaModule** (a Perl module that is intended to perform " | * the **PotaModule** (a Perl module that is intended to perform " | ||
- |