<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="FeedCreator 1.8" -->
<?xml-stylesheet href="https://wacky.sslmit.unibo.it/lib/exe/css.php?s=feed" type="text/css"?>
<rdf:RDF
    xmlns="http://purl.org/rss/1.0/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel rdf:about="https://wacky.sslmit.unibo.it/feed.php">
        <title>WaCKy</title>
        <description></description>
        <link>https://wacky.sslmit.unibo.it/</link>
        <image rdf:resource="https://wacky.sslmit.unibo.it/lib/exe/fetch.php?media=wiki:dokuwiki.svg" />
        <dc:date>2026-05-14T13:16:23+00:00</dc:date>
        <items>
            <rdf:Seq>
                <rdf:li rdf:resource="https://wacky.sslmit.unibo.it/doku.php?id=corpora&amp;rev=1383575863&amp;do=diff"/>
                <rdf:li rdf:resource="https://wacky.sslmit.unibo.it/doku.php?id=download&amp;rev=1631521206&amp;do=diff"/>
                <rdf:li rdf:resource="https://wacky.sslmit.unibo.it/doku.php?id=frequency_lists&amp;rev=1395913538&amp;do=diff"/>
                <rdf:li rdf:resource="https://wacky.sslmit.unibo.it/doku.php?id=limine&amp;rev=1233660279&amp;do=diff"/>
                <rdf:li rdf:resource="https://wacky.sslmit.unibo.it/doku.php?id=people&amp;rev=1363770132&amp;do=diff"/>
                <rdf:li rdf:resource="https://wacky.sslmit.unibo.it/doku.php?id=publications&amp;rev=1657526700&amp;do=diff"/>
                <rdf:li rdf:resource="https://wacky.sslmit.unibo.it/doku.php?id=seed_urls&amp;rev=1203519535&amp;do=diff"/>
                <rdf:li rdf:resource="https://wacky.sslmit.unibo.it/doku.php?id=seed_words_and_tuples&amp;rev=1203519748&amp;do=diff"/>
                <rdf:li rdf:resource="https://wacky.sslmit.unibo.it/doku.php?id=sidebar&amp;rev=1363253512&amp;do=diff"/>
                <rdf:li rdf:resource="https://wacky.sslmit.unibo.it/doku.php?id=start&amp;rev=1670237869&amp;do=diff"/>
                <rdf:li rdf:resource="https://wacky.sslmit.unibo.it/doku.php?id=tools&amp;rev=1456410037&amp;do=diff"/>
            </rdf:Seq>
        </items>
    </channel>
    <image rdf:about="https://wacky.sslmit.unibo.it/lib/exe/fetch.php?media=wiki:dokuwiki.svg">
        <title>WaCKy</title>
        <link>https://wacky.sslmit.unibo.it/</link>
        <url>https://wacky.sslmit.unibo.it/lib/exe/fetch.php?media=wiki:dokuwiki.svg</url>
    </image>
    <item rdf:about="https://wacky.sslmit.unibo.it/doku.php?id=corpora&amp;rev=1383575863&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2013-11-04T14:37:43+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>Corpora</title>
        <link>https://wacky.sslmit.unibo.it/doku.php?id=corpora&amp;rev=1383575863&amp;do=diff</link>
        <description>Corpora

The resources below are large corpora built by downloading text from the web. See the Publications section for further details, and the Use the corpus directly (no download necessary) section for information on how to access them:

English

	*  PukWaC: the same as ukWaC, but with a further layer of annotation added, i.e. a full dependency parse. The parsing was performed with the</description>
    </item>
    <item rdf:about="https://wacky.sslmit.unibo.it/doku.php?id=download&amp;rev=1631521206&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2021-09-13T08:20:06+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>Use the corpus directly (no download necessary)</title>
        <link>https://wacky.sslmit.unibo.it/doku.php?id=download&amp;rev=1631521206&amp;do=diff</link>
        <description>Use the corpus directly (no download necessary)

	*  The WaCky corpora are available on our official corpus repository here: &lt;http://corpora.dipintra.it&gt;

Other free web interfaces:

	*  the Jožef Stefan Institute hosts a web interface where many of our corpora can be used directly for free:</description>
    </item>
    <item rdf:about="https://wacky.sslmit.unibo.it/doku.php?id=frequency_lists&amp;rev=1395913538&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2014-03-27T09:45:38+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>Frequency lists</title>
        <link>https://wacky.sslmit.unibo.it/doku.php?id=frequency_lists&amp;rev=1395913538&amp;do=diff</link>
        <description>Frequency lists

Frequency lists extracted from the WaCky corpora. Lists of words and lemmas are provided, sorted by frequency. All the lists are in '.7z' compressed format.

	*  Unigram lists. These are the complete lists, i.e. we did not perform any post-processing on them.</description>
    </item>
    <item rdf:about="https://wacky.sslmit.unibo.it/doku.php?id=limine&amp;rev=1233660279&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2009-02-03T11:24:39+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>LiMiNe</title>
        <link>https://wacky.sslmit.unibo.it/doku.php?id=limine&amp;rev=1233660279&amp;do=diff</link>
        <description>LiMiNe

The LiMiNe (Linguistic Mining of the Net) project intends to set up a European network of linguists and Information Technology specialists for the development of methodologies and resources for the assembly, annotation and indexing of several language-specific subsets of the web to be used in the analysis and teaching of (general and specialised) languages.</description>
    </item>
    <item rdf:about="https://wacky.sslmit.unibo.it/doku.php?id=people&amp;rev=1363770132&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2013-03-20T09:02:12+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>People</title>
        <link>https://wacky.sslmit.unibo.it/doku.php?id=people&amp;rev=1363770132&amp;do=diff</link>
        <description>People

Contacts

Contact the Wacky Bunch at &lt;wacky@sslmit.unibo.it&gt;

Members
 Name                        Current affiliation
 Giuseppe Attardi            University of Pisa
 Marco Baroni                University of Trento
 Silvia Bernardini           University of Bologna (Forlì)
 Gabriele “Bilo” Carioli     University of Bologna (Forlì)</description>
    </item>
    <item rdf:about="https://wacky.sslmit.unibo.it/doku.php?id=publications&amp;rev=1657526700&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2022-07-11T08:05:00+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>Publications</title>
        <link>https://wacky.sslmit.unibo.it/doku.php?id=publications&amp;rev=1657526700&amp;do=diff</link>
        <description>Publications

	*  M. Baroni, S. Bernardini, A. Ferraresi and E. Zanchetta. 2009. The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora. Language Resources and Evaluation 43(3): 209-226 ([PDF]).

	*  M. Baroni and S. Bernardini (eds.). 2006.</description>
    </item>
    <item rdf:about="https://wacky.sslmit.unibo.it/doku.php?id=seed_urls&amp;rev=1203519535&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2008-02-20T14:58:55+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>Seed URLs</title>
        <link>https://wacky.sslmit.unibo.it/doku.php?id=seed_urls&amp;rev=1203519535&amp;do=diff</link>
        <description>Seed URLs

The URLs returned by our queries to Google for the word pairs above were used to initiate the crawls.

	*  [deWaC seed URLs]
	*  [itWaC seed URLs]
	*  [ukWaC seed URLs]</description>
    </item>
    <item rdf:about="https://wacky.sslmit.unibo.it/doku.php?id=seed_words_and_tuples&amp;rev=1203519748&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2008-02-20T15:02:28+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>Seed words and tuples</title>
        <link>https://wacky.sslmit.unibo.it/doku.php?id=seed_words_and_tuples&amp;rev=1203519748&amp;do=diff</link>
        <description>Seed words and tuples

The first step in the creation of our corpora was compiling lists of basic words and mid-frequency words collected from other corpora. We then randomly combined these words into pairs and sent each pair as a query to Google in order to obtain seed URLs.</description>
    </item>
    <item rdf:about="https://wacky.sslmit.unibo.it/doku.php?id=sidebar&amp;rev=1363253512&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2013-03-14T09:31:52+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>Sidebar</title>
        <link>https://wacky.sslmit.unibo.it/doku.php?id=sidebar&amp;rev=1363253512&amp;do=diff</link>
        <description>*  Home
	*  Corpora
	*  Download/Use
	*  Tools
	*  Publications
	*  People</description>
    </item>
    <item rdf:about="https://wacky.sslmit.unibo.it/doku.php?id=start&amp;rev=1670237869&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2022-12-05T10:57:49+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>WaCky - The Web-As-Corpus Kool Yinitiative</title>
        <link>https://wacky.sslmit.unibo.it/doku.php?id=start&amp;rev=1670237869&amp;do=diff</link>
        <description>WaCky - The Web-As-Corpus Kool Yinitiative

Welcome to WaCky!

We are a community of linguists and information technology specialists who got together to develop a set of tools (and interfaces to existing tools) that will allow linguists to crawl a section of the web, process the data, and index and search them.</description>
    </item>
    <item rdf:about="https://wacky.sslmit.unibo.it/doku.php?id=tools&amp;rev=1456410037&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2016-02-25T14:20:37+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>Tools</title>
        <link>https://wacky.sslmit.unibo.it/doku.php?id=tools&amp;rev=1456410037&amp;do=diff</link>
        <description>Tools

This is an incomplete list of tools you can use to build corpora from the web.

Complete pipelines

	*  BootCaT -- bootstrap specialized corpora and terms from the web

De-duplication

	*  [Shared ngram collector] -- Perl script useful for near-duplicate detection
	*  Onion -- a tool for removing duplicate parts from large collections of texts.</description>
    </item>
</rdf:RDF>
