Hi Jose,
Glad you're finding a use for Wikipedia. A quick heads-up: the text is not "copyright-free" in the sense of public domain, but it does have a "free" licence, so you should be able to do what you want at no cost and without extra permissions, as long as you attribute us when publishing. See http://www.wikipedia.org/wiki/Wikipedia:Copyrights for the full details.
Pete
-----Original Message-----
From: wikitech-l-bounces@Wikipedia.org [mailto:wikitech-l-bounces@Wikipedia.org] On Behalf Of Jose Quesada
Sent: 27 November 2003 22:36
To: wikitech-l@wikipedia.org
Subject: [Wikitech-l] Wikipedia full dump (English) broken link?
Hi,
Here at CU we work with corpora of text to train models that 'understand' language (see, e.g., LSA.colorado.edu). We wanted to use Wikipedia to create a copyright-free corpus of text that anyone in the scientific community could use. To do that, we downloaded the DB dumps a while ago (about 2 billion words), but due to a computer problem we lost them.
I have noticed that the link to the full English database (2280MB), http://download.wikipedia.org/archives/en/20031125_old_table.sql.bz2, no longer works; it returns a Forbidden error saying that you don't have permission to access /archives/en/20031125_old_table.sql.bz2 on this server.
Could you please grant us access to the file?
Thanks a lot in advance,
-Jose
wikitech-l@lists.wikimedia.org