Dear Skip,
You can always use the dump files to host a local version of
Wikipedia. These dump files are available at
. However, at the moment there are some
hardware issues and the site is currently unavailable. Given your
task, I think the
[language-code][wikiproject]-pages-meta-current.xml.bz2 files are the most
relevant ones.
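As a rough illustration of working with one of these files, here is a minimal Python sketch that streams (title, wikitext) pairs out of a pages-meta-current dump without decompressing it to disk first. It assumes the standard MediaWiki XML export layout (page > title, page > revision > text). The helper `dump_filename` and its `date` parameter are hypothetical: they assume that published file names also carry a date or "latest" segment between the project and "pages-meta-current", which the pattern above leaves out.

```python
import bz2
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Optional, Tuple, Union


def dump_filename(lang: str, project: str = "wiki", date: str = "latest") -> str:
    """Hypothetical helper: build a name such as
    'enwiki-latest-pages-meta-current.xml.bz2'.
    ASSUMPTION: real dump names carry a date (or 'latest') segment."""
    return f"{lang}{project}-{date}-pages-meta-current.xml.bz2"


def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source: Union[str, BinaryIO]) -> Iterator[Tuple[Optional[str], str]]:
    """Stream (title, wikitext) pairs from a bz2-compressed XML dump.

    `source` may be a file name or an open binary file object; the
    dump is decompressed incrementally, never loaded whole into memory.
    """
    with bz2.open(source, "rb") as fh:
        title: Optional[str] = None
        text = ""
        for _event, elem in ET.iterparse(fh, events=("end",)):
            name = _local(elem.tag)
            if name == "title":
                title = elem.text
            elif name == "text":
                # Empty <text/> elements have no .text; treat them as "".
                text = elem.text or ""
            elif name == "page":
                yield title, text
                elem.clear()  # free the children of the finished page
                title, text = None, ""
```

For example, once the English dump has been downloaded into the working directory, `iter_pages(dump_filename("en"))` would iterate over its pages one at a time, which keeps memory use flat even for the largest wikis.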
You can find a complete dump from August 2009 among Amazon's AWS
public datasets at
.
I have posted a step-by-step tutorial on the Wiki research mailing list
explaining how to access those files.
Best,
Diederik
On Wed, Dec 1, 2010 at 11:35 AM, Skip Garner <garner(a)vbi.vt.edu> wrote:
Dear Wikiteam,
Guy Chapman asked me to write to the mailing list about how we can proceed with
getting a copy of Wikipedia so that we can offer it as a database in our free search
service, in response to the request in the following paragraph. He made me aware of its
size, but that is not an issue. I would like to obtain a copy and then establish a
routine for automated, synced downloads, as we do for the other databases in our
system.
I have had several requests to add Wikipedia to our eTBLAST text similarity search
engine, to improve reference finding as well as novelty assessment. Our search
tool is widely used, widely published, and free. Please see
etblast.org or
http://en.wikipedia.org/wiki/ETBLAST. I would like to create a searchable copy of
Wikipedia locally with links back to Wikipedia for hits, and of course acknowledge
Wikimedia. We do this for several open text datasets and are prepared to keep a local,
synced copy of Wikipedia, if you are interested. I am certain that our mutual users would
like and benefit from our working together.
Cheers, and thank you,
Skip
----- Original Message -----
From: "Wikipedia information team" <info-en(a)wikimedia.org>
To: "Skip Garner" <garner(a)vbi.vt.edu>
Cc: "Dominik L. Borkowski" <dom(a)vbi.vt.edu>, "Johnny Sun"
<szhaohui(a)vbi.vt.edu>
Sent: Wednesday, December 1, 2010 9:43:25 AM
Subject: Re: [Ticket#2010112810016598] I would like to provide a different search engine
for Wikimedia
Dear Skip Garner,
Thank you for your email. Our response follows your message.
11/29/2010 16:23 - Skip Garner wrote:
Guy,
Thank you for the information. I would like to move forward on this, as I
think it will be of mutual value. The size of the database is not an issue, and
we are always expanding our storage and serving capabilities. We regularly work
with data in the hundreds of terabytes. One issue would be getting the first copy,
but we could probably handle that by FedEx.
Can you tell me how we can proceed?
Cheers,
Skip
The best bet is probably to email the wikitech mailing list, which is where the
devs hang out.
<https://lists.wikimedia.org/mailman/listinfo/wikitech-l>
They will have the best idea of the practicalities.
Yours sincerely,
Guy Chapman
--
Wikipedia -
http://en.wikipedia.org
---
Disclaimer: all mail to this address is answered by volunteers, and responses are
not to be considered an official statement of the Wikimedia Foundation. For
official correspondence, please contact the Wikimedia Foundation by certified mail
at the address listed on
http://www.wikimediafoundation.org
--
Harold "Skip" Garner
Executive Director
Virginia Bioinformatics Institute
Virginia Tech
Washington Street (0477)
Blacksburg, VA 24061
http://www.vbi.vt.edu
Phone: 540.231.2582
Fax: 540.231.1388
Assistant: Renee Nester
renee(a)vbi.vt.edu
540.231.2582
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
--
Check out my about.me profile: http://about.me/diederik