Dear Wikiteam,
Guy Chapman requested that I post to this mailing list to ask how we can proceed with getting a copy of Wikipedia so that we can offer it as a database in our free search service, in response to the request in the following paragraph. He made me aware of its size, but that is not an issue. I would like to obtain a copy and then establish a routine for automated, synced downloads, as we do for the other databases in our system.
I have had several requests to add Wikipedia to our eTBLAST text similarity search engine. This is to improve reference finding as well as novelty assessment. Our search tool is widely used, widely published, and free. Please see etblast.org or http://en.wikipedia.org/wiki/ETBLAST. I would like to create a searchable copy of Wikipedia locally, with links back to Wikipedia for hits, and of course acknowledge Wikimedia. We do this for several open text datasets and are prepared to keep a local, synced copy of Wikipedia, if you are interested. I am certain that our mutual users would like and benefit from our working together.
Cheers, and thank you, Skip
----- Original Message -----
From: "Wikipedia information team" info-en@wikimedia.org
To: "Skip Garner" garner@vbi.vt.edu
Cc: "Dominik L. Borkowski" dom@vbi.vt.edu, "Johnny Sun" szhaohui@vbi.vt.edu
Sent: Wednesday, December 1, 2010 9:43:25 AM
Subject: Re: [Ticket#2010112810016598] I would like to provide a different search engine for Wikimedia
Dear Skip Garner,
Thank you for your email. Our response follows your message.
11/29/2010 16:23 - Skip Garner wrote:
Guy, Thank you for the information. I would like to move forward on this, for I think it will be of mutual value. The size of the database is not an issue, and we are always expanding our storage and serving capabilities. We regularly work with data that is hundreds of terabytes in size. One issue would be getting the first copy, but we could probably handle that by FedEx.
Can you tell me how we can proceed?
Cheers, Skip
The best bet is probably to email the wikitech mailing list, which is where the devs hang out.
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
They will have the best idea of the practicalities.
Yours sincerely, Guy Chapman