Bilal Abdul Kader wrote:
Greetings, We are setting up a research server at Concordia University (Canada) that is dedicated to Wikipedia. We would love to share the resources with anyone interested.
In case anyone needs help setting it up, we would love to help as well.
bilal
There's a project for a biggish research cluster for Wikipedia data at Syracuse University that is awaiting funding. I forwarded your mail to one of the people involved. Perhaps you can join forces.
On Mon, Mar 9, 2009 at 8:07 PM, phoebe ayers <phoebe.wiki@gmail.com> wrote:
Hi all, I'm not sure exactly where to raise this, so am asking here.
A researcher I have been in touch with has proposed starting a second, research-oriented Wikimedia toolserver. He thinks his lab can pay for the hardware and would be willing to maintain it, if they could get help setting it up. He got this idea after a member of his research group tried (unsuccessfully so far, with no response) to get an account on the current toolserver; their Wikipedia-related research has been on hold for a few months because of the delay. (It seems there is a big backlog of account requests right now and only one person working on them?) This research group has done some interesting Wikipedia research to date, and I expect they could do more with access to the right data.
I apologize for the delay. Perhaps you can send me some details in private, and I'll look into it. DaB hasn't had much time lately, and we had some major infrastructure changes to take care of, which caused some delays.
Personally, I think a dedicated toolserver is a great idea for the research community, but I know very little about the technical issues involved and/or whether this has been proposed before. Please comment, and I can pass on replies and put the researcher in touch with the tech team if it seems like a good idea.
Whether it makes sense to run a separate cluster largely depends on what kind of data you need access to, and in what time frame. If you work mostly on secondary data like link tables, and you need the data in near-real time, use toolserver.org. That's what it's there for, and it's unlikely you could set up anything that gets the same data with comparably low latency.
However, if you work mostly on full text, toolserver.org is not very useful anyway: there's no direct access to full page text there, nor to search indexes. Having a dedicated cluster for research on textual content, perhaps providing content in various pre-processed forms, would be a very good idea. This is what the project I mentioned above aims at, and I'll be happy to support this effort officially, as Wikimedia Germany's tech guy.
-- daniel