On Tue, Mar 10, 2009 at 2:18 PM, Daniel Kinzler <daniel@brightbyte.de> wrote:
> Robert Rohde wrote:
>> The converse of this is that some recognized experts would probably prefer to administer their own server/cluster rather than relying on some random guy with Wikimedia DE (or wherever) to get things done.
> An academic institution may also get a serious research grant for this - that would be more complicated if the money were handled via the German chapter. Though it is, of course, also something we are interested in.
> Basically, if we could all work on making the toolserver THE ONE PLACE for working with Wikipedia's data, that would be perfect. If, for some reason, it makes sense to build a separate cluster, I propose giving it a distinct purpose and profile: let it provide facilities for fulltext research, with low priority on update latency and high priority on having fulltext in various forms, with search indexes, word lists, and all the fun.
Personally I would favor a physically distinct cluster (regardless of who administers it) more or less with the focus you describe. In particular, I think it is useful to separate "tools" from "analysis". A "tool" aims to provide useful information in near realtime based on specific and focused parameters. By contrast, "analysis" often involves running some process systematically through a very large portion of the data with the expectation that it will take a while (for example, I've used dumps to perform large statistical analyses where the processing code might take 24 hours when run against the full edit history of a large wiki). "Tools" need high availability and low lag relative to the live site, but "analysis" doesn't care if it gets out of date and should use scheduling etc. to balance large loads.
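[For readers unfamiliar with the kind of "analysis" pass described above, here is a minimal sketch: streaming a MediaWiki XML export (a pages-meta-history dump) and counting revisions per page without loading the whole file into memory. The tag names follow the MediaWiki export format; the function name and the counting task itself are just illustrative assumptions, not anyone's actual tool.]

```python
import xml.etree.ElementTree as ET

def count_revisions(xml_source):
    """Return {page_title: revision_count} from a MediaWiki export stream."""
    counts = {}
    title = None
    # iterparse streams the file element by element, so even a
    # full-history dump of a large wiki never has to fit in RAM.
    for event, elem in ET.iterparse(xml_source, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # strip the XML namespace if present
        if tag == "title":
            title = elem.text
        elif tag == "revision":
            counts[title] = counts.get(title, 0) + 1
        elif tag == "page":
            elem.clear()  # release the finished <page> subtree to keep memory flat
    return counts
```

A run over a real dump would simply pass an open (decompressed) dump file to this function; the point is that such a job is batch-scheduled and latency-insensitive, exactly the opposite profile from a live "tool".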
-Robert Rohde