On 17 Dec 2013, at 9:02 AM, Johannes Kroll <johannes.kroll@wikimedia.de> wrote:
> We store page_ids only, or any other integer IDs. Tools using it
> fetch all other data from SQL. This makes sense for Tools on Labs,
> for example, which have access to the DB replica anyway. We don't
> compress anything, which makes it quite fast.
Compression and speed are not at odds--quite the contrary. A standard
compression format from WebGraph delivers an edge in ~50ns. Frankly, any
service will require orders of magnitude more. Do you have any timings to
compare?
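For concreteness, here is a rough sketch of the kind of measurement I
mean, using WebGraph's sequential node iterator; the graph basename is
whatever you pass on the command line, and the ns/edge figure will of
course depend on your hardware and on the graph:

    import it.unimi.dsi.webgraph.ImmutableGraph;
    import it.unimi.dsi.webgraph.LazyIntIterator;
    import it.unimi.dsi.webgraph.NodeIterator;

    public class EdgeSpeed {
        public static void main(String[] args) throws Exception {
            // Load a BV-compressed graph by basename (args[0]).
            ImmutableGraph graph = ImmutableGraph.load(args[0]);
            long edges = 0;
            final long start = System.nanoTime();
            // Sequential scan: enumerate the successors of every node.
            final NodeIterator nodes = graph.nodeIterator();
            while (nodes.hasNext()) {
                nodes.nextInt();
                final LazyIntIterator successors = nodes.successors();
                while (successors.nextInt() != -1) edges++;
            }
            final long elapsed = System.nanoTime() - start;
            System.out.printf("%d edges in %.2f s (%.1f ns/edge)%n",
                    edges, elapsed / 1E9, (double) elapsed / edges);
        }
    }

Dividing total elapsed time by the number of edges enumerated gives the
per-edge cost directly; the same harness around your service's calls
would make the two numbers comparable.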
> It isn't a goal; the service already exists. The data you get is
> fresh, automatically updated every hour or so, unlike a graph that
> you would
This is exactly what you don't want for research purposes: moving targets.
You need a dataset, downloaded at some point in time (like the Wikipedia
dumps), that other people can use to replicate your results or improve on
them. Anything that is updated every hour is unusable for that purpose.
It's just a different goal.

Once you nail down your algorithms it might, of course, be a good idea to
run them on fresh data, but research requires replicability.
> download. It's as easy to use as any other software library that you
> pull into your script with "import foo". As to speed, most results
> are pretty much instant. Try it:
"Instant" has for me no meaning. Can you quantify?
Ciao,
seba