On 11 Dec 2013, at 8:44 AM, Johannes Kroll <johannes.kroll(a)wikimedia.de> wrote:
That is close to what I had in mind for the category part, although it appears to be
accessed as a server, whereas a WebGraph instance is accessed as an embedded library,
which is significantly faster. The point of having the whole graph is that you can
also use the other links to make inferences that validate or patch the category hierarchy
(e.g., people pages should have a higher percentage of links to/from other people pages, so
you could use the category as a base vector and run some iterative process to catch missing items).
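As a rough illustration of the "base vector plus iterative process" idea, here is a minimal sketch. It assumes a label-propagation scheme with restart, which I picked for illustration and which the message does not specify. Each page's score mixes its own category membership (1.0 or 0.0) with the mean score of the pages it links to or from. Pages outside the category that end up with high scores are candidates for missing members. The in-memory dict graph, the function names `propagate_category` and `candidate_missing`, and the `alpha`/`threshold` values are all hypothetical. They stand in for whatever a WebGraph-backed version would use.

```python
from typing import Dict, List, Set


def propagate_category(
    links: Dict[int, List[int]],
    seed: Set[int],
    alpha: float = 0.5,
    iterations: int = 50,
) -> Dict[int, float]:
    """Spread category membership over the link graph (hypothetical helper).

    Links are treated as undirected ("to/from"). Each iteration sets a node's
    score to alpha * (its seed value) + (1 - alpha) * (mean neighbour score).
    """
    # Build undirected neighbour sets from the directed link lists.
    neighbours: Dict[int, Set[int]] = {u: set() for u in links}
    for u, outs in links.items():
        for v in outs:
            neighbours.setdefault(u, set()).add(v)
            neighbours.setdefault(v, set()).add(u)

    # Base vector: 1.0 for pages already in the category, 0.0 otherwise.
    base = {u: (1.0 if u in seed else 0.0) for u in neighbours}
    score = dict(base)
    for _ in range(iterations):
        new_score = {}
        for u, nbrs in neighbours.items():
            mean = sum(score[v] for v in nbrs) / len(nbrs) if nbrs else 0.0
            new_score[u] = alpha * base[u] + (1 - alpha) * mean
        score = new_score
    return score


def candidate_missing(
    scores: Dict[int, float], seed: Set[int], threshold: float
) -> List[int]:
    """Pages not in the category whose score reaches the threshold, best first."""
    return sorted(
        (u for u, s in scores.items() if u not in seed and s >= threshold),
        key=lambda u: -scores[u],
    )


if __name__ == "__main__":
    # Toy graph: pages 0 and 1 are in the category; page 2 links to both.
    links = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [4], 4: []}
    seed = {0, 1}
    scores = propagate_category(links, seed)
    print(candidate_missing(scores, seed, threshold=0.2))
```

Because the update averages neighbour scores and then damps them by (1 - alpha), each iteration shrinks the error by at least that factor. A few dozen iterations are therefore plenty on this toy graph. On a full Wikipedia graph you would stream successors from the compressed graph instead of holding dicts in memory.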
Ciao,
seba