On Sun, 15 Dec 2013 16:53:18 +0100
Maarten Dammers <maarten(a)mdammers.nl> wrote:
Hi Johannes,
Johannes Kroll wrote on 15-12-2013 16:27:
I would love to have some sort of dump or (even better) a central
service I can query. It should contain for all Wikimedia projects:
* Page links (page A links to page B)
This should be in the pagelinks table in the database replica.
Maybe I wasn't clear. I know how MediaWiki works and what tables to
query [1], but it isn't designed for recursion or for being crawled as a
directed graph. Doing that really kills performance and doesn't scale at all.
You need a custom setup for that.
Yes, and that is what Catgraph is about. It is a directed graph database
made for exactly this kind of thing. We currently carry the category
links, but we could import other graphs as well, such as pagelinks. I
just wasn't sure whether you needed to do recursive queries or not. If
you do, Catgraph is the thing.
https://wikitech.wikimedia.org/wiki/Nova_Resource:Catgraph
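
To illustrate what "recursive queries" means here, below is a minimal sketch of a breadth-first traversal over a directed graph held in memory. It is not Catgraph's API or implementation; the function name `descendants`, the `max_depth` parameter, and the toy `category_links` data are all made up for this example. The point is only that once the edges sit in an adjacency structure, following them to any depth is one cheap loop, whereas against the SQL tables each level costs another round of queries.

```python
from collections import deque


def descendants(graph, root, max_depth=None):
    """Return every node reachable from `root`, in breadth-first order.

    `graph` maps a node to the list of nodes it links to.
    `max_depth` (hypothetical parameter) limits how many levels are followed.
    """
    seen = {root}               # guards against cycles in the link graph
    queue = deque([(root, 0)])  # (node, depth at which it was found)
    result = []
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue            # do not expand nodes at the depth limit
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                result.append(child)
                queue.append((child, depth + 1))
    return result


# Hypothetical toy data: category -> subcategories, including one cycle.
category_links = {
    "Science": ["Physics", "Biology"],
    "Physics": ["Optics", "Mechanics"],
    "Biology": ["Genetics"],
    "Optics": ["Physics"],
}

print(descendants(category_links, "Science"))
print(descendants(category_links, "Science", max_depth=1))
```

The same traversal would work for pagelinks (page A links to page B) by loading those edges into the adjacency structure instead of category links.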