On 15 Dec 2013, at 3:30 AM, Maarten Dammers <maarten(a)mdammers.nl> wrote:
I would love to have some sort of dump or (even better) a central service I can query.
It should contain for all Wikimedia projects:
* Page links (page A links to page B)
* Category links (page A is in category C)
* Image links (page A uses image I)
* Interlanguage links (page A in language en links to page A' in language nl)
* Interproject links (page A in the English Wikipedia links to page A' on Wikimedia Commons)
And to make it really complete:
* Wikidata claims (item A has a claim pointing to item B)
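As a rough illustration of the list above, all six link types could share one edge record with a type tag. This is only a sketch: the names `LinkKind`, `Node`, `Link` and `links_of_kind` are hypothetical, and the project identifiers ("enwiki", "nlwiki") are used here purely as example labels.

```python
from enum import Enum
from typing import NamedTuple


class LinkKind(Enum):
    """One tag per link type in the wish list (hypothetical names)."""
    PAGE = "page"
    CATEGORY = "category"
    IMAGE = "image"
    INTERLANGUAGE = "interlanguage"
    INTERPROJECT = "interproject"
    WIKIDATA_CLAIM = "wikidata_claim"


class Node(NamedTuple):
    project: str  # example label for the project the page lives on
    title: str


class Link(NamedTuple):
    source: Node
    target: Node
    kind: LinkKind


def links_of_kind(links, kind):
    """Return only the links carrying the given type tag."""
    return [link for link in links if link.kind is kind]


# Two of the cases from the list: a page link and an interlanguage link.
links = [
    Link(Node("enwiki", "A"), Node("enwiki", "B"), LinkKind.PAGE),
    Link(Node("enwiki", "A"), Node("nlwiki", "A'"), LinkKind.INTERLANGUAGE),
]
interlanguage = links_of_kind(links, LinkKind.INTERLANGUAGE)
```

Keeping the type as a tag on the edge, rather than one table per link type, is just one possible design; it makes a query across all link types a single filter.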
Well, somehow that would be a more interesting graph to build--all pages, all languages,
all images, all categories, all together. Probably in the 100M pages/5B arcs range. Having
all languages together would help to make inference/learning easier by working on the
English part and then propagating the results. At that point, actually, compression would
be essential in making it an in-core data structure.
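A back-of-the-envelope sketch of why compression matters at that scale. The node and arc counts are the rough figures above; the 4-byte node ids, 8-byte per-node offsets and the 4 bits per arc for the compressed case are assumptions chosen for illustration, not measurements.

```python
# Rough memory estimate for holding the whole graph in core.
# Node and arc counts come from the message; every per-arc and
# per-node cost below is an illustrative assumption.

NODES = 100_000_000    # ~100M pages
ARCS = 5_000_000_000   # ~5B arcs


def plain_bytes(nodes, arcs, id_bytes=4, offset_bytes=8):
    """Uncompressed adjacency arrays: one id per arc, one offset per node."""
    return arcs * id_bytes + (nodes + 1) * offset_bytes


def compressed_bytes(nodes, arcs, bits_per_arc, offset_bytes=8):
    """Successor lists stored at an assumed average of bits_per_arc bits."""
    arc_bytes = (arcs * bits_per_arc + 7) // 8  # round up to whole bytes
    return arc_bytes + (nodes + 1) * offset_bytes


plain = plain_bytes(NODES, ARCS)
packed = compressed_bytes(NODES, ARCS, bits_per_arc=4)  # assumed rate
```

Under these assumptions the plain representation needs about 20.8 GB against about 3.3 GB for the compressed one; the real ratio depends entirely on how well the actual link structure compresses.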
Ciao,
seba