[2] https://github.com/maribelacosta/wikiwho

On 14.12.2014, at 07:23, Jeremy Baron <jeremy@tuxmachine.com> wrote:

On Dec 13, 2014 12:33 PM, "Aaron Halfaker" <ahalfaker@wikimedia.org> wrote:
> 1. It turns out that generating diffs is computationally complex, so generating them in real time is slow and lame. I'm working to generate all diffs historically using Hadoop and then have a live system listening to recent changes to keep the data up-to-date[2].

IIRC Mako does that in ~4 hours (maybe outdated and takes longer now) for all enwiki diffs for all time. (don't remember if this is namespace limited) But also using an extraordinary amount of RAM. i.e. hundreds of GB

AIUI, there's no dynamic memory allocation. revisions are loaded into fixed-size buffers larger than the largest revision.

https://github.com/makoshark/wikiq

-Jeremy
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Cheers,
Fabian

--
Fabian Flöck

Research Associate

Computational Social Science department @GESIS
Unter Sachsenhausen 6-8, 50667 Cologne, Germany
Tel: + 49 (0) 221-47694-208
fabian.floeck@gesis.org

www.gesis.org
www.facebook.com/gesis.org