Hi all, we have built a prototype of an editor-editor interaction network visualization for individual articles, based on the words/tokens deleted and reintroduced by editors. It will be presented as a demo at the WWW conference this year [1], but we would love
to also get some feedback on it from this list. It's at an early stage and pretty slow to load, so please be patient when you try it out here: http://km.aifb.kit.edu/sites/whovis/index.html,
and be sure to read the "how to" section on the site. Alternatively, you can watch the (semi-professional) screencast I did :P, which explains most of the functions.
The (disagreement) interactions are based on an extended version of the authorship extraction we do with wikiwho [2], and the graph drawing follows almost exactly the nice method proposed by Brandes et al. [3]. The code can be found on GitHub,
both for the interaction-extraction extension of wikiwho [4] and the visualization itself [5], which basically produces JSON output for feeding the D3 visualization libraries we use. We have yet to generate output for more articles; so far we only show
a handful for demonstration purposes. The whole thing also fits nicely (and was supposed to go along) with the IEG proposal on editor interaction that Pine had started [6].
word provenance/authorship API prototype:
Also, we have worked a bit on our early prototype for an API for word provenance/authorship:
You can get word/token-wise information on which revision each piece of content originated in (and thereby which editor originally authored the word) at
(<ARTICLENAME> -> name of the article in ns:0 of the English Wikipedia, <REV_ID> -> rev_id of that article for which you want the authorship information; the format is currently JSON only)
Output format is currently:
{"tokens": [
  {"token": "<FIRST TOKEN IN THE WIKI MARKUP TEXT>", "author_name": "<NAME OF AUTHOR OF THE TOKEN>", "rev_id": "<REV_ID WHEN TOKEN WAS FIRST ADDED>"},
  {"token": "<SECOND TOKEN IN THE WIKI MARKUP TEXT>", "author_name": "<NAME OF AUTHOR OF THE TOKEN>", "rev_id": "<REV_ID WHEN TOKEN WAS FIRST ADDED>"},
  {"token": "<THIRD TOKEN …
  … ],
 "message": null, "success": "true",
 "revision": {"article": "<NAME OF REQUESTED ARTICLE>", "time": "<TIMESTAMP OF REQUESTED REV_ID>", "reviid": <REQUESTED REV_ID>, "author": "<AUTHOR OF REQUESTED REV_ID>"}}
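To give an idea of how this output could be consumed, here is a minimal Python sketch that counts how many tokens of the requested revision each author originally wrote. It assumes the response has exactly the shape shown above. The sample values (tokens, author names, revision ids, timestamp) and the function name tokens_per_author are made up for illustration, and no request is sent to the API here.

```python
import json
from collections import Counter

# Hypothetical sample response in the output format described above.
# All values are invented for illustration; they do not come from the API.
SAMPLE_RESPONSE = """
{"tokens": [
  {"token": "example", "author_name": "Alice", "rev_id": "1001"},
  {"token": "article", "author_name": "Bob", "rev_id": "1002"},
  {"token": "text", "author_name": "Alice", "rev_id": "1001"}],
 "message": null, "success": "true",
 "revision": {"article": "Example", "time": "2014-01-01T00:00:00Z",
              "reviid": 1003, "author": "Bob"}}
"""


def tokens_per_author(response_text):
    """Count the tokens of the requested revision per original author."""
    data = json.loads(response_text)
    # In the format above, "success" is the string "true", not a JSON boolean.
    if data.get("success") != "true":
        raise ValueError(data.get("message") or "request was not successful")
    return Counter(tok["author_name"] for tok in data["tokens"])


if __name__ == "__main__":
    for author, count in tokens_per_author(SAMPLE_RESPONSE).most_common():
        print(author, count)
```

The same loop over "tokens" could just as well group by "rev_id" instead of "author_name" if you are interested in which revisions the surviving content came from.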
DISCLAIMER: there are currently problems with getting/processing the XML for larger articles, so don't be surprised if you sometimes get an error (e.g. querying "Barack Obama" and articles of similar size will *not* succeed for higher revision
numbers). Also, we are working on the speed and on providing more precomputed articles (right now almost all are computed on request, although we save intermediate results). Still, for most articles it works fine and the output has been tested for accuracy (cf.
[2]).
At some point in the future, this API will also be able to deliver the interaction data that the visualization is built on.
I'm looking forward to your feedback :)
Cheers,
Fabian
Research Associate