On 11/10/12 17:46, Strainu wrote:
I did something last year for exporting the files from WLMRO to Europeana: http://code.google.com/p/wikiro/source/browse/trunk/robots/python/pywikipedi... It was done very quickly and it probably has some bugs. Platonides also has something made for these statistics: http://toolserver.org/~platonides/wlm/users.php , although I couldn't tell you where to get the source from.
It's basically a regex that extracts the user from the author field... plus 120 special cases for people who don't put a link to their user page or use a template, plus 4 special cases for users who use custom templates instead of {{Information}}, plus another for Talmoryair, who uses {{Artwork}} instead of {{Information}}. OTOH it has parsed (hopefully quite correctly) 363k images from 15,000 authors.
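To give an idea of the approach, here is a minimal sketch in Python. It is not the source of the tool above: the pattern, the `SPECIAL_CASES` table, and the `extract_user` name are all made up for illustration, and it only handles the English "User:" prefix.

```python
import re

# Matches [[User:Foo]] or [[User:Foo|label]]; the namespace prefix is
# matched case-insensitively. This pattern is an assumption, not the
# regex used by the real tool.
USER_LINK = re.compile(r"\[\[\s*user\s*:\s*([^\]|#/]+)", re.IGNORECASE)

# Hypothetical hand-maintained table for author fields that do not link
# to a user page (the real tool reportedly has about 120 such entries).
SPECIAL_CASES = {
    "John Doe (own work)": "JDoe",
}


def extract_user(author_field):
    """Return the username credited in an author field, or None."""
    text = author_field.strip()
    # Hand-written exceptions take priority over the regex.
    if text in SPECIAL_CASES:
        return SPECIAL_CASES[text]
    match = USER_LINK.search(text)
    if match is None:
        return None
    # MediaWiki treats underscores and spaces in titles as equivalent
    # and capitalizes the first letter of usernames.
    name = match.group(1).replace("_", " ").strip()
    return name[:1].upper() + name[1:]
```

For example, `extract_user("[[User:Foo|Foo]]")` gives `"Foo"`, while an author field with no user link and no special-case entry gives `None` and would need a new special case.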
We can make it a general library if you want. The most common misuse I saw came from people who wanted to be credited under their real name instead of their username, so they changed [[User:Foo|Foo]] to [[User:John Doe|John Doe]]... which is completely wrong, especially when an account named «John Doe» actually existed. In some cases it was clear that when user JDoe changed the author field to «John Doe», he was referring to himself. But if Guy85 put «John Doe», is that his real name, a friend, or some random guy? (I cared not just about how they wanted to be credited but also about who was being credited.)