--- On Tue, 7/13/10, Jodi Schneider <jodi.schneider@deri.org> wrote:

From: Jodi Schneider <jodi.schneider@deri.org>
Subject: [Wiki-research-l] "Mining Wikipedia public data" notes. Do we want to document research methods & tools?
To: "Research into Wikimedia content and communities" <wiki-research-l@lists.wikimedia.org>
Date: Tuesday, 13 July 2010, 12:30

We had a very useful collective notetaking effort during Felipe's Wikimania session on Mining Wikipedia public data. To keep a second copy, I've dumped the contents into the Talk page for that session:
http://wikimania2010.wikimedia.org/wiki/Talk:Submissions/Mining_Wikipedia_public_data

There are several interesting parts -- including a summary of Felipe's recommendations.

Thanks, Jodi. I've just uploaded the slides to Slideshare and linked them on the webpage for the session.

I hope I'll find some time this week to translate my colleague emijrp's slides for an intro to pywikipediabot (tutorial-style) from Spanish (thanks to the CC-BY-SA 3.0 license of the original version ;-) ).

I'll post a message once I upload the slides and add the link to the page for the session.

Best,
F.


I'll paste below just one section -- about tools/best practices -- because I'd really like to see a central place to look up documentation on best practices, tools, and methodologies. It could transclude from or point to the existing documentation.

Would that be useful to anyone else? If so, this list might help scope the technical aspects, as a starting place. If such a single point of entry already exists, I'd be delighted to know that instead!

-Jodi
==========
Here's part of that sync.in sheet -- worth looking at the whole thing, at


What tools/best practices can we share/should we know about?


Tools for analyzing particular articles
http://toolserver.org/~mzmcbride/watcher/ - number of people who are watching a page
http://stats.grok.se/en/201007/ - most viewed pages, largest # of editors in a month, viewed page statistics
http://wikidashboard.parc.com - in-place visualization of editing activity


Bots and code
http://meta.wikimedia.org/wiki/Pywikipediabot - pywikipediabot, a Python framework that queries the Wikipedia API
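Under the hood, pywikipediabot's queries are ordinary HTTP requests against the MediaWiki web API. As a rough illustration (standard-library Python only, not pywikipediabot itself; the page title and the sample JSON response are made up for the example):

```python
import json
from urllib.parse import urlencode

# Build a MediaWiki API query asking for basic info about a page.
# "Wikimania" is an illustrative page title, not from the original post.
API = "http://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "prop": "info",
    "titles": "Wikimania",
    "format": "json",
}
url = API + "?" + urlencode(params)
print(url)

# Parsing the JSON shape such a query returns; a real run would fetch
# `url` with urllib.request instead of using this hand-written sample.
sample = ('{"query": {"pages": {"123": '
          '{"title": "Wikimania", "touched": "2010-07-13T00:00:00Z"}}}}')
data = json.loads(sample)
for page in data["query"]["pages"].values():
    print(page["title"], page["touched"])
```

The framework adds the conveniences on top of this (login, throttling, edit conflicts), but the raw API is handy for quick one-off measurements.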

Computer resources
http://toolserver.org/~daniel/ Talk to Daniel about Toolserver accounts

Tools for dealing with particular dumps
http://en.wikipedia.org/wiki/Wikipedia:Database_download - Information on downloading the database
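The XML dumps linked there are large enough that they are normally processed as a stream rather than loaded whole. A minimal sketch with the Python standard library, run here on a made-up fragment in the dump's <page>/<title> shape (real dumps also declare an XML namespace, which would have to be prefixed onto the tag names below):

```python
import io
import xml.etree.ElementTree as ET

# Made-up fragment shaped like a pages-articles dump, for illustration only.
sample = """<mediawiki>
  <page>
    <title>Wikimania</title>
    <revision><text>Annual conference...</text></revision>
  </page>
  <page>
    <title>Wikipedia</title>
    <revision><text>Free encyclopedia...</text></revision>
  </page>
</mediawiki>"""

titles = []
# iterparse streams the input: each <page> element is handled and then
# cleared, so memory use stays flat even on multi-gigabyte dumps.
for event, elem in ET.iterparse(io.StringIO(sample), events=("end",)):
    if elem.tag == "page":
        titles.append(elem.findtext("title"))
        elem.clear()

print(titles)  # ['Wikimania', 'Wikipedia']
```

The same loop works on a file handle (or a decompressed bz2 stream) in place of the StringIO.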

What are these good for? (classify me)
http://meta.wikimedia.org/wiki/WikiXRay quantitative analysis tool (from Felipe Ortega et al)
http://www.cs.technion.ac.il/~gabr/resources/code/wikiprep/ - preprocessor for XML dumps, "eliminates some information and adds other useful information"
http://static.wikipedia.org/ - Static HTML dumps

-----Inline attachment follows-----

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l