--- On Tue, 13/7/10, Jodi Schneider <jodi.schneider(a)deri.org> wrote:
From: Jodi Schneider <jodi.schneider(a)deri.org>
Subject: [Wiki-research-l] "Mining Wikipedia public data" notes. Do we want to
document research methods & tools?
To: "Research into Wikimedia content and communities"
<wiki-research-l(a)lists.wikimedia.org>
Date: Tuesday, 13 July 2010, 12:30
We had a very useful collective notetaking effort during Felipe's Wikimania session on
Mining Wikipedia public data. To have a second copy, I've dumped the contents into
the Talk page for that session:
http://wikimania2010.wikimedia.org/wiki/Talk:Submissions/Mining_Wik…
There are several interesting parts -- including a summary of Felipe's
recommendations.
Thanks, Jodi. I just uploaded the slides to Slideshare and linked them on the webpage for
the session.
I hope I'll find some time this week to translate the slides for an intro to
pywikipediabot (tutorial-style) by my colleague emijrp from Spanish (thanks to the
CC-BY-SA 3.0 license of the original version ;-) ).
I'll post a message once I upload the slides and add the link to the page for the
session.
Best,
F.
I'll paste below just one section -- about tools/best practices -- because I'd
really like to see a central place to look up documentation on best practices, tools, and
methodologies. It could transclude from or point to the existing documentation.
Would that be useful to anyone else? If so, this list might give a scope of the tech
aspects, as a starting place. If it already exists -- as a single point of entry --
I'd be delighted to know that instead!
-Jodi
==========
Here's part of that sync.in sheet -- worth looking at the whole thing, at
http://sync.in/60kOfEwBHA
What tools/best practices can we share/should we know about?
Tools for analyzing particular articles
http://toolserver.org/~daniel/WikiSense/Contributors.php - number of contributors
http://toolserver.org/~mzmcbride/watcher/ - number of people who are watching a page
http://stats.grok.se/en/201007/ - most viewed pages, largest # of editors in a month, viewed page statistics
http://en.wikichecker.com/article/
http://wikidashboard.parc.com - visualization in place
Bots and code
http://meta.wikimedia.org/wiki/Pywikipediabot - pywikipediabot, queries the Wikipedia API
Computer resources
http://toolserver.org/~daniel/ - talk to Daniel about Toolserver accounts
Tools for dealing with particular dumps
http://en.wikipedia.org/wiki/Wikipedia:Database_download - Information on downloading the database
What are these good for? (classify me)
http://meta.wikimedia.org/wiki/WikiXRay - quantitative analysis tool (from Felipe Ortega et al)
http://meta.wikimedia.org/wiki/User:Micke/WikiFind - search tool for database dumps
http://www.cs.technion.ac.il/~gabr/resources/code/wikiprep/ - preprocessor for XML dumps, "eliminates some information and adds other useful information"
http://www.mediawiki.org/wiki/Alternative_parsers - List of parsers
http://static.wikipedia.org/ - Static HTML dumps
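As a rough illustration of what working with the XML dumps mentioned above can look like, here is a minimal Python sketch that counts distinct contributors per page. It is not taken from any of the listed tools. It assumes the usual export layout (a page element holding a title and revisions, each revision holding a contributor with either a username or an ip), and the names local_name, contributors_per_page and SAMPLE are hypothetical, made up for this example.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def contributors_per_page(xml_stream):
    """Map each page title to the set of contributor names or IPs seen in its revisions.

    Assumes <title> appears before the <revision> elements inside each <page>.
    """
    contributors = defaultdict(set)
    title = None
    # iterparse streams the file, so the whole dump is never parsed in one go.
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name in ("username", "ip"):
            if title is not None and elem.text:
                contributors[title].add(elem.text)
        elif name == "page":
            # Discard the finished page's children so they do not pile up in memory.
            elem.clear()
            title = None
    return dict(contributors)


# Tiny hand-written sample (assumption: real dumps follow the same nesting).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>Example</title>
    <revision><contributor><username>Alice</username></contributor></revision>
    <revision><contributor><ip>192.0.2.1</ip></contributor></revision>
    <revision><contributor><username>Alice</username></contributor></revision>
  </page>
</mediawiki>"""

result = contributors_per_page(io.BytesIO(SAMPLE.encode("utf-8")))
counts = {title: len(names) for title, names in result.items()}
print(counts)
```

For a real dump, the same function could be given an open file object (for example one returned by bz2.open on a compressed dump) instead of the in-memory sample.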
-----Inline attachment follows-----
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l