We had a very useful collective notetaking effort during Felipe's Wikimania session on Mining Wikipedia public data. To have a second copy, I've dumped the contents into the Talk page for that session: http://wikimania2010.wikimedia.org/wiki/Talk:Submissions/Mining_Wikipedia_pu...
There are several interesting parts -- including a summary of Felipe's recommendations.
I'll paste below just one section -- about tools/best practices -- because I'd really like to see a central place to look up documentation on best practices, tools, and methodologies. It could transclude from or point to the existing documentation.
Would that be useful to anyone else? If so, this list might help scope the technical aspects, as a starting place. If such a single point of entry already exists, I'd be delighted to know that instead!
-Jodi
==========
Here's part of that sync.in sheet (it's worth looking at the whole thing): http://sync.in/60kOfEwBHA
What tools/best practices can we share/should we know about?
Tools for analyzing particular articles:
* http://toolserver.org/~daniel/WikiSense/Contributors.php - number of contributors
* http://toolserver.org/~mzmcbride/watcher/ - number of people who are watching a page
* http://stats.grok.se/en/201007/ - most viewed pages, largest # of editors in a month, viewed page statistics
* http://en.wikichecker.com/article/
* http://wikidashboard.parc.com - visualization in place
Bots and code:
* http://meta.wikimedia.org/wiki/Pywikipediabot - pywikipediabot, queries the Wikipedia API
Computer resources:
* http://toolserver.org/~daniel/ - talk to Daniel about Toolserver accounts
Tools for dealing with particular dumps:
* http://en.wikipedia.org/wiki/Wikipedia:Database_download - information on downloading the database
What are these good for? (classify me)
* http://meta.wikimedia.org/wiki/WikiXRay - quantitative analysis tool (from Felipe Ortega et al.)
* http://meta.wikimedia.org/wiki/User:Micke/WikiFind - search tool for database dumps
* http://www.cs.technion.ac.il/~gabr/resources/code/wikiprep/ - preprocessor for XML dumps, "eliminates some information and adds other useful information"
* http://www.mediawiki.org/wiki/Alternative_parsers - list of parsers
* http://static.wikipedia.org/ - static HTML dumps
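The sheet lists Toolserver pages that count an article's contributors and notes that pywikipediabot queries the Wikipedia API. As a rough illustration of what such a query looks like, here is a minimal Python sketch (not taken from any of the listed tools) that counts the distinct editors of one article through the MediaWiki API's `action=query&prop=revisions` module. The English Wikipedia endpoint and the injected `fetch` function are assumptions made so the logic can be tried offline; `revisions_query_url` and `count_contributors` are hypothetical helper names.

```python
import json
from typing import Callable, Optional
from urllib.parse import urlencode

# Assumption: the English Wikipedia endpoint; any MediaWiki api.php should work.
API_URL = "https://en.wikipedia.org/w/api.php"


def revisions_query_url(title: str, cont: Optional[dict] = None) -> str:
    """Build a prop=revisions query that returns only the editor of each revision."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "user",
        "rvlimit": "max",
        "format": "json",
        "formatversion": "2",  # pages come back as a list, not a dict keyed by id
    }
    if cont:
        # Continuation parameters copied from the previous response.
        params.update(cont)
    return API_URL + "?" + urlencode(params)


def count_contributors(title: str, fetch: Callable[[str], str]) -> int:
    """Count distinct usernames/IPs that edited `title`, following continuation."""
    users = set()
    cont = None
    while True:
        data = json.loads(fetch(revisions_query_url(title, cont)))
        for page in data["query"]["pages"]:
            for rev in page.get("revisions", []):
                # Revisions whose editor has been hidden may carry no "user" key.
                if "user" in rev:
                    users.add(rev["user"])
        if "continue" not in data:
            return len(users)
        cont = data["continue"]
```

Against the live site, `fetch` could wrap `urllib.request.urlopen` on a `urllib.request.Request` that sets a descriptive `User-Agent` header, which Wikimedia asks API clients to send. Injecting `fetch` keeps the paging logic separate from the network code.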
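Several entries above (the database download page, WikiXRay, WikiFind, wikiprep) work on the XML dumps. Full-history dumps are far too large to load into memory at once, so they are usually streamed. Here is a minimal sketch, assuming the MediaWiki export schema (`page`, `title` and `revision` elements inside a versioned namespace), that uses only the standard library's `xml.etree.ElementTree.iterparse` to count revisions per page title. `revisions_per_page` and the inline `SAMPLE` are hypothetical, for illustration only, and not part of any listed tool.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter
from typing import BinaryIO


def local_name(tag: str) -> str:
    # "{http://www.mediawiki.org/xml/export-0.4/}page" -> "page"; the schema
    # version in the namespace differs between dumps, so it is ignored.
    return tag.rsplit("}", 1)[-1]


def revisions_per_page(stream: BinaryIO) -> Counter:
    """Stream an XML dump and count <revision> elements per page <title>."""
    counts: Counter = Counter()
    title = None
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # first event is the start of the <mediawiki> root
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            counts[title] += 1
            elem.clear()  # drop the revision text as soon as it is counted
        elif name == "page":
            root.clear()  # discard finished pages so memory use stays flat


    return counts


SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page><title>Alpha</title>
    <revision><contributor><username>A</username></contributor><text>x</text></revision>
    <revision><contributor><ip>127.0.0.1</ip></contributor><text>y</text></revision>
  </page>
  <page><title>Beta</title>
    <revision><contributor><username>B</username></contributor><text>z</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    # Demo on the tiny inline sample; prints "2 Alpha" then "1 Beta".
    for page_title, n in revisions_per_page(io.BytesIO(SAMPLE)).most_common():
        print(n, page_title)
```

For a real compressed dump, the stream could come from `bz2.open(path, "rb")`, so the file never has to be decompressed to disk. Clearing the root after each page is what keeps memory bounded, since `iterparse` otherwise keeps every parsed element attached to the tree.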
Some time ago - around 2007/2008 - I collected related tools at http://en.wikipedia.org/wiki/Wikipedia:Researching_Wikipedia
I missed the WikiSym session, but I think merging the notes from it with that page would be useful.
Hi Piotr,
Thanks for the useful link; I support your idea of merging everything in one place.
But if so, what about:
* http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Wikidemia
* http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Editing_trends
... and I presume that there might be some more places...
Regards,
Pavlo Shevelo
On Tue, Jul 13, 2010 at 5:31 PM, Piotr Konieczny piokon@post.pl wrote:
-- Piotr Konieczny
--- On Tue, 13/7/10, Jodi Schneider jodi.schneider@deri.org wrote:
From: Jodi Schneider jodi.schneider@deri.org
Subject: [Wiki-research-l] "Mining Wikipedia public data" notes. Do we want to document research methods & tools?
To: "Research into Wikimedia content and communities" wiki-research-l@lists.wikimedia.org
Date: Tuesday, 13 July 2010, 12:30
Thanks, Jodi. I've just uploaded the slides to Slideshare and linked them on the webpage for the session.
I hope to find some time this week to translate from Spanish the tutorial-style slides introducing pywikipediabot by my colleague emijrp (thanks to the CC-BY-SA 3.0 license of the original version ;-) ).
I'll post a message once I've uploaded the slides and added the link to the page for the session.
Best, F.
_______________________________________________ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Hi Jodi,
Many thanks for your work.
Would that be useful to anyone else?
Sure! I was just going to ask on the list about such stuff when your mail arrived.
Regards,
Pavlo Shevelo
On Tue, Jul 13, 2010 at 1:30 PM, Jodi Schneider jodi.schneider@deri.org wrote: