Hi everyone,
I am working on a single entry page describing all the data about Wikipedia and other WMF projects that is available to researchers. The idea is to have one location that introduces all possible sources of data and makes it easy for a newcomer to understand which source suits his/her needs and how to get and work with the data. The page is meant to be useful to its users (that is, you), so I have a few questions to help me make it better:
- Have any of you used data from sources other than those listed below, and if so, which ones?
• XML dumps
• the API
• the Toolserver (or its future replacement on WMF Labs)
• our live IRC feeds
• our raw hourly pageview data dumps (and the rudimentary API that you can use to query them at stats.grok.se)
- Is there any specific information that you wish you had known when you started using WMF data, but that is not documented online?
- Do you have any datasets or tools for parsing, manipulating, or visualizing data that you think could be reused and would like to share? (It could be something you built or something you found and liked.)
- What information should be included about each source? I am thinking about:
- description of the data: content, format, method of collection (or how you can collect it), how often it is collected, and for what period
- skills required to get and work with the data (PHP, SQL, etc.)
- short sample
- existing tools - for parsing, importing, etc.
- maybe examples of projects where it was used?
Any other comments or suggestions would be appreciated.
Thank you in advance.
Mariya