Hi everyone, 

I am working on a single entry page describing all the data about Wikipedia and other WMF projects that is available to researchers. The idea is to have one location that introduces all the available sources of data and makes it easy for a newcomer to understand which source suits their needs and how to get and work with the data. This is meant to be useful to the users (that is, you), so I have a few questions to help me make it better:

  1. Have any of you used data from sources other than those listed below? If so, which ones?
     • XML dumps
     • the API
     • the Toolserver (or its future replacement on WMF Labs)
     • our live IRC feeds
     • our raw hourly pageview data dumps (and the rudimentary API that you can use to query them at stats.grok.se)
     • the sources listed on our (experimental) open data registry on the DataHub, http://datahub.io/group/wikimedia (includes DBpedia)

  2. Is there any specific information that you wish you had known when you started using WMF data but that is not documented online?
  3. Do you have any datasets or tools for parsing/manipulating/visualizing data that you think could be reused and that you would like to share? (It could be something you built or something you found and liked.)
  4. What information should be included about each source? I am thinking about:
    1. description of the data - content, format, method of collection or how you can collect it, how often it is collected, for what period
    2. skills required to get and work with the data (PHP, SQL, etc.)
    3. short sample 
    4. existing tools - for parsing, importing, etc. 
    5. maybe examples of projects where it was used? 

Any other comments or suggestions would be appreciated.

Thank you in advance. 

Mariya