Hi, Maria! Answering your questions:

What info are you going to display?
The idea is to combine visualizations with reference material on how to manipulate the dumps and use the API, with a main focus on the Portuguese community, serving as an informative portal for both hackers and the general public. I am still designing the first prototype and gathering reference material.

Where are you going to host it? Is it going to be Portuguese?
Following the community's wishes, it should be hosted on Portuguese Wikipedia, with a branch/mirror on Meta. The first language may be Portuguese, but there is nothing preventing us from adding others.

That is something I want to ask you: where do you plan to host all this documentation? Is there any proposal to include non-English content? I'd like to help, as I said before.

Are there any sources you have found that are not on the list in the email and the DataHub?
No, I think these lists are fairly complete by now.


Ziko, thank you for the link. It will definitely be useful for researchers, especially to give them a basic understanding of how the WMF functions. I believe it should be added as related info, but only once it has progressed a bit.


On Sun, Mar 3, 2013 at 8:06 PM, Ziko van Dijk <zvandijk@gmail.com> wrote:
Maybe this is related: on Meta-Wiki we are trying to start a collection of information about the WM movement.
Kind regards

On Wednesday, February 27, 2013, Maria Miteva wrote:

Hi everyone, 

I am working on creating a single entry page describing all the data about Wikipedia and WMF projects available for researchers. The idea is to have a single location that introduces all possible sources of data and makes it easy for a newbie to understand what suits his/her needs and how to get and work with the data. This is meant to be useful to the users (that is, you), so I have a few questions to help me make it better:

  1. I was wondering if any of you has used data from sources other than those listed below and, if yes, which ones?
     • XML dumps
     • the API
     • the Toolserver (or its future replacement on WMF Labs)
     • our live IRC feeds
     • our raw hourly pageview data dumps (and the rudimentary API that you can use to query them at stats.grok.se)
     • the sources listed on our (experimental) open data registry on the DataHub http://datahub.io/group/wikimedia (includes DBpedia)

  2. Is there any specific information that you wished you had known when you started using WMF data but is not documented online? 
  3. Do you have any datasets or tools for parsing/manipulating/visualizing data, which you think can be reused and you want to share? (Could be something you built or something you found and liked)
  4. What information should be included about each source? I am thinking about:
    1. description of the data - content, format, method of collection or how you can collect it, how often it is collected, for what period
    2. skills required to get and work with the data (PHP, SQL, etc.)
    3. short sample 
    4. existing tools - for parsing, importing, etc. 
    5. maybe examples of projects where it was used? 
Any other comments/suggestions will be appreciated.

Thank you in advance. 


Wiki-research-l mailing list