Hi everyone,
I am working on creating a single entry page describing all the data about Wikipedia and WMF projects that is available to researchers. The idea is to have a single location that introduces all possible sources of data and makes it easy for a newbie to understand what suits his/her needs and how to get and work with the data. This is meant to be useful to the users (that is, you), so I have a few questions to help me make it better:
1. Have any of you used data from sources other than the ones listed below and, if so, which?
   • XML dumps
   • the API
   • the Toolserver (or its future replacement on WMF Labs)
   • our live IRC feeds
   • our raw hourly pageview data dumps (and the rudimentary API that you can use to query them at stats.grok.se)
   • the sources listed on our (experimental) open data registry on the DataHub http://datahub.io/group/wikimedia (includes DBpedia)
2. Is there any specific information that you wish you had known when you started using WMF data but that is not documented online?
3. Do you have any datasets or tools for parsing/manipulating/visualizing data that you think can be reused and that you want to share? (It could be something you built or something you found and liked.)
4. What information should be included about each source? I am thinking about:
   1. description of the data: content, format, method of collection or how you can collect it, how often it is collected, for what period
   2. skills required to get and work with the data (PHP, SQL, etc.)
   3. short sample
   4. existing tools for parsing, importing, etc.
   5. maybe examples of projects where it was used?
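To illustrate what a "short sample" for the XML dumps might look like, here is a minimal Python sketch. It is only an assumption about how such a sample could be presented: the XML fragment is hand-written and hypothetical (real dumps are much larger, and the version in the export namespace varies), and `local_name` and `iter_pages` are names made up for this example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment shaped like a pages dump. Real dumps are
# far larger and declare a versioned export namespace on the root element.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page>
    <title>Example page</title>
    <revision>
      <timestamp>2013-01-01T00:00:00Z</timestamp>
      <text>Hello, world.</text>
    </revision>
  </page>
  <page>
    <title>Another page</title>
    <revision>
      <timestamp>2013-01-02T00:00:00Z</timestamp>
      <text>Second sample text.</text>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, timestamp, text length) per page while streaming the input.

    If a page holds several revisions, the values of the last one are kept.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = timestamp = None
        text_len = 0
        for node in elem.iter():
            name = local_name(node.tag)
            if name == "title":
                title = node.text
            elif name == "timestamp":
                timestamp = node.text
            elif name == "text":
                text_len = len(node.text or "")
        yield title, timestamp, text_len
        elem.clear()  # release the content of pages already processed


pages = list(iter_pages(io.BytesIO(SAMPLE.encode("utf-8"))))
for row in pages:
    print(row)
```

Streaming with `iterparse` instead of loading the whole file is a design choice aimed at large dump files; for a real dump, the `BytesIO` object would be replaced by an opened (and decompressed) file.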
Any other comments/suggestions will be appreciated.
Thank you in advance.
Mariya
Maria Miteva, 27/02/2013 16:30:
Hi everyone,
I am working on creating a single entry page describing all the data about Wikipedia and WMF projects available for researchers. The idea is to have a single location, [...]
So it will be integrated with https://www.mediawiki.org/wiki/How_to_contribute per https://bugzilla.wikimedia.org/show_bug.cgi?id=33464 ?
Nemo
Hi Maria!
That's an awesome idea! I'm actually working on a kind of portal like that, where researchers and hackers can "easily" find material about data, the API, and metrics from Wikimedia projects, focused on the Portuguese community, but not only. It's called "Data Portal", and it is a prototype for now.
How can I help you?
Jonas
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Hello,
Maybe this is related: on Meta-Wiki we are trying to start a collection of information about the Wikimedia movement. http://meta.wikimedia.org/wiki/Wikimedia_Chapters_Association/Research
Kind regards
Ziko
Hi,
Jonas, that sounds very similar. Tell us more about your prototype. What info are you going to display? Where are you going to host it? Is it going to be in Portuguese? Are there any sources you have found that are not on the list in the email or on the DataHub? If so, I would like to add them.
Ziko, thank you for the link. These links will definitely be useful for researchers, especially to give them a basic understanding of how WMF functions. I believe the page should be added as related info, but only once it has progressed somewhat.
Mariya
On Sun, Mar 3, 2013 at 8:06 PM, Ziko van Dijk zvandijk@gmail.com wrote:
Hello, Maybe this is related: on meta wiki we try to instigate a collection of information about the WM movement. http://meta.wikimedia.org/wiki/Wikimedia_Chapters_Association/Research Kind regards Ziko
Hi, Maria! Answering your questions:
What info are you going to display? The idea is to combine visualizations with reference material on how to manipulate dumps and use the API, with a main focus on the Portuguese community, serving as an informative portal for hackers and the general public. I am still designing the first prototype and gathering reference material.
Where are you going to host it? Is it going to be Portuguese? Following the community's wishes, it should be hosted on Portuguese Wikipedia, with a branch/mirror on Meta. The first language may be Portuguese, but nothing prevents adding others.
That is something I want to ask you: where do you plan to host all this documentation? Is there any proposal to include non-English content? I'd like to help, as I said before.
Are there any sources you have found that are not on the list in the email and the DataHub? No, I think these lists are fairly complete by now.
Jonas