Hi all
I'm doing some work with colleagues from the education sector at UNESCO to look at improving some of the most viewed education articles on English language Wikipedia.
I'm trying to use TreeViews to find the most viewed articles in Category:Education. Unfortunately, such large categories just crash my browser, which means I will have to split the query up into at least 50-100 smaller queries.
Does anyone know of a less manual way around this? Ideally the output would be a spreadsheet of the article title and the number of page views of the article for a 30, 60 or 90 day period in the recent past. I will use TreeViews if it is the only way, but I'd really love to save myself from half a day of data entry. I imagine this would also be useful for people working with other organisations on other subjects.
Thanks
John
Hi John,
Two comments:
* Have you tried Wikipedia Tools for Google? https://chrome.google.com/webstore/detail/wikipedia-tools/aiilcelhmpllcgkhhpifagfehbddkdfp?hl=en It's a very neat add-on for Chrome, and in your case the two functions WIKICATEGORYMEMBERS and WIKIPAGEVIEWS may help you get what you want.
* If you are looking for a list of articles related to Education that are available in English and missing in another language, you can use the article recommendation API. For example, http://recommend.wmflabs.org/api?s=en&t=fr&n=10&article=Educatio... gives you the top 10 recommendations for articles related to Education that are available in English but missing in French. Note that "related" is not the same as being in the category "Education", though I hope we can accommodate categories in the future. The documentation for the API is here: https://github.com/ewulczyn/translation-recs-app/tree/master/api
Hope this helps.
Best,
Leila
Leila Zia
Research Scientist
Wikimedia Foundation
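For anyone who would rather script the page view lookup than use spreadsheet functions, here is a minimal Python sketch. It assumes the public Wikimedia Pageviews REST API (per-article endpoint) and is not a description of how WIKIPAGEVIEWS works internally. The function names and the User-Agent string are hypothetical placeholders; dates are YYYYMMDD strings.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

PAGEVIEWS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title, start, end, project="en.wikipedia"):
    """Build the per-article daily pageviews URL for one title.

    start and end are YYYYMMDD strings; spaces in the title become
    underscores and the rest is percent-encoded.
    """
    article = quote(title.replace(" ", "_"), safe="")
    return f"{PAGEVIEWS_BASE}/{project}/all-access/user/{article}/daily/{start}/{end}"


def total_views(payload):
    """Sum the daily 'views' values of a decoded pageviews response."""
    return sum(item["views"] for item in payload.get("items", []))


def fetch_views(title, start, end):
    """Fetch and total the views for one article (needs network access)."""
    # Placeholder User-Agent: replace with something that identifies you.
    request = Request(
        pageviews_url(title, start, end),
        headers={"User-Agent": "pageviews-sketch/0.1 (placeholder contact)"},
    )
    with urlopen(request) as response:
        return total_views(json.load(response))
```

Calling fetch_views("Education", "20160301", "20160330") would total a 30 day window; widening the date range gives the 60 or 90 day figures.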
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Hi Leila
Thanks very much. What I need to be able to do is get all the articles within Category:Education and its subcategories and then get page views for all of them, and it's a lot of pages. My friend Ed Saperia created a spreadsheet to do this, but unfortunately the query API limits results to a few hundred articles, so it's not possible to run the query through that.
Any other suggestions would be very much appreciated.
Thanks
John
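If the limit mentioned above is the per-request cap of the MediaWiki Action API (at most 500 category members per request for ordinary clients), it can be worked around by following the continuation token the API returns with each partial result. The sketch below is one way to do that in Python; the helper names and the User-Agent string are hypothetical, and the fetch function is passed in as a parameter so the paging logic can be tried without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def http_get_json(params):
    """Send one request to the Action API and decode the JSON reply."""
    # Placeholder User-Agent: replace with something that identifies you.
    request = Request(
        API_URL + "?" + urlencode(params),
        headers={"User-Agent": "category-sketch/0.1 (placeholder contact)"},
    )
    with urlopen(request) as response:
        return json.load(response)


def category_members(category, member_type="page", fetch=http_get_json):
    """Yield the titles of all members of one category.

    member_type is "page" for articles or "subcat" for subcategories.
    The loop repeats the query with the returned continuation values
    until the API stops sending a "continue" block.
    """
    params = {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": member_type,
        "cmlimit": "500",
    }
    while True:
        data = fetch(params)
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        params = {**params, **data["continue"]}
```

list(category_members("Category:Education", member_type="subcat")) would return the direct subcategories, which can then be walked in turn.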
John, I played with Wikipedia Tools for Google and I'm sure it will do what you're looking for. Check out this Google spreadsheet: https://docs.google.com/spreadsheets/d/1HeFluqXXcSXw14pk_hceKbuxykNaTjOJMLrNxs81Ifk/edit#gid=0 You just have to repeat a slightly modified formula in columns B and C to get what you have in column D for all subcategories of Education listed in column A. You can automate that part, too.
L
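The same end result, a sheet of article titles with their page views, can also be produced outside the spreadsheet. This is a minimal sketch that assumes the view totals have already been collected into a dictionary keyed by title; the function name and column headers are hypothetical.

```python
import csv
import io


def write_ranking(views_by_title, out, top_n=None):
    """Write article,views rows to `out`, most viewed first.

    Ties are broken alphabetically so the output is stable.
    top_n=None writes every article.
    """
    ranked = sorted(views_by_title.items(), key=lambda kv: (-kv[1], kv[0]))
    writer = csv.writer(out)
    writer.writerow(["article", "views"])
    for title, views in ranked[:top_n]:
        writer.writerow([title, views])


# Example with made-up numbers, written to an in-memory buffer:
#   buffer = io.StringIO()
#   write_ranking({"School": 120, "Education": 300}, buffer)
# To write a file that a spreadsheet can open, pass
#   open("education_views.csv", "w", newline="", encoding="utf-8")
```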
Here are all 96610 subcategories of Education, with 2.6 million articles.
The problem is that sometimes one unexpected subcategory can draw in lots of unrelated content, so the most viewed article can be totally off-topic.
I could do some iterations and prune the tree into something more manageable by blacklisting weird sub-branches.
https://stats.wikimedia.org/wikimedia/pageviews/categorized/wp-en/2016-02/ca...
Erik Zachte
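To illustrate the pruning idea, here is a small Python sketch of a breadth-first walk that never enters a blacklisted branch. It assumes the category graph is already available as a dictionary mapping each category to its direct subcategories. The function name and the toy category names are hypothetical and are not taken from the actual tree.

```python
from collections import deque


def reachable_categories(root, subcats, blacklist=frozenset()):
    """Return every category reachable from root, skipping blacklisted branches.

    subcats maps a category name to a list of its direct subcategories.
    The `seen` set also protects against cycles in the category graph.
    """
    seen = {root}
    queue = deque([root])
    while queue:
        category = queue.popleft()
        for child in subcats.get(category, []):
            if child in seen or child in blacklist:
                continue
            seen.add(child)
            queue.append(child)
    return seen


# Toy graph: "Alumni" pulls in an off-topic branch and a cycle back to the root.
toy_tree = {
    "Education": ["Schools", "Alumni"],
    "Schools": ["Teachers"],
    "Alumni": ["Politicians"],
    "Politicians": ["Education"],
}
pruned = reachable_categories("Education", toy_tree, blacklist={"Alumni"})
```

Blacklisting a single category removes everything that is reachable only through it, which is why a handful of exclusions can shrink the tree considerably.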
Erik, thanks so much, I wondered why TreeViews was grinding to such a halt :)
Looking at the category trees, do you think there is a reasonable subcategory depth I could go to that would give a fairly complete overview without you having to prune? If not, then yes please, it would be great if you could prune :)
Thanks again
John
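One rough way to judge a reasonable depth is to count how many new categories appear at each level below the root and look for the point where the numbers start to explode. The Python sketch below assumes the same kind of category-to-subcategories dictionary as input; the function name is hypothetical.

```python
def categories_by_depth(root, subcats, max_depth):
    """Return a list whose entry d is the number of categories first reached at depth d.

    Depth 0 is the root itself. Categories already seen at a shallower
    depth are not counted again, so cycles do not inflate the numbers.
    """
    seen = {root}
    level = [root]
    counts = [1]
    for _ in range(max_depth):
        next_level = []
        for category in level:
            for child in subcats.get(category, []):
                if child not in seen:
                    seen.add(child)
                    next_level.append(child)
        if not next_level:
            break
        counts.append(len(next_level))
        level = next_level
    return counts
```

A sudden jump between two consecutive entries suggests that a broad, possibly off-topic branch has been pulled in at that depth.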