I love this. The Nigeria data (23M views for *Search* pages, 5M for *main pages*, 7M for the entire rest of the list!) is a reminder of how important those pages are to readers, and perhaps of how high bounce rates are... Small improvements there could make a huge difference to the site experience :) Maybe also improvements to how easily people can get the right search result without being taken to a Special:Search page! It would be great to see a *country* facet on topviews https://pageviews.wmcloud.org/topviews/?project=de.wikipedia.org&platform=all-access&date=2022-05&excludes= .
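As a rough check of the Search and main-page figures above, here is a minimal Python sketch that buckets the first five rows of the Nigeria list quoted later in this thread. The `bucket` helper and its rule of treating any title ending in `:Search` as a search page are assumptions made for illustration, not part of anyone's actual tooling.

```python
from collections import defaultdict

# First five rows of the Nigeria top-100 list quoted later in this thread.
ROWS = [
    ("https://en.wikipedia.org/wiki/Special:Search", 13696500),
    ("https://fr.wikipedia.org/wiki/Cookie_(informatique)", 10754500),
    ("https://ig.wikipedia.org/wiki/Special:Search", 7579900),
    ("https://en.wikipedia.org/wiki/Main_Page", 5502800),
    ("https://ig.wikipedia.org/wiki/Ih%C3%BC_k%C3%A1r%C3%ADr%C3%AD:Search", 1791900),
]


def bucket(url):
    """Classify a page URL as 'search', 'main_page' or 'other' (heuristic)."""
    title = url.rsplit("/wiki/", 1)[-1]
    if title.endswith(":Search"):  # assumption: catches localized Special:Search too
        return "search"
    if title == "Main_Page":
        return "main_page"
    return "other"


def totals(rows):
    """Sum views per bucket."""
    out = defaultdict(int)
    for url, views in rows:
        out[bucket(url)] += views
    return dict(out)


print(totals(ROWS))
```

The three search rows sum to roughly 23M and the English main page to about 5.5M, which lines up with the rounded numbers above.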
On Thu, Dec 8, 2022 at 5:53 PM Hal Triedman htriedman@wikimedia.org wrote:
Hi all!
Looks like Isaac and I had the same thought here. I also spent ~45 minutes hacking together a script that collects the top (up to) 500 pages for a given country from 1 December 2021 through 30 November 2022 using the WMF pageviews API https://wikimedia.org/api/rest_v1/#/Pageviews%20data. All of the datasets are relatively small and available for download and free use https://analytics.wikimedia.org/published/datasets/most_visited_articles_12.2021-11.2022/. Code for generating these lists is available on the WMF GitLab instance https://gitlab.wikimedia.org/htriedman/annual-top-pages, and runs in ~3.5 hours on a normal MacBook, if anyone wants to download/fork it and try it on their own.
There are only 135 ISO codes included in this set of files — I removed codes that WMF doesn't release data about or that have no data reported for the 365 day period in question. Let me know if you have any questions, and hope this helps!
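For anyone curious what such a collection step could look like, below is a minimal Python sketch. It is not the code in the GitLab repository. It assumes the per-country daily endpoint has the URL shape shown in `URL_TEMPLATE` and that each returned row carries `project`, `article` and `views_ceil` fields; please check both against the API documentation before relying on them. The helper names `daily_urls` and `annual_top` are hypothetical, and the actual HTTP fetching is left out.

```python
from collections import Counter
from datetime import date, timedelta

# Assumed URL shape for the daily per-country "top" endpoint; verify in the API docs.
URL_TEMPLATE = (
    "https://wikimedia.org/api/rest_v1/metrics/pageviews/"
    "top-per-country/{country}/all-access/{d:%Y/%m/%d}"
)


def daily_urls(country, start, end):
    """Yield one request URL per day from start to end, inclusive."""
    d = start
    while d <= end:
        yield URL_TEMPLATE.format(country=country, d=d)
        d += timedelta(days=1)


def annual_top(daily_lists, limit=500):
    """Sum views per (project, article) over many daily lists and keep the top `limit`.

    Each element of `daily_lists` is a list of dicts with the assumed keys
    'project', 'article' and 'views_ceil'.
    """
    totals = Counter()
    for day in daily_lists:
        for row in day:
            totals[(row["project"], row["article"])] += row["views_ceil"]
    return totals.most_common(limit)


urls = list(daily_urls("NG", date(2021, 12, 1), date(2022, 11, 30)))
print(len(urls), urls[0])
```

Note that a page only contributes on days when it appears in that day's list, so totals built this way are lower bounds rather than exact annual counts.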
Hal
On Thu, Dec 8, 2022 at 8:18 AM Isaac Johnson isaac@wikimedia.org wrote:
Romaine, Building on Chico's comment, I put together an example notebook of how to estimate such a list from the public data in case you're curious (I calculated it for January-November for Nigeria in the example). It's not a perfect approach in that it makes some assumptions and uses incomplete data but probably is close to what the actual list would be (details in the link). You'd likely want to use your knowledge of the region/languages to filter out pages like Special:Search and bot-driven views that slipped through into the data (like Cookie and Cleopatra in the example below).
Notebook: https://public.paws.wmcloud.org/User:Isaac_(WMF)/Top_Read_2022_Geo.ipynb#Exa...)
It makes use of these public Wikimedia resources:
- PAWS infrastructure: https://wikitech.wikimedia.org/wiki/PAWS
- Pageviews API: https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews
- Python mwviews library for interacting with the pageviews API: https://github.com/mediawiki-utilities/python-mwviews
You can read instructions for how to copy this notebook and run it for other countries here: https://wikitech.wikimedia.org/wiki/PAWS/Getting_started_with_PAWS#Fork
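The manual filtering step mentioned above (dropping Special:Search and pages whose views look bot-driven) could be sketched roughly as follows. This is not taken from the notebook; `SPECIAL_PREFIXES`, `SUSPECTED_BOT_PAGES` and `keep` are hypothetical names, and the contents of both lists are judgment calls that you would adapt using your own knowledge of the region and languages.

```python
from urllib.parse import unquote, urlsplit

# Hypothetical filter lists; extend them for the wikis you care about.
SPECIAL_PREFIXES = (
    "Special:",
    unquote("Ih%C3%BC_k%C3%A1r%C3%ADr%C3%AD:"),  # Special namespace on ig.wikipedia
)
SUSPECTED_BOT_PAGES = {
    ("fr.wikipedia.org", "Cookie_(informatique)"),
    ("en.wikipedia.org", "Cleopatra"),
}


def keep(url):
    """Return True if the page should stay in the cleaned-up list."""
    parts = urlsplit(url)
    # Take everything after /wiki/ and decode percent-escapes.
    title = unquote(parts.path.partition("/wiki/")[2])
    if title.startswith(SPECIAL_PREFIXES):
        return False
    return (parts.netloc, title) not in SUSPECTED_BOT_PAGES


sample = [
    "https://en.wikipedia.org/wiki/Special:Search",
    "https://fr.wikipedia.org/wiki/Cookie_(informatique)",
    "https://ig.wikipedia.org/wiki/Ih%C3%BC_k%C3%A1r%C3%ADr%C3%AD:Search",
    "https://en.wikipedia.org/wiki/Nigeria",
    "https://en.wikipedia.org/wiki/Cleopatra",
]
print([u for u in sample if keep(u)])
```

Keeping the blocklist keyed on (host, title) means a title flagged on one wiki is not silently dropped from every other wiki.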
Best, Isaac
Copying the top-100 output for Nigeria below for ease of access:
rank article views
1 https://en.wikipedia.org/wiki/Special:Search 13696500
2 https://fr.wikipedia.org/wiki/Cookie_(informatique) 10754500
3 https://ig.wikipedia.org/wiki/Special:Search 7579900
4 https://en.wikipedia.org/wiki/Main_Page 5502800
5 https://ig.wikipedia.org/wiki/Ih%C3%BC_k%C3%A1r%C3%ADr%C3%AD:Search 1791900
6 https://foundation.wikimedia.org/wiki/Privacy_policy 870000
7 https://en.wikipedia.org/wiki/Bet9ja 664700
8 https://foundation.wikimedia.org/wiki/Terms_of_Use 646900
9 https://en.wikipedia.org/wiki/XXX 624200
10 https://en.wikipedia.org/wiki/Nigeria 491700
11 https://en.wikipedia.org/wiki/Cleopatra 429900
12 https://en.wikipedia.org/wiki/Elizabeth_II 328400
13 https://en.wikipedia.org/wiki/Bola_Tinubu 320300
14 https://en.wikipedia.org/wiki/XXX_(film_series) 234700
15 https://commons.wikimedia.org/wiki/Commons:Wiki_Loves_Africa_2022/en 230600
16 https://en.wikipedia.org/wiki/Peter_Obi 229000
17 https://fr.wikipedia.org/wiki/Enoch_Adeboye 197300
18 https://commons.wikimedia.org/wiki/Commons:Wiki_Loves_Earth_2022_in_Nigeria 154600
19 https://en.wikipedia.org/wiki/XXX:_Return_of_Xander_Cage 143000
20 https://en.wikipedia.org/wiki/Vladimir_Putin 131100
21 https://en.wikipedia.org/wiki/Russo-Ukrainian_War 122700
22 https://en.wikipedia.org/wiki/XXXX_(beer) 116800
23 https://en.wikipedia.org/wiki/Charles_III 114600
24 https://en.wikipedia.org/wiki/Africa_Cup_of_Nations 112300
25 https://en.wikipedia.org/wiki/Jeffrey_Dahmer 110300
26 https://en.wikipedia.org/wiki/Yusuf_Datti_Baba-Ahmed 108700
27 https://en.wikipedia.org/wiki/Cristiano_Ronaldo 106300
28 https://commons.wikimedia.org/wiki/Commons:Wiki_Loves_Earth_2022_in_South_We... 99700
29 https://en.wikipedia.org/wiki/Atiku_Abubakar 91800
30 https://en.wikipedia.org/wiki/2022_FIFA_World_Cup 91300
31 https://en.wikipedia.org/wiki/NATO 86300
32 https://en.wikipedia.org/wiki/Erling_Haaland 84800
33 https://en.wikipedia.org/wiki/Russia%E2%80%93Ukraine_relations 84300
34 https://en.wikipedia.org/wiki/2021_Africa_Cup_of_Nations 83300
35 https://en.wikipedia.org/wiki/Diana,_Princess_of_Wales 81900
36 https://en.wikipedia.org/wiki/Black_Adam_(film) 80600
37 https://en.wikipedia.org/wiki/Black_Panther:_Wakanda_Forever 66800
38 https://en.wikipedia.org/wiki/Ademola_Adeleke 66600
39 https://en.wikipedia.org/wiki/Ukraine 65100
40 https://en.wikipedia.org/wiki/Rishi_Sunak 60700
41 https://en.wikipedia.org/wiki/Elon_Musk 60200
42 https://en.wikipedia.org/wiki/Takeoff_(rapper) 58000
43 https://en.wikipedia.org/wiki/House_of_the_Dragon 57500
44 https://en.wikipedia.org/wiki/Casemiro 56800
45 https://en.wikipedia.org/wiki/Prince_Philip,_Duke_of_Edinburgh 56500
46 https://en.wikipedia.org/wiki/Member_states_of_NATO 56300
47 https://en.wikipedia.org/wiki/Tobi_Amusan 54000
48 https://en.wikipedia.org/wiki/George_VI 53400
49 https://en.wikipedia.org/wiki/2022_Kenyan_general_election 53100
50 https://en.wikipedia.org/wiki/2022_Russian_invasion_of_Ukraine 52900
51 https://en.wikipedia.org/wiki/Kashim_Shettima 52400
52 https://en.wikipedia.org/wiki/File:WhatsApp.svg 51000
53 https://wikimania.wikimedia.org/wiki/Registration 48200
54 https://en.wikipedia.org/wiki/Prince_Harry,_Duke_of_Sussex 47700
55 https://en.wikipedia.org/wiki/The_Woman_King 45900
56 https://en.wikipedia.org/wiki/Graham_Potter 44900
57 https://en.wikipedia.org/wiki/Pierre-Emerick_Aubameyang 43900
58 https://en.wikipedia.org/wiki/Antony_(footballer,_born_2000) 43700
59 https://en.wikipedia.org/wiki/Ada_Ameh 43500
60 https://en.wikipedia.org/wiki/Vincent_Aboubakar 43100
61 https://en.wikipedia.org/wiki/Russia 42300
62 https://en.wikipedia.org/wiki/Lionel_Messi 42300
63 https://en.wikipedia.org/wiki/Moses_Simon 41800
64 https://en.wikipedia.org/wiki/Karim_Benzema 41000
65 https://en.wikipedia.org/wiki/List_of_political_parties_in_Nigeria 40500
66 https://en.wikipedia.org/wiki/History_of_Nigeria 40200
67 https://en.wikipedia.org/wiki/Bianca_Odumegwu-Ojukwu 39400
68 https://en.wikipedia.org/wiki/Ahmad_Lawan 39400
69 https://en.wikipedia.org/wiki/Doctor_Strange_in_the_Multiverse_of_Madness 39300
70 https://en.wikipedia.org/wiki/2022_Women%27s_Africa_Cup_of_Nations 38500
71 https://commons.wikimedia.org/wiki/Commons:Wiki_Loves_Monuments_2022_in_Nige... 37700
72 https://en.wikipedia.org/wiki/List_of_capitals_of_states_of_Nigeria 36500
73 https://en.wikipedia.org/wiki/Valentine%27s_Day 36200
74 https://en.wikipedia.org/wiki/Liz_Truss 35200
75 https://en.wikipedia.org/wiki/Maduka_Okoye 34800
76 https://en.wikipedia.org/wiki/Soviet_Union 34600
77 https://en.wikipedia.org/wiki/Raheem_Sterling 33700
78 https://en.wikipedia.org/wiki/Roman_Abramovich 32400
79 https://en.wikipedia.org/wiki/Anne,_Princess_Royal 32300
80 https://en.wikipedia.org/wiki/Edward_VIII 32000
81 https://en.wikipedia.org/wiki/William,_Prince_of_Wales 32000
82 https://en.wikipedia.org/wiki/%C3%8Cy%C3%A1l%27%E1%BB%8D%CC%81j%C3%A0 31900
83 https://en.wikipedia.org/wiki/Simon_Leviev 31300
84 https://en.wikipedia.org/wiki/Alchemy_of_Souls 31100
85 https://en.wikipedia.org/wiki/Volodymyr_Zelenskyy 29300
86 https://en.wikipedia.org/wiki/List_of_state_governors_of_Nigeria 28700
87 https://en.wikipedia.org/wiki/Isiaka_Adeleke 28500
88 https://en.wikipedia.org/wiki/The_Headies_2022 28300
89 https://en.wikipedia.org/wiki/Lisandro_Mart%C3%ADnez 28200
90 https://en.wikipedia.org/wiki/XXX:_State_of_the_Union 27100
91 https://en.wikipedia.org/wiki/Independence_Day_(Nigeria) 27000
92 https://en.wikipedia.org/wiki/Women%27s_Africa_Cup_of_Nations 26800
93 https://en.wikipedia.org/wiki/Big_Brother_Naija_(season_7) 26800
94 https://en.wikipedia.org/wiki/Marc_Cucurella 26500
95 https://en.wikipedia.org/wiki/FIFA_World_Cup 26300
96 https://en.wikipedia.org/wiki/List_of_Nigerian_Grammy_Award_winners_and_nomi... 26100
97 https://en.wikipedia.org/wiki/Qatar 26100
98 https://meta.wikimedia.org/wiki/International_Museum_Day_2022 25800
99 https://en.wikipedia.org/wiki/Joyce_Vincent 25700
100 https://en.wikipedia.org/wiki/An%C3%ADk%C3%BAl%C3%A1p%C3%B3_(2022_film) 25300
On Wed, Dec 7, 2022 at 7:41 PM Romaine Wiki romaine.wiki@gmail.com wrote:
For some languages this is easy, because the language is mainly spoken in one country. (Even then, some local languages or dialects may not be represented in the data.)
For other languages it is not easy to get statistics on the most visited pages of a country, because the language is spoken in multiple countries.
If, for example, one country has only 3% of the population of another country that shares its language, the per-language statistics are heavily biased. The larger country generates so many views that the smaller country's reading patterns are invisible. If we have no data for them, we let those unseen communities down.
Romaine
On Wed, Dec 7, 2022 at 6:21 PM Jan Ainali ainali.jan@gmail.com wrote:
On Swedish Wikipedia we collect it on one page: https://sv.wikipedia.org/wiki/Wikipedia:Mest_visade_artiklar_2022
Doing it per language is much easier than per country, as the data is publicly available.
Best, Jan Ainali
On Wed, Dec 7, 2022 at 4:36 PM Romaine Wiki romaine.wiki@gmail.com wrote:
Every year it makes the news headlines: the top 10 or top 100 most popular Google searches of the past year in my country. I have seen this in some other countries too.
People are interested, and by making this data public, something positive is said about Google (in contrast to all the negative news about them during the rest of the year).
This is something simple Wikimedia could do too: share this kind of data (*by country*) with the world. It would bring Wikipedia closer to the public and create more positive awareness. Alternatively, the data could be made available to the local chapters so they can share positive news about Wikipedia.
Romaine
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org
-- Isaac Johnson (he/him/his) -- Senior Research Scientist -- Wikimedia Foundation