Any idea why the most popular article in India is "-"? CCing Dan Garry of the Discovery team.

On Fri, Jan 22, 2016 at 5:13 PM, Tilman Bayer <tbayer@wikimedia.org> wrote:
Below is an example Hive query yielding the 50 most viewed pages in
India during December 2015. It took less than 10 minutes of wall clock
time to complete.

-- Top 50 pages by views from India in December 2015 (agent_type = 'user' only)
SELECT CONCAT('https://', project, '.org/wiki/', page_title),
       SUM(view_count) AS views
FROM wmf.pageview_hourly
WHERE
   year = 2015
   AND month = 12
   AND country = 'India'
   AND agent_type = 'user'
GROUP BY project, page_title
ORDER BY views DESC
LIMIT 50;

...
Total MapReduce CPU Time Spent: 0 days 19 hours 13 minutes 2 seconds 930 msec
OK
_c0 views
https://en.wikipedia.org/wiki/Main_Page 43515253
https://en.wikipedia.org/wiki/Special:Search 4818687
https://en.wikipedia.org/wiki/- 2650346
https://en.wikipedia.org/wiki/Bajirao_I 1414810
https://en.wikipedia.org/wiki/Dilwale_(2015_film) 1410015
https://en.wikipedia.org/wiki/Mastani 1232964
https://en.wikipedia.org/wiki/Bajirao_Mastani_(film) 1133261
https://en.wikipedia.org/wiki/List_of_Bollywood_films_of_2015 632890
https://en.wikipedia.org/wiki/Hate_Story_3 582816
https://en.wikipedia.org/wiki/Special:MobileMenu 499379
https://en.wikipedia.org/wiki/Star_Wars:_The_Force_Awakens 438113
https://en.wikipedia.org/wiki/Tamasha_(film) 390519
https://en.wikipedia.org/wiki/Prem_Ratan_Dhan_Payo 378133
https://en.wikipedia.org/wiki/India 368946
https://en.wikipedia.org/wiki/List_of_Bollywood_films_of_2016 335547
https://en.wikipedia.org/wiki/Star_Wars 334326
https://en.wikipedia.org/wiki/Sunny_Leone 333848
https://en.wikipedia.org/wiki/Sundar_Pichai 329264
https://en.wikipedia.org/wiki/Special:Book 324255
https://en.wikipedia.org/wiki/List_of_highest-grossing_Bollywood_films 321418
https://en.wikipedia.org/wiki/Salman_Khan 309113
https://en.wikipedia.org/wiki/'Tis_the_Season 308221
https://en.wikipedia.org/wiki/Mandana_Karimi 289662
https://en.wikipedia.org/wiki/Kyaa_Kool_Hain_Hum_3 281801
https://en.wikipedia.org/wiki/Kashibai 272673
https://en.wikipedia.org/wiki/Bigg_Boss_9 272203
https://en.wikipedia.org/wiki/Kriti_Sanon 266773
https://en.wikipedia.org/wiki/2012_Delhi_gang_rape 265296
https://en.wikipedia.org/wiki/Shah_Rukh_Khan 263729
https://en.wikipedia.org/wiki/Neerja_Bhanot 259410
https://en.wikipedia.org/wiki/Nora_Fatehi 252085
https://en.wikipedia.org/wiki/Ashoka 250255
https://en.wikipedia.org/wiki/B._K._S._Iyengar 248422
https://en.wikipedia.org/wiki/2015_South_Indian_floods 246377
https://en.wikipedia.org/wiki/Baahubali:_The_Beginning 244281
https://en.wikipedia.org/wiki/Shamsher_Bahadur_I_(Krishna_Rao) 232122
https://en.wikipedia.org/wiki/Christmas 228278
https://en.wikipedia.org/wiki/Thanga_Magan_(2015_film) 222373
https://en.wikipedia.org/wiki/Ranveer_Singh 221010
https://en.wikipedia.org/wiki/A._P._J._Abdul_Kalam 220612
https://en.wikipedia.org/wiki/Shivaji 218245
https://en.wikipedia.org/wiki/Deepika_Padukone 218242
https://en.wikipedia.org/wiki/TLC:_Tables,_Ladders_and_Chairs_(2015) 211920
https://en.wikipedia.org/wiki/Gizele_Thakral 206585
https://en.wikipedia.org/wiki/Urvashi_Rautela 204305
https://en.wikipedia.org/wiki/Peshwa 194957
https://en.wikipedia.org/wiki/Kajol 192044
https://hi.wikipedia.org/wiki/मुखपृष्ठ 184274
https://en.wikipedia.org/wiki/Quantico_(TV_series) 183112
https://en.wikipedia.org/wiki/Mahatma_Gandhi 182336
Time taken: 562.621 seconds, Fetched: 50 row(s)
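
For the narrower case of one wiki and one day, a variant of the query
above might look like the sketch below. It has not been run. It assumes
the same wmf.pageview_hourly table, relies on its day partition column,
and uses 'en.wikipedia' and 1 December 2015 purely as placeholder values.
The last condition simply drops the "-" title seen in the results above;
it does not explain where those views come from.

```sql
-- Sketch only: top 10 pages for a single project, country and day.
-- The project, date and LIMIT values are placeholders to adjust.
SELECT page_title,
       SUM(view_count) AS views
FROM wmf.pageview_hourly
WHERE
   year = 2015
   AND month = 12
   AND day = 1                      -- single day instead of the whole month
   AND project = 'en.wikipedia'     -- single wiki
   AND country = 'India'
   AND agent_type = 'user'
   AND page_title <> '-'            -- skip the "-" placeholder title
GROUP BY page_title
ORDER BY views DESC
LIMIT 10;
```

Restricting the year, month and day partitions should let Hive read far
less data than the month-long query, so it would presumably finish much
faster, though that is an expectation rather than a measured result.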


See also the discussion at https://phabricator.wikimedia.org/T120113
(As mentioned there, a while ago I retrieved the global top 200 pages
for a timespan of almost six months, with some wait time but no major
issues. It's not quite clear to me why the "brute force" approach
mentioned in the ticket failed, but I guess it had to do with the
difficulty of repeating such a query for all projects - or countries -
to generate top lists for every one of them.)

On Wed, Jan 20, 2016 at 12:42 PM, Kevin Leduc <kevin@wikimedia.org> wrote:
> +Analytics list so they can comment.
>
> I don't have such a script.  It's a pretty intensive job to compile top
> articles especially over a month.  The pageview API was supposed to have top
> articles per month per wiki but the job is so massive that it failed to run
> in Hive.  Analytics knows there are better algorithms out there to solve
> this problem.  So the pageview API just has top per day per wiki.
>
> I imagine that you are looking at some very specific wikis and countries...
> not all of them.  Maybe someone on the list can make an example hive script
> (given a wiki and country) that gives the top for a day.
>
>
> On Wed, Jan 20, 2016 at 12:23 PM, Dan Foy <dfoy@wikimedia.org> wrote:
>>
>> Hi Kevin,
>>
>> In your collection of scripts for Hive, do you have one that can act as a
>> starting point for me to get the top N articles / URLs for Wikipedia in a
>> country?
>>
>> Thanks,
>> Dan
>>
>>
>
>
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>



--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB
