On Mon, Jul 9, 2018 at 1:59 PM, Amir E. Aharoni <amir.aharoni@mail.huji.ac.il> wrote:

Thank you so much! In many countries it's

A couple of questions:
1. Are any of the results of this query private? Or can I talk about them to people?
2. Is anything like this already published anywhere? If it isn't, it may be nice to publish such a thing, similarly to Google Zeitgeist.

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
‪“We're living in pieces,
I want to live in peace.” – T. Moore‬

2018-07-09 13:19 GMT+03:00 Francisco Dans <fdans@wikimedia.org>:
Hi Amir,

As Tilman has suggested, your best bet is to query the pageview_hourly table. I was going to be lazy and give you a query to just find out the most viewed article for a given country, but then I made a few experiments and this is the query I came up with to generate a list of countries and their respective most viewed articles and view counts. It takes a few minutes to run for a single day, so I'm sure someone here could suggest a better approach.

WITH articles_countries AS (
SELECT country, page_title, sum(view_count) AS views
FROM pageview_hourly
WHERE year=2018 AND month=3 AND day=15
GROUP BY country, page_title
)
SELECT s.country as country, s.page_title as page_title, s.views as views
FROM (
SELECT max(named_struct('views', views, 'country', country, 'page_title', page_title)) as s from articles_countries group by country
) t;

Cheers / see you in ZA,
Fran

On Mon, Jul 9, 2018 at 10:18 AM, Amir E. Aharoni <amir.aharoni@mail.huji.ac.il> wrote:
Hi,

Is there a way to find what are the most popular articles per country?

Finding the most popular articles per language is easy with the Pageviews tool, but languages and countries are of course not the same.

One thing I tried is going to Turnilo, webrequest_sampled_128, and filtering by country. But here it gets troublesome:
* Splitting can be done by Uri host, which is *more or less* the project, or by Uri path, which is *more or less* the article (but see below), and I couldn't find a convenient way to combine them.
* Mobile (.m.) and desktop hosts are separate. It may actually sometimes be useful to see differences (or lack thereof) between desktop and mobile, but combining them is often useful, too. This can probably be done with regular expressions, but this brings us to the biggest problem:
* Filtering by Uri path would be useful if it didn't have so many paths for images, beacons, etc. Filtering using the regular expression "\/wiki\/.+" may be the right thing functionally, but in practice it's very slow or doesn't work at all.
* I don't know what exactly is logged in webrequest_sampled_128, but the name hints that it doesn't include everything. A sample may be OK for countries with a lot of traffic like U.S. or Spain, but for countries with smaller traffic this may start being a problem.

Any better ideas?

Thanks!

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
‪“We're living in pieces,
I want to live in peace.” – T. Moore‬

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics

--
Francisco Dans
Software Engineer, Analytics Team
Wikimedia Foundation

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics

Francisco Dans

Software Engineer, Analytics Team

Wikimedia Foundation