Hi all,

Ismael, thanks so much for reaching out about this. Unfortunately, I think Dan is right when he says that the granularity of data carries a big privacy cost. We're working hard to lower the threshold for releasing pageview data by country from 1000 daily unique visitors to 90. Even so, these Wikivoyage itineraries are likely to have fewer than 90 daily unique visitors in most countries. Either way, we're hoping to start releasing daily pageviews by country in January, so you should check whether your pages are in the dataset once that release is live.

If you want unfettered access to the data (split up by country), you should pursue a research partnership. Besides that, you can likely use some existing tools (like the Pageviews API or pageviews.wmcloud.org) to get a sense of the data. I'll be sure to reach back out once the differentially-private data is released so that you might be able to check on the relevant pages!

Thanks again for reaching out :)

Hal

On Wed, Dec 21, 2022 at 12:07 PM Dan Andreescu <dandreescu@wikimedia.org> wrote:
The only way is to help with the ongoing (and complex) differential privacy work

I have a systems background, but this is probably outside my skills. How could I help?

Hm, it's some tricky programming work. I'm not 100% sure of the latest status or the opportunities to get involved, but I'm cc-ing Hal Triedman to see if he has thoughts. (Hal, see archive.)


 If you are indeed interested in pageviews, the definition you linked to describes the data that is available internally.

Oh!
 
  Can I ask you to elaborate a bit more on why you need per-country data?

Well, first, I've been looking for the most useful tools and sources available (and found many of them very interesting[1]). Second, in this particular case we are running a pilot project in which some academic project results have been published as Wikivoyage itineraries (3 in EN and 3 in ES). These are the articles we are interested in tracking now.

About the rationale, one of the biggest drivers nowadays is the well-known link between heritage, tourism and sustainability (for example, the Sustainable Development Goals), so there is a trend toward analyzing this context more closely in order to study and plan. Tourist destinations usually have very well-defined countries of origin, and the better you know the origin, the better you can plan. There should also be another positive impact on Wikimedia: new incentives for institutions to create articles or translate them into the relevant languages, always restricted to the heritage domain. Here in Spain, tourism is one of the main economic sectors, and anything that provides intelligence would help with planning and conservation.

Also, we have identified a potential new area of activity: intelligence analysis of trends in heritage (public interest, changes in institutional focus, new relevant practices, etc.), covering not only Spanish heritage but heritage worldwide. This is also a scientific institution, and it would find it very useful to collect the most precise traces available (with absolute respect for users' privacy) to look for signals it could use to refocus or reprioritize its institutional goals.

So, this is it.


This is indeed a very interesting use case and a chance for this data to be very helpful. Unfortunately, to my naive eyes, this granularity of data also carries a big privacy cost. The only way to get to it would be a research collaboration, but there are *lots* of requests for those and not enough researchers to help facilitate them. I'm honestly not sure there's an easy way around this... but I'll keep thinking about it, and I know it'll be useful for Hal to see this kind of request and add it to his back burner. Thanks for detailing!