Hi all,
Ismael, thanks so much for reaching out about this. Unfortunately, I think
Dan is right when he says that the granularity of data carries a big
privacy cost. We're working hard to lower the release threshold for daily
unique visitors by country from 1000 to 90, but it seems likely that these
Wikivoyage itineraries have fewer than 90 daily unique visitors in most
countries. Either way, we're hoping to start releasing daily pageviews by
country in January, so you should check whether your pages are in the
dataset once that release is live.
If you want unfettered access to the data (split up by country), you should
pursue a research partnership. Besides that, you can likely use some
existing tools (like the Pageviews API
<https://wikimedia.org/api/rest_v1/#/Pageviews%20data/get_metrics_pageviews_top__project___access___year___month___day_>
or
) to get a sense of the data. I'll be sure to reach
back out once the differentially-private data is released so that you might
be able to check on the relevant pages!
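
As a rough illustration of querying the Pageviews API for a single page,
here is a minimal Python sketch. It assumes the per-article endpoint of the
Wikimedia REST API rather than the "top" endpoint linked above, and the
article title and User-Agent string are hypothetical placeholders. The
response contains totals across all countries only, not a per-country split.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Per-article endpoint of the Wikimedia REST API (not the "top" endpoint).
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project, article, start, end,
                    access="all-access", agent="user", granularity="daily"):
    """Build the request URL for one article; dates are YYYYMMDD strings."""
    # Titles use underscores instead of spaces and must be percent-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return f"{BASE}/{project}/{access}/{agent}/{title}/{granularity}/{start}/{end}"


def fetch_daily_views(url):
    """Return (timestamp, views) pairs; performs a network request."""
    # The User-Agent value is a placeholder; replace it with real contact details.
    req = Request(url, headers={"User-Agent": "pageviews-example (placeholder contact)"})
    with urlopen(req) as resp:
        data = json.load(resp)
    return [(item["timestamp"], item["views"]) for item in data["items"]]


# "Example itinerary" is a hypothetical title, not one of the pilot articles.
url = per_article_url("en.wikivoyage.org", "Example itinerary",
                      "20221201", "20221220")
print(url)
```

Calling fetch_daily_views(url) with a real article title would return one
(timestamp, views) pair per day in the requested range.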
Thanks again for reaching out :)
Hal
On Wed, Dec 21, 2022 at 12:07 PM Dan Andreescu <dandreescu(a)wikimedia.org>
wrote:
The only way is to help with the ongoing (and complex) differential privacy
work <https://phabricator.wikimedia.org/T307245>
I have a systems background, but this is probably outside my skills.
How could I help?
Hm, it's some tricky programming work. I'm not 100% sure of the latest
status or opportunities to get involved, but I'm cc-ing Hal Triedman to see
if he has thoughts. (Hal, see the archive
<https://lists.wikimedia.org/hyperkitty/list/analytics@lists.wikimedia.org/thread/IKL3WOQ2UY7IMMCUTV7EYGT6PFVFLVCA/>
)
[1]
https://meta.wikimedia.org/wiki/Research:Page_view#Resulting_format
If you are indeed interested in pageviews, the definition you linked to
talks about the data internally available.
Oh!
Can I ask you to elaborate a bit more on why
you need per-country data?
Well, first, I've been looking for the most useful tools and sources
available (and found many of them very interesting [1]). Second, in this
particular case we are running a pilot project in which some academic
project results have been published as Wikivoyage itineraries (3 in EN and
3 in ES). These are the articles we are interested in tracking now.
About the rationale, one of the biggest drivers nowadays is the well-known
link between heritage, tourism and sustainability (example: the Sustainable
Development Goals), so there is a trend toward analyzing this context more
closely in order to study and plan. Tourist destinations usually have very
well-defined countries of origin. The better you know the origin, the better
you can plan.
There should also be a positive impact on Wikimedia: new incentives
for institutions to create or translate articles into the relevant
languages, always restricted to the heritage domain. Here in Spain tourism
is one of the main economic sectors, and anything providing intelligence
would help with better planning and conservation.
Also, we have identified a new potential area of activity: intelligence
analysis of trends in heritage (public interest, changes in institutional
focus, new relevant practices, etc.), not only for Spanish heritage but
worldwide. This is also a scientific institution, and it would find it very
useful to collect the most precise traces available (with absolute respect
for users' privacy) to look for signals it could use to refocus or
prioritize its institutional goals.
So, this is it.
[1]
https://toolhub.wikimedia.org/lists/277
This is indeed a very interesting use case and a chance for this data to
be very helpful. Unfortunately, to my naive eyes, this granularity of data
also carries a big privacy cost. The only way to get to it would be a
research collaboration, but there are *lots* of requests for those and not
enough researchers to help facilitate. I'm honestly not sure there's an
easy way around this... but I'll keep thinking about it and I know it'll be
useful for Hal to see this kind of request and add it to his back burner.
Thanks for detailing!