Hi all,
To clarify:

Cassandra is a Wikimedia CH project. Wikimedia CH has worked on it for more than 5 years and has spent a considerable budget on it.

The project is open and anyone can install it.

Glamwikidashboard is a fork of Cassandra. 

This means that new improvements will be released ONLY in the Cassandra repository.

Cassandra is quite similar to a data warehouse, which means that its resource requirements grow very quickly.

So what you see of Cassandra is only the tip of the iceberg. What is more important is the architecture.

It's important for me to know that WM IL is moving to a cloud, because I had warned them that a small server would not be sufficient, but I am quite sure that a cloud will not be as good.

Cassandra is not only software; it is an architecture based on virtual servers, with an SSD- and RAM-based repository to speed up performance.

In addition, Wikimedia CH offers GLAMs a service-based solution, supporting them in case of problems.

This is why Glamwikidashboard is only a part of Cassandra, with a revised user interface: it is a fork, and it is not Cassandra.

It's like having the body of a Ferrari, but not the engine or the service of a Ferrari.

Kind regards

On Tue, 20 Dec 2022, 19:29 Dan Andreescu, <dandreescu@wikimedia.org> wrote:
Hi Ismael, responses inline:

On Tue, Dec 20, 2022 at 1:05 PM Ismael Olea <ismael@olea.org> wrote:
I'm completely new to analytics in Wikimedia.
 
Welcome! :) 

We are working with a heritage institution in a GLAM project and they are interested in access statistics for the resources they have released in Wikimedia.

Wikimedia CH and Wikimedia Israel have worked on some dashboards showing GLAM statistics.  You may find their projects interesting.  We are currently working with Wikimedia Israel to move their dashboard to our cloud infrastructure and eventually update our APIs to better serve them with the data they need.  Until then, it may be interesting to see what statistics they've focused on and how they get them from the publicly available data we already provide.  You can see all this in their source code: https://github.com/yonathan06/cassandra-GLAM-tools
 
I think I understand the pageviews concept and how to use it but, as far as I can tell, it's not possible to get details like article pageviews per country, for example. Is this correct?

We have an ongoing project to release per-country per-article pageview information.  It's hard for privacy reasons, and we are building a privacy system that takes all that into account.  For now, we have pageviews by country at a high level and most viewed articles by country.  I'm linking to different parts of our data ecosystem so you can get familiar with it.
 
If so, what should be the way to get (or process) the information to produce the data?

The only way is to help with the ongoing (and complex) differential privacy work.

Also, I'm reading about the resulting format[1] but I can't find the related logs.

Any suggestions? Thanks.


I can see how that can be misleading.  For GLAMs, usually you would want to download media request statistics, as the glamwikidashboard I mentioned above does (they are currently working on getting as much as they can from the media requests API instead).  If you are indeed interested in pageviews, the definition you linked to talks about the data that is available internally.  Can I ask you to elaborate a bit more on why you need per-country data?
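
As a rough illustration of the two kinds of public statistics discussed above (per-article pageviews and per-file media requests), here is a minimal sketch that only builds request URLs for the public Wikimedia REST API. It is not taken from the dashboard's source code. The function names and the example article and file path are hypothetical, and the endpoint layouts are assumptions based on the public metrics API as commonly documented, so please check them against the current API documentation before relying on them.

```python
from urllib.parse import quote

# Base of the public Wikimedia metrics REST API (assumed; verify in the API docs).
BASE = "https://wikimedia.org/api/rest_v1/metrics"


def pageviews_per_article_url(project, article, start, end,
                              access="all-access", agent="user",
                              granularity="daily"):
    """Build a per-article pageviews URL (hypothetical helper).

    start and end are dates in YYYYMMDD form. Spaces in the title are
    turned into underscores and the title is percent-encoded.
    """
    title = quote(article.replace(" ", "_"), safe="")
    return (f"{BASE}/pageviews/per-article/{project}/{access}/{agent}/"
            f"{title}/{granularity}/{start}/{end}")


def mediarequests_per_file_url(file_path, start, end,
                               referer="all-referers", agent="user",
                               granularity="daily"):
    """Build a per-file media requests URL (hypothetical helper).

    file_path is the upload path of the file, which is assumed to be
    passed as one percent-encoded segment (slashes included).
    """
    encoded = quote(file_path, safe="")
    return (f"{BASE}/mediarequests/per-file/{referer}/{agent}/"
            f"{encoded}/{granularity}/{start}/{end}")


if __name__ == "__main__":
    # Example values are placeholders, not real GLAM data.
    print(pageviews_per_article_url("en.wikipedia", "Mona Lisa",
                                    "20221201", "20221220"))
    print(mediarequests_per_file_url("/wikipedia/commons/a/ab/Example.jpg",
                                     "20221201", "20221220"))
```

Neither URL carries a per-country breakdown per article or per file, which matches the limitation described above.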
_______________________________________________
Analytics mailing list -- analytics@lists.wikimedia.org
To unsubscribe send an email to analytics-leave@lists.wikimedia.org