Hi all,
To clarify:
Cassandra is a Wikimedia CH project. Wikimedia CH has worked on it for more
than 5 years and has spent a substantial budget on it.
The project is open and anyone can install it.
Glamwikidashboard is a fork of Cassandra.
That means new improvements will be released ONLY in the Cassandra
repository.
Cassandra is quite similar to a data warehouse, which means that the
resources it needs grow considerably over time.
So what you see of Cassandra is only the tip of the iceberg. What matters
more is the architecture.
It's important for me to know that WM IL is moving to a cloud, because I
had already told them that a small server would not be sufficient, but I am
quite sure that a cloud will perform worse.
Cassandra is not only software; it is an architecture based on virtual
servers, with SSD- and RAM-based storage to speed up performance.
In addition, Wikimedia CH offers a service-based solution to GLAMs,
supporting them in case of problems.
This is why Glamwikidashboard is a part of Cassandra whose user interface
has been reworked, but it is a fork and it is not Cassandra.
It's like having the body of a Ferrari but not the engine of a Ferrari or
the service of a Ferrari.
Kind regards
On Tue, 20 Dec 2022, 19:29 Dan Andreescu, <dandreescu(a)wikimedia.org> wrote:
Hi Ismael, responses inline:
On Tue, Dec 20, 2022 at 1:05 PM Ismael Olea <ismael(a)olea.org> wrote:
I'm completely new to analytics in Wikimedia.
Welcome! :)
We are working with a heritage institution in a GLAM project and they are
interested in access statistics for the resources they have released in
Wikimedia.
Wikimedia CH and Wikimedia Israel have worked on some dashboards showing
GLAM statistics. You may find their projects
<https://glamwikidashboard.org/> interesting. We are currently working
<https://phabricator.wikimedia.org/T325065> with Wikimedia Israel to move
their dashboard to our cloud infrastructure and eventually update our APIs
to better serve them with the data they need. Until then, it may be
interesting to see what statistics they've focused on and how they get them
from the publicly available data we already provide. You can see all this
in their source code:
https://github.com/yonathan06/cassandra-GLAM-tools
I think I got the point about how the pageviews concept is and how to use
it but, as far as I understand, it's not possible to get details like
article pageviews, for example, per country. Is this correct?
We have an ongoing project to release per-country per-article pageview
information. It's hard for privacy reasons, and we are building a privacy
system that takes all that into account. For now, we have pageviews by
country at a high level
<https://stats.wikimedia.org/#/all-projects/reading/page-views-by-country> and
most viewed articles
<https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews#Most_viewed_articles_per_country>
by country. I'm linking to different parts of our data ecosystem so you
can get familiar with it.
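
As a rough illustration of the "most viewed articles by country" data linked above, here is a minimal sketch that only builds the request URL. The base URL and the `top-per-country/{country}/{access}/{year}/{month}/{day}` path layout are assumptions taken from the public AQS documentation and should be checked against the wikitech page; `top_per_country_url` is a hypothetical helper name, not part of any library.

```python
from urllib.parse import quote

# Assumed public base URL of the Analytics Query Service (AQS) REST API.
AQS_BASE = "https://wikimedia.org/api/rest_v1/metrics"


def top_per_country_url(country, year, month, day, access="all-access"):
    """Build the URL for the most viewed articles in one country on one day.

    `country` is an ISO 3166-1 alpha-2 code such as "ES". The path layout
    is an assumption based on the AQS documentation, not a verified contract.
    """
    return (
        f"{AQS_BASE}/pageviews/top-per-country/"
        f"{quote(country, safe='')}/{access}/"
        f"{year:04d}/{month:02d}/{day:02d}"
    )


if __name__ == "__main__":
    print(top_per_country_url("ES", 2022, 12, 1))
```

Fetching that URL with any HTTP client should return JSON; note that this gives a ranking per country, not per-country counts for an arbitrary article.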
If so, what should be the way to get (or process) the information to
produce the data?
The only way is to help with the ongoing (and complex) differential
privacy work <https://phabricator.wikimedia.org/T307245>
Also, I'm reading about the resulting format[1] but I can't find the
I can see how that can be misleading. For GLAMs, usually you would want
to download media request statistics
<https://dumps.wikimedia.org/other/mediacounts/daily/>, as the
glamwikidashboard I mentioned above does. (They are currently working on
getting as much as they can from the media requests api
<https://wikitech.wikimedia.org/wiki/Analytics/AQS/Mediarequests>
instead). If you are indeed interested in pageviews, the definition you
linked to talks about the data internally available. Can I ask you to
elaborate a bit more on why you need per-country data?
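
For readers who want to try the media requests API mentioned above, the following is a minimal sketch of building a per-file request URL. The `per-file/{referer}/{agent}/{file_path}/{granularity}/{start}/{end}` layout, the parameter values, and the `YYYYMMDD` date format are assumptions based on the Mediarequests documentation linked above; `mediarequests_per_file_url` and the example file path are hypothetical.

```python
from urllib.parse import quote

# Assumed public base URL of the Analytics Query Service (AQS) REST API.
AQS_BASE = "https://wikimedia.org/api/rest_v1/metrics"


def mediarequests_per_file_url(file_path, start, end,
                               referer="all-referers", agent="user",
                               granularity="daily"):
    """Build a per-file media requests URL.

    `file_path` is the upload path of the file, for example
    "/wikipedia/commons/a/a0/Example.jpg" (a made-up path). It is
    percent-encoded in full, slashes included, so that it occupies a
    single URL segment. `start` and `end` are assumed to be YYYYMMDD strings.
    """
    encoded_path = quote(file_path, safe="")
    return (
        f"{AQS_BASE}/mediarequests/per-file/"
        f"{referer}/{agent}/{encoded_path}/{granularity}/{start}/{end}"
    )


if __name__ == "__main__":
    print(mediarequests_per_file_url(
        "/wikipedia/commons/a/a0/Example.jpg", "20221201", "20221220"))
```

A GLAM dashboard would typically loop over the files in a Commons category and sum the returned counts, falling back to the daily mediacounts dumps for bulk history.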