Hello Fiona and everyone, Martin, one of the UK WIRs, here.
Here are more links to issues being raised on Magnus’ Bitbucket, going back several
months.
https://bitbucket.org/magnusmanske/glamtools/issues?status=new&status=o…
Magnus has done heroic volunteer work, and I have nothing but gratitude for him. But once a
tool becomes essential to the day-to-day work of WIRs, it needs to be properly supported
rather than reliant on a volunteer. Tools such as GLAMOrgan and BaGLAMa2 definitely are
essential to our day-to-day work, not just for reporting to institutions but for training
and outreach to persuade more cultural institutions to work with Wikimedia.
There is a fortunate precedent for what can happen. The stats server for Wikimedia
projects used to be a volunteer-run service that would occasionally stop working, didn’t
have all the functionality that we wanted, and didn’t work well with other tools. In 2014
the WMF technical team remade it from scratch: appealed for use cases, made a spec,
created an API and finally built a site to present and visualise the stats. A long time
was spent, but the outcome is great and our work now would be unthinkable without it. I
think a lot more WIRs, including some not in this discussion or on Telegram, are at the
point of begging the WMF to repeat that process for these other essential tools.
From: Mary Mark Ockerbloom <celebration.women(a)gmail.com>
Sent: 09 February 2023 15:10
To: Wikimedians in Residence Exchange Network <wren(a)lists.wikimedia.org>
Cc: Dominic Byrd-McDevitt <dominic(a)dp.la>; Axel Pettersson
<axel.pettersson(a)wikimedia.se>; Ben Vershbow <bvershbow(a)wikimedia.org>;
Giovanna Fontenelle <gfontenelle(a)wikimedia.org>; João Peschanski
<joalpe(a)wmnobrasil.org>; Sandra Fauconnier <sandra.fauconnier(a)gmail.com>;
dominic(a)byrd-mcdevitt.com; scann <scannopolis(a)gmail.com>
Subject: [Wren] Re: The problems with Wikimedia metrics
Thanks for posting the Phabricator ticket; I too have subscribed.
I concur with others: the lack of support for reliable tools has been a major concern for
GLAM institutions for many years.
Mary Mark Ockerbloom
On Wed, Feb 8, 2023 at 5:41 PM Fiona Romeo
<fromeo@wikimedia.org> wrote:
Thanks for adding your perspective, Dominic.
Here is the Phabricator ticket that tracks work the Foundation has been doing with
Wikimedia Israel to resolve storage issues for the GLAM Wiki Dashboard:
https://phabricator.wikimedia.org/T321702
The conclusion was that it would be best for the service to use the Mediarequests API, as
Dominic has also recommended in his email. Further to this, the Foundation's Data
Platform team is looking into a custom API endpoint for media requests by category to
reduce/remove the need for data transformation and storage. As an interim solution for the
GLAM Wiki Dashboard, we advised Wikimedia Israel to migrate their project from Amazon Web
Services to our own servers and made capacity available for that.
We don't know as much about the BaGLAMa2 issues at the moment.
I'm very sorry to see our GLAM wiki community struggling with tool instability
again.
Fiona
On Wed, 8 Feb 2023 at 21:34, Dominic Byrd-McDevitt
<dominic@dp.la> wrote:
For my part, I'd like to point out that these issues are recurring problems, and also
that when it comes to BaGLAMa lag, the longer it goes, the more unrecoverable it becomes.
Data errors, once introduced, are not repairable.
Dozens of the tracked categories in BaGLAMa are DPLA institutions, and I have shared these
links numerous times over the years. So I frequently get questions from partners who check
their data and find it months out of date. There is nothing I can tell them in these
situations, except that I have regularly seen data get that lagged, and then eventually it
reaches a point where (presumably after someone finally reached Magnus?) all the
backlogged months come in at once.
This causes its own problems, I believe, because I have to assume in such situations where
data is generated after the fact, that it is all corrupt to some degree. My understanding
of BaGLAMa is that it counts page views of articles using images from a category. But
there is no MediaWiki log of when images were added to a page (or to a category), so if
you are counting page views that occurred three months ago based on images that are in a
page today, you might be crediting three past months with views for an image that
was added last week.
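The backfill problem described above can be sketched in a few lines of Python. Everything
here is hypothetical: the view counts, the date the image was added, and the function names
are invented for illustration, and this is not BaGLAMa's actual code. The "usage-aware"
figure is a ground truth the tool cannot compute, since, as noted above, the date an image
was added to a page is not logged.

```python
from datetime import date

# Hypothetical monthly page views for one article (invented numbers).
ARTICLE_VIEWS = {
    date(2022, 11, 1): 10_000,
    date(2022, 12, 1): 12_000,
    date(2023, 1, 1): 11_000,
}

# Hypothetical date on which the tracked image was added to the article.
IMAGE_ADDED = date(2023, 1, 25)


def backfilled_credit(views_by_month):
    """Credit every month in full, as a late run based on today's image usage would."""
    return sum(views_by_month.values())


def usage_aware_credit(views_by_month, added_on):
    """Credit only months from the one in which the image was added onwards.

    This is still coarse: the whole month of addition is credited.
    """
    first_month = added_on.replace(day=1)
    return sum(v for month, v in views_by_month.items() if month >= first_month)


backfilled = backfilled_credit(ARTICLE_VIEWS)
usage_aware = usage_aware_credit(ARTICLE_VIEWS, IMAGE_ADDED)
overcount = backfilled - usage_aware

print(f"backfilled: {backfilled}, usage-aware: {usage_aware}, overcount: {overcount}")
```

With these made-up numbers, two of the three months are credited to an image that was not
yet in the article.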
This issue causes massive data errors in the other direction too. Sometimes you'll
have an unexplained spike, like the several here:
https://glamtools.toolforge.org/baglama2/#gid=50&month=201611&a…
(and by spike, I mean 700 million page views), and it's caused by the fact that an
image that was on the main page for no more than hours caused BaGLAMa to count the entire
month's page views of the main page. These errors are unrecoverable; they stay in the
data and just increase the error of the overall total over time. There's never been a
time where I could go to a maintainer and point out this massive data error and get that
rerun or fixed. Instead, I am often in the embarrassing position of telling partners
"Here is the analytics page, but there is a big overcount on one random month, so
just always remember to mentally subtract 100 million from your total, and treat these
numbers as very inexact."
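The scale of the main-page overcount can be illustrated with some rough arithmetic. This is
a sketch under stated assumptions, not a description of how any tool computes its figures:
the 700 million monthly total is taken as an order of magnitude from the spike mentioned
above, the six hours on the main page is an invented value, and the time-proportional
estimate assumes views are spread evenly across the month, which they are not.

```python
import calendar

# Order of magnitude of the spike mentioned above; treated here as the main
# page's total views for the month (an assumption for illustration).
MAIN_PAGE_MONTHLY_VIEWS = 700_000_000

# Hypothetical: how long the image was actually shown on the main page.
HOURS_ON_MAIN_PAGE = 6


def hours_in_month(year, month):
    """Number of hours in the given calendar month."""
    return calendar.monthrange(year, month)[1] * 24


def time_proportional_credit(monthly_views, hours_shown, year, month):
    """Credit views in proportion to the time the image was on the page.

    Assumes views are evenly distributed over the month (a simplification).
    """
    return monthly_views * hours_shown // hours_in_month(year, month)


whole_month = MAIN_PAGE_MONTHLY_VIEWS
proportional = time_proportional_credit(
    MAIN_PAGE_MONTHLY_VIEWS, HOURS_ON_MAIN_PAGE, 2016, 11  # example 30-day month
)
overcount = whole_month - proportional

print(f"whole month: {whole_month}, proportional: {proportional}, overcount: {overcount}")
```

Under these assumptions, crediting the whole month overstates the image's views by more
than a factor of one hundred.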
So as long as we are talking about BaGLAMa at all, I do have to point out that it is an
entirely flawed tool and the data is unreliable. And aside from all of those bugs, the
methodology is very flawed, since it should not be using the Pageviews API in the first
place. I consider the data essentially fictitious anyway: we know the images we are
tracking are probably not even receiving half of the article views we are crediting to
them, but we continue to report bad data, because our projects rely on having outcomes and
reporting analytics. Glamorous and Glamorgan are based on the same flawed methodology.
And I haven't even started on the clunky UI, where an ever-growing list of 1000+
categories is displayed on the landing page, many of them typos or non-existent
categories that can never be removed or cleaned up.
I guess my main point here is that no amount of band-aids will ever resolve some of these
issues, and we need to be thinking about entirely redoing the tool itself. Or we should
have already done so as soon as the Mediarequests API was released, which was in 2019.
Thanks!
Dominic
On Wed, Feb 8, 2023 at 6:50 AM Fiona Romeo
<fromeo@wikimedia.org> wrote:
Dear Andrew,
Thanks for escalating these specific issues to us. Giovanna and I were both travelling in
January so we haven't been as active in Telegram.
Are you aware of anyone else having issues with the GLAM Wiki Dashboard, or is it just The
MET? I quickly sampled some of the institutions and only saw a "bad request" for
The MET. We have been directly supporting Wikimedia Israel to optimise their service, so I
will raise this issue with both Wikimedia Israel and the Foundation team that has some
familiarity with their service.
I noted these two BaGLAMa2 issues in the Telegram chat:
https://bitbucket.org/magnusmanske/magnustools/issues/49/baglama-not-up-to-…
https://bitbucket.org/magnusmanske/magnustools/issues/50/baglama-not-adding…
Are there other BaGLAMa2 reports we should be aware of?
Metrics are definitely understood to be a priority for the Foundation and I heard
yesterday that metrics tools rose to the top in Wikimedia Sweden’s survey too. There will
be opportunities to discuss this further in the context of annual planning but I will see
what can be done in the short term.
More soon,
Fiona
On Wed, 8 Feb 2023 at 10:57, Andrew Lih
<andrew.lih@gmail.com> wrote:
Hi WREN and GLAM folks,
I need your insights into what could be a very problematic year for us in the GLAM wiki
community, as our metrics tools to measure our impact are in crisis and disrepair. If you
have any insights, please do share them here, or in the GLAM Wiki Telegram group where
this conversation started happening recently.
I sent a "HELP!" message to the Wikimedia SE content partnerships help desk just
the other day, included below, and hope this may be useful to start a conversation. If
there is enough interest, we might want to start a wiki page to formally document our
needs as a GLAM wiki community. Thanks.
-Andrew
----
To: help@wikimedia.se
I'd like to formally employ the Helpdesk's services in getting some care and
attention to BaGLAMa2. It seems to have been failing since the end of last year, and even
then, it was reporting extremely low figures for all categories. This is one of the few
tools we have in the GLAM wiki community to measure impact and to make the case for
sustaining our work.
https://glamtools.toolforge.org/baglama2/
Without these basic metrics, 2023 could prove to be a disastrous year for continuing
efforts. So far, we have been unable to report good, reliable numbers to folks such as the
Metropolitan Museum of Art or the Smithsonian Institution. Other on-demand tools such as
Glamorgan usually cannot handle such large category trees, and also have their own
problems with not being able to read the pageviews API numbers accurately, which is
another issue in itself.
https://glamtools.toolforge.org/glamorgan.html
In short - help! How can we get this on the radar screen of people who can put more care,
attention, and resources into this? Thanks.
-Andrew
--
Fiona Romeo (she/her)
Senior Manager, Culture and Heritage
Wikimedia Foundation <https://wikimediafoundation.org/>