Thanks for adding your perspective, Dominic.
Here is the Phabricator ticket that tracks work the Foundation has been
doing with Wikimedia Israel to resolve storage issues for the GLAM Wiki
Dashboard:
The conclusion was that it would be best for the service to use the
MediaRequest API, as Dominic has also recommended in his email. Further to
this, the Foundation's Data Platform team is looking into a custom API
endpoint for media requests by category to reduce/remove the need for data
transformation and storage. As an interim solution for the GLAM Wiki
Dashboard, we advised Wikimedia Israel to migrate their project from Amazon
Web Services to our own servers and made capacity available for that.
We don't know as much about the BaGLAMa2 issues at the moment.
I'm very sorry to see our GLAM wiki community struggling with tool
instability again.
Fiona
On Wed, 8 Feb 2023 at 21:34, Dominic Byrd-McDevitt <dominic(a)dp.la> wrote:
For my part, I'd like to point out that these issues are recurring
problems, and also that when it comes to BaGLAMa lag, the longer it goes,
the more unrecoverable it becomes. Data errors, once introduced, are not
repairable.
Dozens of the tracked categories in BaGLAMa are DPLA institutions, and I
have shared these links numerous times over the years. So I frequently get
questions from partners who check their data and find it months out of
date. There is nothing I can tell them in these situations, except that I
have regularly seen data get that lagged, and then eventually it reaches a
point where (presumably after someone finally reached Magnus?) all the
backlogged months come in at once.
This causes its own problems, I believe, because I have to assume that
data generated after the fact is all corrupt to some degree. My
understanding of BaGLAMa is that it counts page views of articles using
images from a category. But there is no MediaWiki log of when images were
added to a page (or to a category), so if you are counting page views that
occurred three months ago based on images that are in a page today, you
might be crediting three past months with views for an image that was
added last week.
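The retroactive-attribution problem described above can be sketched in a few lines of Python. This is only an illustration, not BaGLAMa's actual code: the monthly view counts and the `image_added` date are invented, and `credit_from_history` assumes a usage history that, as noted, MediaWiki does not record.

```python
from datetime import date

# Hypothetical monthly page views for one article (invented numbers).
monthly_views = {
    date(2022, 11, 1): 1000,
    date(2022, 12, 1): 1200,
    date(2023, 1, 1): 900,
    date(2023, 2, 1): 1100,
}

# Hypothetical date on which the tracked image was added to the article.
# No such log exists in MediaWiki, so a late run cannot know this.
image_added = date(2023, 2, 1)


def credit_from_snapshot(views):
    """Late run: the image is on the page today, so every month is credited."""
    return sum(views.values())


def credit_from_history(views, added_on):
    """Timely run: only months on or after the image was added are credited."""
    return sum(count for month, count in views.items() if month >= added_on)


snapshot_total = credit_from_snapshot(monthly_views)
history_total = credit_from_history(monthly_views, image_added)
overcount = snapshot_total - history_total
print(snapshot_total, history_total, overcount)
```

In this made-up case the late run credits all four months to the image, although it was only on the page during the last one.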
This issue causes massive data errors in the other direction too.
Sometimes you'll have an unexplained spike, like the several here
<https://glamtools.toolforge.org/baglama2/#gid=50&month=201611&giu=enwiki&server=en.wikipedia.org>
(and by spike, I mean 700 million page views), and it's caused by the fact that
an image that was on the main page for no more than hours caused BaGLAMa to
count the entire month's page views of the main page. These errors are
unrecoverable; they stay in the data and just increase the error of the
overall total over time. There's never been a time where I could go to a
maintainer and point out this massive data error and get that rerun or
fixed. Instead, I am often in the embarrassing position of telling partners
"Here is the analytics page, but there is a big overcount on one random
month, so just always remember to mentally subtract 100 million from your
total, and treat these numbers as very inexact."
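A rough sense of the scale of such a spike can be had from a small Python calculation. All figures here are hypothetical (they are not taken from the BaGLAMa data), and prorating by hours assumes evenly spread traffic, which is itself a simplification.

```python
def whole_month_credit(monthly_page_views):
    """Credit the image with every view the page received that month."""
    return monthly_page_views


def prorated_credit(monthly_page_views, hours_on_page, hours_in_month):
    """Credit only the share of views matching the time the image was shown."""
    return monthly_page_views * hours_on_page // hours_in_month


# Hypothetical inputs: a very busy page, an image shown for six hours,
# and a 30-day month (720 hours).
main_page_views = 600_000_000
hours_on_page = 6
hours_in_month = 30 * 24

credited = whole_month_credit(main_page_views)
estimate = prorated_credit(main_page_views, hours_on_page, hours_in_month)
print(credited, estimate, credited - estimate)
```

Even the prorated figure counts page views rather than views of the image itself, so it would still be an upper bound.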
So as long as we are talking about BaGLAMa at all, I do have to point out
that it is an entirely flawed tool and the data is unreliable. And aside
from all of those bugs, the methodology is very flawed, since it should not
be using the Pageviews API in the first place. I consider the data
essentially fictitious anyway: we know the images we are tracking are
probably not even receiving half of the article views we are crediting to
them, but we continue to report bad data, because our projects rely
on having outcomes and reporting analytics. Glamorous and Glamorgan are
based on the same flawed methodology.
And I haven't even started on the clunky UI, where an ever-growing list
of 1000+ categories is displayed in full on the landing page, many of which
are typos or non-existent categories that can never be removed or cleaned
up.
I guess my main point here is that no amount of band-aids will ever
resolve some of the issues, and we need to be thinking about entirely
redoing the tool itself. Or rather, we should have done so as soon as the
Mediarequests API was released, which was in 2019.
Thanks!
Dominic
On Wed, Feb 8, 2023 at 6:50 AM Fiona Romeo <fromeo(a)wikimedia.org> wrote:
> Dear Andrew,
>
> Thanks for escalating these specific issues to us. Giovanna and I were
> both travelling in January so we haven't been as active in Telegram.
>
> Are you aware of anyone else having issues with the GLAM Wiki Dashboard,
> or is it just The MET? I quickly sampled some of the institutions and only
> saw a "bad request" for The MET. We have been directly supporting Wikimedia
> Israel to optimise their service, so I will raise this issue with both
> Wikimedia Israel and the Foundation team that has some familiarity with
> their service.
>
> I noted these two BaGLAMa2 issues in the Telegram chat:
>
> https://bitbucket.org/magnusmanske/magnustools/issues/49/baglama-not-up-to-…
>
> https://bitbucket.org/magnusmanske/magnustools/issues/50/baglama-not-adding…
>
> Are there other BaGLAMa2 reports we should be aware of?
>
> Metrics are definitely understood to be a priority for the Foundation
> and I heard yesterday that metrics tools rose to the top in Wikimedia
> Sweden’s survey too. There will be opportunities to discuss this further in
> the context of annual planning but I will see what can be done in the short
> term.
>
> More soon,
> Fiona
>
> On Wed, 8 Feb 2023 at 10:57, Andrew Lih <andrew.lih(a)gmail.com> wrote:
>
>> Hi WREN and GLAM folks,
>>
>> I need your insights into what could be a very problematic year for us
>> in the GLAM wiki community, as our metrics tools to measure our impact are
>> in crisis and disrepair. If you have any insights, please do share them
>> here, or in the GLAM Wiki Telegram group where this conversation started
>> happening recently.
>>
>> I sent a "HELP!" message to the Wikimedia SE content partnerships help
>> desk just the other day, included below, and hope this may be useful to
>> start a conversation. If there is enough interest, we might want to start a
>> wiki page to formally document our needs as a GLAM wiki community. Thanks.
>>
>> -Andrew
>>
>> ----
>> To: help(a)wikimedia.se
>>
>> I'd like to formally employ the Helpdesk's services in getting some
>> care and attention to BaGLAMa2. It seems to have been failing since the end
>> of last year, and even then, it was reporting extremely low figures for all
>> categories. This is one of the few tools we have in the GLAM wiki community
>> to measure impact and to make the case for sustaining our work.
>>
>> https://glamtools.toolforge.org/baglama2/
>>
>> Without these basic metrics, 2023 could prove to be a disastrous year
>> for continuing efforts. So far, we have been unable to report good,
>> reliable numbers to folks such as the Metropolitan Museum of Art or the
>> Smithsonian Institution. Other on-demand tools such as Glamorgan usually
>> cannot handle such large category trees, and also have their own problems
>> with not being able to read the pageviews API numbers accurately, which is
>> another issue in itself.
>>
>> https://glamtools.toolforge.org/glamorgan.html
>>
>> In short - help! How can we get this on the radar screen of people who
>> can put more care, attention, and resources into this? Thanks.
>>
>> -Andrew
>>
>> --
> *Fiona Romeo* (she/her)
> Senior Manager, Culture and Heritage
> Wikimedia Foundation <https://wikimediafoundation.org/>
>
>