Hi WREN and GLAM folks,
I need your insights into what could be a very problematic year for us in the GLAM wiki community, as the metrics tools we use to measure our impact are in crisis and disrepair. If you have any insights, please share them here or in the GLAM Wiki Telegram group, where this conversation started recently.
I sent the "HELP!" message below to the Wikimedia SE content partnerships help desk just the other day, and I hope it may be useful for starting a conversation. If there is enough interest, we might want to start a wiki page to formally document our needs as a GLAM wiki community. Thanks.
-Andrew
---- To: help@wikimedia.se
I'd like to formally enlist the Helpdesk's services to get some care and attention for BaGLAMa2. It seems to have been failing since the end of last year, and even before then, it was reporting extremely low figures for all categories. This is one of the few tools we have in the GLAM wiki community to measure impact and make the case for sustaining our work.
https://glamtools.toolforge.org/baglama2/
Without these basic metrics, 2023 could prove to be a disastrous year for continuing efforts. So far, we have been unable to report good, reliable numbers to folks such as the Metropolitan Museum of Art or the Smithsonian Institution. Other on-demand tools such as Glamorgan usually cannot handle such large category trees, and they have their own problems reading Pageviews API numbers accurately, which is another issue in itself.
https://glamtools.toolforge.org/glamorgan.html
In short - help! How can we get this on the radar screen of people who can put more care, attention, and resources into this? Thanks.
-Andrew
Dear Andrew,
Thanks for escalating these specific issues to us. Giovanna and I were both travelling in January so we haven't been as active in Telegram.
Are you aware of anyone else having issues with the GLAM Wiki Dashboard, or is it just The MET? I quickly sampled some of the institutions and only saw a "bad request" for The MET. We have been directly supporting Wikimedia Israel to optimise their service, so I will raise this issue with both Wikimedia Israel and the Foundation team that has some familiarity with their service.
I noted these two BaGLAMa2 issues in the Telegram chat:
https://bitbucket.org/magnusmanske/magnustools/issues/49/baglama-not-up-to-d...
https://bitbucket.org/magnusmanske/magnustools/issues/50/baglama-not-adding-...
Are there other BaGLAMa2 reports we should be aware of?
Metrics are definitely understood to be a priority for the Foundation and I heard yesterday that metrics tools rose to the top in Wikimedia Sweden’s survey too. There will be opportunities to discuss this further in the context of annual planning but I will see what can be done in the short term.
More soon, Fiona
On Wed, 8 Feb 2023 at 10:57, Andrew Lih andrew.lih@gmail.com wrote:
--
*Fiona Romeo* (she/her) Senior Manager, Culture and Heritage Wikimedia Foundation https://wikimediafoundation.org/
For my part, I'd like to point out that these are recurring problems, and that with BaGLAMa lag, the longer it goes on, the less recoverable the data becomes. Data errors, once introduced, are not repairable.
Dozens of the categories tracked in BaGLAMa belong to DPLA institutions, and I have shared these links numerous times over the years, so I frequently get questions from partners who check their data and find it months out of date. There is nothing I can tell them in these situations except that I have regularly seen data lag this badly before, and that eventually it reaches a point where (presumably after someone finally reaches Magnus?) all the backlogged months come in at once.
This causes its own problems, I believe, because whenever data is generated after the fact, I have to assume it is all corrupt to some degree. My understanding of BaGLAMa is that it counts page views of articles using images from a category. But there is no MediaWiki log of when images were added to a page (or to a category), so if you are counting page views that occurred three months ago based on the images that are in a page today, you might be crediting three past months with views for an image that was added last week.
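A toy Python sketch of the backfill problem, assuming (as described above) that a month's article page views are credited to an image whenever the image is on that article at the time the tool runs. All numbers and function names are hypothetical; this is not BaGLAMa's code.

```python
# Toy illustration only: hypothetical monthly page views for one article.
monthly_views = {
    "2022-10": 30_000,
    "2022-11": 30_000,
    "2022-12": 30_000,
    "2023-01": 30_000,
}

# Month in which the image was actually added to the article (hypothetical).
# MediaWiki keeps no dedicated log of this, so a late-running tool only
# knows what the page looks like today.
image_added = "2023-01"


def credited_views_snapshot(views: dict[str, int]) -> int:
    """Backfill using today's usage: every month is credited in full."""
    return sum(views.values())


def credited_views_actual(views: dict[str, int], added_month: str) -> int:
    """Credit only the months in which the image was really on the page.

    "YYYY-MM" strings sort chronologically, so a string comparison works.
    """
    return sum(v for month, v in views.items() if month >= added_month)


print(credited_views_snapshot(monthly_views))             # 120000
print(credited_views_actual(monthly_views, image_added))  # 30000
```

Knowing usage only as of today, four months are credited instead of one, so this single article is over-counted by 90,000 views.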
This issue causes massive data errors in the other direction too. Sometimes you'll see an unexplained spike, like the several here https://glamtools.toolforge.org/baglama2/#gid=50&month=201611&giu=enwiki&server=en.wikipedia.org (and by spike, I mean 700 million page views). It happens because an image that was on the Main Page for only a few hours caused BaGLAMa to count the entire month's page views of the Main Page. These errors are unrecoverable; they stay in the data and just add to the error in the overall total over time. There has never been a time when I could go to a maintainer, point out a massive data error like this, and get it rerun or fixed. Instead, I am often in the embarrassing position of telling partners, "Here is the analytics page, but there is a big overcount in one random month, so just remember to always mentally subtract 100 million from your total, and treat these numbers as very inexact."
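The Main Page spike follows the same pattern at a much larger scale. As a rough sketch, here is whole-month attribution compared with crediting only the hours the image was actually shown. It treats the 700 million figure above as the Main Page's monthly total and assumes a hypothetical six hours on the Main Page in November 2016; both are illustrative, and even the prorated figure is still article page views rather than views of the image itself.

```python
# Spike figure quoted above; the hours on the Main Page are hypothetical.
MAIN_PAGE_MONTHLY_VIEWS = 700_000_000
HOURS_ON_MAIN_PAGE = 6
HOURS_IN_NOVEMBER = 30 * 24  # 720


def whole_month_credit(monthly_views: int) -> int:
    """Any appearance during the month earns the page's full monthly total."""
    return monthly_views


def prorated_credit(monthly_views: int, hours_on_page: float, hours_in_month: float) -> float:
    """Credit only the share of the month the image was shown.

    Assumes views are spread evenly across the month, which is a
    simplification: real traffic varies by hour and by day.
    """
    return monthly_views * hours_on_page / hours_in_month


overcount = whole_month_credit(MAIN_PAGE_MONTHLY_VIEWS) - prorated_credit(
    MAIN_PAGE_MONTHLY_VIEWS, HOURS_ON_MAIN_PAGE, HOURS_IN_NOVEMBER
)
print(f"{overcount:,.0f}")  # 694,166,667
```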
So as long as we are talking about BaGLAMa at all, I have to point out that it is an entirely flawed tool and its data is unreliable. Aside from all of those bugs, the methodology is deeply flawed, since it should not be using the Pageviews API in the first place. I consider the data essentially fictitious anyway: we know the images we are tracking are probably not even receiving half of the article views we credit to them, but we continue to report bad data because our projects rely on having outcomes and reporting analytics. Glamorous and Glamorgan are based on the same flawed methodology.
And I haven't even started on the clunky UI, where an ever-growing list of 1000+ categories is displayed on the landing page, many of them typos or non-existent categories that can never be removed or cleaned up.
I guess my main point here is that no amount of band-aids will ever resolve some of these issues, and we need to be thinking about redoing the tool entirely. Really, we should have done so as soon as the Mediarequests API was released, in 2019.
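For comparison, the Mediarequests API counts requests for the media file itself rather than page views of the articles it appears in. The sketch below builds a per-file query URL and sums the returned counts. The endpoint layout, the parameter values (`all-referers`, `all-agents`, `monthly`) and the `items`/`requests` response fields reflect our understanding of the Wikimedia REST API and should be checked against its current documentation; `fetch_json` and the User-Agent string are placeholders.

```python
import hashlib
import json
import urllib.request
from urllib.parse import quote

# Per-file mediarequests endpoint as we understand it (verify against the docs).
API_BASE = "https://wikimedia.org/api/rest_v1/metrics/mediarequests/per-file"


def commons_file_path(file_name: str) -> str:
    """Build the upload path MediaWiki uses for a Commons file.

    Files live under /<a>/<ab>/, where "ab" are the first two hex digits of
    the MD5 of the file name with spaces replaced by underscores. Pass the
    canonical name (first letter capitalized).
    """
    name = file_name.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"/wikipedia/commons/{digest[0]}/{digest[:2]}/{name}"


def mediarequests_url(
    file_name: str,
    start: str,
    end: str,
    referer: str = "all-referers",
    agent: str = "all-agents",
    granularity: str = "monthly",
) -> str:
    """URL for requests of one file; start and end are YYYYMMDD strings."""
    encoded_path = quote(commons_file_path(file_name), safe="")  # "/" -> %2F
    return f"{API_BASE}/{referer}/{agent}/{encoded_path}/{granularity}/{start}/{end}"


def total_requests(response: dict) -> int:
    """Sum the 'requests' field over all returned time buckets."""
    return sum(item["requests"] for item in response.get("items", []))


def fetch_json(url: str) -> dict:
    """Fetch a URL; Wikimedia asks clients to send a descriptive User-Agent."""
    req = urllib.request.Request(
        url, headers={"User-Agent": "glam-metrics-sketch/0.1 (contact: you@example.org)"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    url = mediarequests_url("Example.jpg", "20230101", "20230131")
    print(url)
    # print(total_requests(fetch_json(url)))  # requires network access
```

Because the count is tied to the file rather than to whichever articles embed it on a given day, this approach avoids needing a history of when images were added to pages.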
Thanks! Dominic
On Wed, Feb 8, 2023 at 6:50 AM Fiona Romeo fromeo@wikimedia.org wrote:
Thanks for adding your perspective, Dominic.
Here is the Phabricator ticket that tracks work the Foundation has been doing with Wikimedia Israel to resolve storage issues for the GLAM Wiki Dashboard: https://phabricator.wikimedia.org/T321702
The conclusion was that it would be best for the service to use the Mediarequests API, as Dominic also recommended in his email. Further to this, the Foundation's Data Platform team is looking into a custom API endpoint for media requests by category, to reduce or remove the need for data transformation and storage. As an interim solution for the GLAM Wiki Dashboard, we advised Wikimedia Israel to migrate their project from Amazon Web Services to our own servers, and we made capacity available for that.
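Until a by-category endpoint exists, a category-level figure has to be assembled client-side: one per-file query for each file in the category, then a merge. A minimal sketch of that merge step, assuming each returned item carries a `timestamp` beginning with `YYYYMM` and a `requests` count; the `file` key and the sample data are hypothetical.

```python
from collections import Counter


def aggregate_by_month(per_file_items: list[dict]) -> dict[str, int]:
    """Sum per-file request counts into one monthly series for a category.

    Assumes each item has a 'timestamp' string beginning with YYYYMM and a
    'requests' count; items may come from many different files.
    """
    totals = Counter()
    for item in per_file_items:
        totals[item["timestamp"][:6]] += item["requests"]
    return dict(sorted(totals.items()))


# Hypothetical per-file results for a two-file category.
items = [
    {"file": "File_A.jpg", "timestamp": "2023010100", "requests": 120},
    {"file": "File_A.jpg", "timestamp": "2023020100", "requests": 80},
    {"file": "File_B.jpg", "timestamp": "2023010100", "requests": 30},
]
print(aggregate_by_month(items))  # {'202301': 150, '202302': 80}
```

For a category of tens of thousands of files, this means tens of thousands of API calls plus storage for the intermediate results, which is the transformation step a by-category endpoint would take off each tool.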
We don't know as much about the BaGLAMa2 issues at the moment.
I'm very sorry to see our GLAM wiki community struggling with tool instability again.
Fiona
On Wed, 8 Feb 2023 at 21:34, Dominic Byrd-McDevitt dominic@dp.la wrote:
Hi All,
I've also experienced the reporting problems with GLAMorgan that Dominic described. Over time, we also came to assume that some of the data reported to institutions was not entirely correct. So far, we have had no problems with GLAMWD.
I'm glad to know that GLAMWD will be moved to the Foundation's servers. As Dominic said, no amount of band-aids will resolve all the issues we and our partners experience. Maybe we could have a wiki page tracking the issues and the current work being done, or a Phabricator tag called "GLAM statistics"?
I just subscribed to Phabricator task T321702 to follow it. Thanks for sharing it, Fiona! 😊
Cheers!
On Wed, 8 Feb 2023 at 19:40, Fiona Romeo (fromeo@wikimedia.org) wrote:
Wren mailing list -- wren@lists.wikimedia.org To unsubscribe send an email to wren-leave@lists.wikimedia.org
Kia ora Fiona,
We've had many issues with GLAMorgan at the Auckland War Memorial Museum. We worked out a couple of fixes that seemed to help:
1) The tool can't seem to handle nested categories, or occasionally subcategories at all; we'd get wildly different results even with the same parameters. We found that creating a single hidden category (Category:Images from Auckland Museum https://commons.wikimedia.org/wiki/Category:Images_from_Auckland_Museum), added to every image that uses the Auckland Museum template, helped correct whatever was going wrong.
2) Late last year the tool broke in Chrome, but apparently still worked fine in Microsoft Edge or Safari.
That doesn't address many of the issues raised above about the tool itself being inaccurate, but since switching to a single category we don't seem to get wild swings anymore. I'm still quite suspicious of the data: there have been a few times when we expected different numbers (e.g. when a photo was featured on the Main Page but no massive spike was recorded in GLAMorgan).
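Why nested categories misbehaved in GLAMorgan isn't known from this thread, and the sketch below is not GLAMorgan's code. As a generic illustration only, with hypothetical in-memory category data, it shows two ways a category tree can be counted inconsistently: a file reachable through several subcategories (or around a category cycle) is counted more than once unless results are deduplicated, and any depth limit changes which files are included. A single flat category sidesteps both.

```python
from collections import deque


def files_in_tree(root, subcats, files, max_depth=None):
    """Breadth-first walk of a category tree, returning each file once.

    subcats:   dict mapping a category to its subcategories (hypothetical data)
    files:     dict mapping a category to the files directly in it
    max_depth: stop descending below this depth (None means no limit)
    """
    seen_cats = {root}
    found = set()
    queue = deque([(root, 0)])
    while queue:
        cat, depth = queue.popleft()
        found.update(files.get(cat, []))
        if max_depth is not None and depth >= max_depth:
            continue
        for sub in subcats.get(cat, []):
            if sub not in seen_cats:  # category graphs can contain cycles
                seen_cats.add(sub)
                queue.append((sub, depth + 1))
    return found


def naive_count(root, subcats, files, max_depth=3, depth=0):
    """Counts files once per path with no dedup; cycles are cut only by max_depth."""
    total = len(files.get(root, []))
    if depth < max_depth:
        for sub in subcats.get(root, []):
            total += naive_count(sub, subcats, files, max_depth, depth + 1)
    return total


# Hypothetical tree: File_1 sits in two subcategories, and C links back to Root.
subcats = {"Root": ["A", "B"], "A": ["C"], "B": ["C"], "C": ["Root"]}
files = {"A": ["File_1.jpg"], "B": ["File_1.jpg", "File_2.jpg"], "C": ["File_3.jpg"]}

print(len(files_in_tree("Root", subcats, files)))               # 3
print(len(files_in_tree("Root", subcats, files, max_depth=1)))  # 2
print(naive_count("Root", subcats, files, max_depth=3))         # 5
print(naive_count("Root", subcats, files, max_depth=6))         # 15
```

The same three files yield 2, 3, 5 or 15 depending on depth limit and deduplication, which is the kind of swing a single flat category avoids.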
Ngaa mihi nui, Marty
On Thu, 9 Feb 2023 at 12:13, Mauricio V. Genta wir@wikimedia.org.ar wrote:
Hi All,
I've also experienced problems reporting with GLAMorgan as described by Dominic. We also assumed, over time, that some of the data reported to institutions was not entirely correct. At the moment we had no problems with the GLAMWD
Glad to know that GLAMWD will be moved to our own servers, as Dominic said, no amount of band aids will resolve all the issues we and our partners experience. Maybe, we can have a wiki page tracking the issues and the current work being done or a tag in phabricator called "GLAM statistics"?
I just subscribed to the phabricator T321702 to follow it, thanks for sharing it Fiona! 😊
Cheers!
El mié, 8 feb 2023 a la(s) 19:40, Fiona Romeo (fromeo@wikimedia.org) escribió:
Thanks for adding your perspective, Dominic.
Here is the Phabricator ticket that tracks work the Foundation has been doing with Wikimedia Israel to resolve storage issues for the GLAM Wiki Dashboard: https://phabricator.wikimedia.org/T321702
The conclusion was that it would be best for the service to use the MediaRequest API, as Dominic has also recommended in his email. Further to this, the Foundation's Data Platform team is looking into a custom API endpoint for media requests by category to reduce/remove the need for data transformation and storage. As an interim solution for the GLAM Wiki Dashboard, we advised Wikimedia Israel to migrate their project from Amazon Web Services to our own servers and made capacity available for that.
We don't know as much about the BaGLAMa2 issues at the moment.
I'm very sorry to see our GLAM wiki community struggling with tool instability again.
Fiona
On Wed, 8 Feb 2023 at 21:34, Dominic Byrd-McDevitt dominic@dp.la wrote:
For my part, I'd like to point out that these issues are recurring problems, and also that when it comes to BaGLAMa lag, the longer it goes, the more unrecoverable it becomes. Data errors, once introduced, are not repairable.
Dozens of the tracked categories in BaGLAMa are DPLA institutions, and I have shared these links numerous times over the years. So I frequently get questions from partners who check their data and find it months out of date. There is nothing I can tell them in these situations, except that I have regularly seen data get that lagged, and then eventually it reaches a point where (presumably after someone finally reached Magnus?) all the backlogged months come in at once.
This causes its own problems, I believe, because I have to assume in such situations where data is generated after the fact, that it is all corrupt to some degree. My understanding of BaGLAMa is that it counts page views of articles using images from a category. But there is no MediaWiki log of when images were added to a page (or to a category), so if you are counting page views that occurred three months ago based on images that are in a page today, you might be counting crediting three past months with views for an image that was added last week.
This issue causes massive data errors in the other direction too. Sometimes you'll have an unexplained spike, like the several here https://glamtools.toolforge.org/baglama2/#gid=50&month=201611&giu=enwiki&server=en.wikipedia.org (and by spike, I mean 700 million page views), and it's caused by the fact that an image that was on the main page for no more than hours caused BaGLAMa to count the entire month's page views of the main page. These errors are unrecoverable; they stay in the data and just increase the error of the overall total over time. There's never been a time where I could go to a maintainer and point out this massive data error and get that rerun or fixed. Instead, I am often in the embarrassing position of telling partners "Here is the analytics page, but there is a big overcount on one random month, so just always remember to mentally subtract 100 million from your total, and treat these numbers as very inexact."
So as long as we are talking about BaGLAMa at all, I do have to point out that it is an entirely flawed tool and the data is unreliable. And aside from all of those bugs, the methodology is very flawed, since it should not be using the Pageviews API in the first place. I consider the data essentially fictitious anyway— we know the images we are tracking are probably not even receiving half of the article views we are crediting to them, but we continue to report bad data, because our projects rely on having outcomes and reporting analytics. Glamorous and Glamorgan are based on the same flawed methodology.
And I haven't even started on the clunky UI, where an ever-growing list of 1,000+ categories is displayed on the landing page, many of them typos or non-existent categories that can never be removed or cleaned up.
I guess my main point here is that no amount of band-aids will ever resolve some of these issues, and we need to be thinking about redoing the tool entirely. Really, we should have done so as soon as the Mediarequests API was released, which was in 2019.
Thanks! Dominic
On Wed, Feb 8, 2023 at 6:50 AM Fiona Romeo fromeo@wikimedia.org wrote:
Dear Andrew,
Thanks for escalating these specific issues to us. Giovanna and I were both travelling in January so we haven't been as active in Telegram.
Are you aware of anyone else having issues with the GLAM Wiki Dashboard, or is it just The MET? I quickly sampled some of the institutions and only saw a "bad request" for The MET. We have been directly supporting Wikimedia Israel to optimise their service, so I will raise this issue with both Wikimedia Israel and the Foundation team that has some familiarity with their service.
I noted these two BaGLAMa2 issues in the Telegram chat:
https://bitbucket.org/magnusmanske/magnustools/issues/49/baglama-not-up-to-d...
https://bitbucket.org/magnusmanske/magnustools/issues/50/baglama-not-adding-...
Are there other BaGLAMa2 reports we should be aware of?
Metrics are definitely understood to be a priority for the Foundation and I heard yesterday that metrics tools rose to the top in Wikimedia Sweden’s survey too. There will be opportunities to discuss this further in the context of annual planning but I will see what can be done in the short term.
More soon, Fiona
On Wed, 8 Feb 2023 at 10:57, Andrew Lih andrew.lih@gmail.com wrote:
Hi WREN and GLAM folks,
I need your insights into what could be a very problematic year for us in the GLAM wiki community, as our metrics tools to measure our impact are in crisis and disrepair. If you have any insights, please do share them here, or in the GLAM Wiki Telegram group where this conversation started happening recently.
I sent a "HELP!" message to the Wikimedia SE content partnerships help desk just the other day, included below, and hope this may be useful to start a conversation. If there is enough interest, we might want to start a wiki page to formally document our needs as a GLAM wiki community. Thanks.
-Andrew
To: help@wikimedia.se
I'd like to formally employ the Helpdesk's services in getting some care and attention to BaGLAMa2. It seems to have been failing since the end of last year, and even then, it was reporting extremely low figures for all categories. This is one of the few tools we have in the GLAM wiki community to measure impact and to make the case for sustaining our work.
https://glamtools.toolforge.org/baglama2/
Without these basic metrics, 2023 could prove to be a disastrous year for continuing efforts. So far, we have been unable to report good, reliable numbers to folks such as the Metropolitan Museum of Art or the Smithsonian Institution. Other on-demand tools such as Glamorgan usually cannot handle such large category trees, and also have their own problems with not being able to read the pageviews API numbers accurately, which is another issue in itself.
https://glamtools.toolforge.org/glamorgan.html
In short - help! How can we get this on the radar screen of people who can put more care, attention, and resources into this? Thanks.
-Andrew
--
*Fiona Romeo* (she/her) Senior Manager, Culture and Heritage Wikimedia Foundation https://wikimediafoundation.org/
Wren mailing list -- wren@lists.wikimedia.org To unsubscribe send an email to wren-leave@lists.wikimedia.org
-- *Mauricio V. Genta* *Digitization Project Coordinator | https://w.wiki/SP*
On Thu, 9 Feb 2023 at 01:52, Marty Blayney martyblayney.machi@gmail.com wrote:
We've had many issues with GLAMorgan:
- Late last year the tool broke in Chrome, but apparently still worked fine in Microsoft Edge or Safari.
- It has also stopped working in Firefox.
On Wed, Feb 8, 2023 at 8:52 PM Marty Blayney martyblayney.machi@gmail.com wrote:
Kia ora Fiona,
[...]
That doesn't address a lot of the issues above around the tool itself being inaccurate, but since we switched to using a single category, we don't seem to get wild swings anymore. I'm still quite suspicious of the data: there have been a few times when we expected different numbers (e.g. when a photo was featured on the main page, but no massive spike was recorded in GLAMorgan).
This type of historical data isn't really possible with GLAMorgan, and the tool leads people astray by making the data seem more reliable than it is. As I understand it, the tool does two things, both live in your browser on the client side. First, it queries the PetScan API to find all the pages across Wikimedia sites that use files from the given category. Then, it sends a series of queries to the Wikimedia pageview API, one for each of those pages, for the given time range. This means you are actually calculating the historical page views of the pages *currently* using images from the category, not the page views the images actually received during the year and month you are querying. If a photo was featured on the main page but is not there now, the tool will not know that, or show any of those page views. Conversely, if you check a category with an image currently on the main page, even if only for a few hours, it will credit that image with all of the main page's views for every month in history.
To check this, enter the category of an image that is on the main page right now, and look at its page views back in 2016. All of this faulty logic would be avoided by counting the actual media requests rather than using the pageview API. It would be even better if there were a way to query a category for the historical views of its members as of a given timestamp, but that doesn't seem possible.
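The difference between the two approaches contrasted above shows up in the request URLs themselves. The sketch below builds one URL of each kind. It assumes the public Wikimedia REST endpoints `metrics/pageviews/per-article` and `metrics/mediarequests/per-file`, and Commons' hashed upload-directory paths. The pageview request is keyed only by an article title, so it cannot know whether (or when) a given image was on that page. The media request is keyed by the file itself. The helper names are hypothetical. The parameter values (`user` agent, `monthly` granularity, `all-referers`) are illustrative choices, not necessarily what GLAMorgan or BaGLAMa send.

```python
import hashlib
from urllib.parse import quote

REST = "https://wikimedia.org/api/rest_v1/metrics"

def pageviews_url(project: str, article: str, start: str, end: str,
                  agent: str = "user", granularity: str = "monthly") -> str:
    """Per-article pageviews: counts views of a *page*; no image is involved.
    start/end are YYYYMMDD timestamps."""
    title = quote(article.replace(" ", "_"), safe="")  # also encodes '/' in titles
    return (f"{REST}/pageviews/per-article/{project}/all-access/{agent}/"
            f"{title}/{granularity}/{start}/{end}")

def commons_path(filename: str) -> str:
    """Upload path of a Commons file under MediaWiki's hashed directory layout:
    /wikipedia/commons/<first md5 hex digit>/<first two md5 hex digits>/<name>."""
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"/wikipedia/commons/{digest[0]}/{digest[:2]}/{name}"

def mediarequests_url(filename: str, start: str, end: str,
                      agent: str = "user", granularity: str = "monthly") -> str:
    """Per-file media requests: counts requests for the *file*, wherever it was shown."""
    path = quote(commons_path(filename), safe="")  # the whole path is URL-encoded
    return (f"{REST}/mediarequests/per-file/all-referers/{agent}/"
            f"{path}/{granularity}/{start}/{end}")

if __name__ == "__main__":
    print(pageviews_url("en.wikipedia", "Main Page", "20161101", "20161130"))
    print(mediarequests_url("Example.jpg", "20230101", "20230131"))
```

A tool built on the first URL has to decide separately which pages an image "belongs" to. The second needs no such step, which is why it sidesteps the attribution errors described in this thread.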
Thanks for posting the Phabricator ticket; I too have subscribed. I concur with others: lack of support for reliable tools for GLAM institutions has been a major concern for GLAMs for many years. Mary Mark Ockerbloom
On Wed, Feb 8, 2023 at 5:41 PM Fiona Romeo fromeo@wikimedia.org wrote:
Thanks for adding your perspective, Dominic.
Here is the Phabricator ticket that tracks work the Foundation has been doing with Wikimedia Israel to resolve storage issues for the GLAM Wiki Dashboard: https://phabricator.wikimedia.org/T321702
The conclusion was that it would be best for the service to use the MediaRequest API, as Dominic has also recommended in his email. Further to this, the Foundation's Data Platform team is looking into a custom API endpoint for media requests by category to reduce/remove the need for data transformation and storage. As an interim solution for the GLAM Wiki Dashboard, we advised Wikimedia Israel to migrate their project from Amazon Web Services to our own servers and made capacity available for that.
We don't know as much about the BaGLAMa2 issues at the moment.
I'm very sorry to see our GLAM wiki community struggling with tool instability again.
Fiona
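Until a custom endpoint for media requests by category exists, a client has to do the data transformation Fiona mentions itself. It must fetch a per-file media-request series for every file in the category and add the series up. The minimal sketch below makes three assumptions. First, the category's file list has already been retrieved. Second, each decoded response has the per-file endpoint's JSON shape `{"items": [{"timestamp": "YYYYMMDDHH", "requests": n}, ...]}`, with other fields omitted. Third, the function name and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict

def requests_by_month(responses: Dict[str, dict]) -> Dict[str, int]:
    """Sum media requests across all files in a category, per month.
    `responses` maps each file name to its decoded per-file JSON response."""
    totals: Dict[str, int] = defaultdict(int)
    for body in responses.values():
        for item in body.get("items", []):
            month = item["timestamp"][:6]  # "YYYYMMDDHH" -> "YYYYMM"
            totals[month] += item["requests"]
    return dict(sorted(totals.items()))

sample = {
    "Example_1.jpg": {"items": [{"timestamp": "2023010100", "requests": 10},
                                {"timestamp": "2023020100", "requests": 4}]},
    "Example_2.jpg": {"items": [{"timestamp": "2023010100", "requests": 5}]},
}
print(requests_by_month(sample))  # {'202301': 15, '202302': 4}
```

Because each series counts requests for the file itself, no page-usage history is needed. For a large category, however, this means one request per file per time range, plus local storage of the results. That is the cost a category-level endpoint would remove.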
On Wed, 8 Feb 2023 at 21:34, Dominic Byrd-McDevitt dominic@dp.la wrote:
For my part, I'd like to point out that these issues are recurring problems, and also that when it comes to BaGLAMa lag, the longer it goes, the more unrecoverable it becomes. Data errors, once introduced, are not repairable.
Dozens of the tracked categories in BaGLAMa are DPLA institutions, and I have shared these links numerous times over the years. So I frequently get questions from partners who check their data and find it months out of date. There is nothing I can tell them in these situations, except that I have regularly seen the data lag that far, and then eventually it reaches a point where (presumably after someone finally reaches Magnus?) all the backlogged months come in at once.
Hello everyone. Sorry I was slow to reply. Andrew, thank you for bringing up this fundamental discussion, and thank you to everyone who has shared on the thread.
Wiki Movimento Brasil is very concerned in general about the lack of support for and maintenance of basic tools for GLAM-Wiki partnerships. We think this is an urgent matter, and we have been in touch with several affiliates to find a way out of this situation.
We were one of the first global adopters of the GLAM Wiki Dashboard. It has a lot of potential and has served us well. We were even involved in a small part of its development, working on adjusting the export modules, and have regularly provided feedback about the tool. It has served a major purpose in our work, as we relied on it for communication with our GLAM partners.
Unfortunately, it has not been updated as quickly as we need, and some urgent features have not been developed yet, which has led to the publication of inaccurate information. Given this context, we have switched to relying on data from GLAMorgan (a computationally costly tool, and also not very reliable) to produce the periodic reports we send to our GLAM partners. We have not used BaGLAMa2. I have attached an example of our periodic report.
Our affiliate has worked on a feedback report to be sent to WMIL, and I am sharing it with you so you have a sense of some of the bugs and problems we have listed. I have not yet sent the report to them, as I understand they have recently gone through a leadership change, and I am waiting a couple of weeks before sending it so as not to overwhelm the new ED.
We are happy this discussion on the technical infrastructure for GLAM partnerships is happening and are committed to taking a role in improving our products and technology for GLAMs.
Best,
João
Hello Fiona and everyone, Martin, one of the UK WIRs, here. Here are more links to issues being raised on Magnus’ Bitbucket, going back several months. https://bitbucket.org/magnusmanske/glamtools/issues?status=new&status=op...
Magnus has done heroic volunteer work, so I have nothing but gratitude for him. But when a tool becomes essential to the day-to-day work of WIRs, it needs to be properly supported rather than reliant on a volunteer. Tools such as GLAMorgan and BaGLAMa2 are definitely essential to our day-to-day work, not just for reporting to institutions but for training and for outreach to persuade more cultural institutions to work with Wikimedia.
There is a fortunate precedent for what can happen. The stats server for Wikimedia projects used to be a volunteer-run service that would occasionally stop working, didn’t have all the functionality that we wanted, and didn’t work well with other tools. In 2014 the WMF technical team remade it from scratch: appealed for use cases, made a spec, created an API and finally built a site to present and visualise the stats. A long time was spent, but the outcome is great and our work now would be unthinkable without it. I think a lot more WIRs, including some not in this discussion or on Telegram, are at the point of begging the WMF to repeat that process for these other essential tools.
From: Mary Mark Ockerbloom celebration.women@gmail.com
Sent: 09 February 2023 15:10
To: Wikimedians in Residence Exchange Network wren@lists.wikimedia.org
Cc: Dominic Byrd-McDevitt dominic@dp.la; Axel Pettersson axel.pettersson@wikimedia.se; Ben Vershbow bvershbow@wikimedia.org; Giovanna Fontenelle gfontenelle@wikimedia.org; João Peschanski joalpe@wmnobrasil.org; Sandra Fauconnier sandra.fauconnier@gmail.com; dominic@byrd-mcdevitt.com; scann scannopolis@gmail.com
Subject: [Wren] Re: The problems with Wikimedia metrics
Thanks for posting the Phabricator ticket; I too have subscribed. I concur with others: lack of support for reliable tools for GLAM institutions has been a major concern for GLAMs for many years. Mary Mark Ockerbloom
On Wed, Feb 8, 2023 at 5:41 PM Fiona Romeo <fromeo@wikimedia.orgmailto:fromeo@wikimedia.org> wrote: Thanks for adding your perspective, Dominic.
Here is the Phabricator ticket that tracks work the Foundation has been doing with Wikimedia Israel to resolve storage issues for the GLAM Wiki Dashboard: https://phabricator.wikimedia.org/T321702
The conclusion was that it would be best for the service to use the MediaRequest API, as Dominic has also recommended in his email. Further to this, the Foundation's Data Platform team is looking into a custom API endpoint for media requests by category to reduce/remove the need for data transformation and storage. As an interim solution for the GLAM Wiki Dashboard, we advised Wikimedia Israel to migrate their project from Amazon Web Services to our own servers and made capacity available for that.
We don't know as much about the BaGLAMa2 issues at the moment.
I'm very sorry to see our GLAM wiki community struggling with tool instability again.
Fiona
On Wed, 8 Feb 2023 at 21:34, Dominic Byrd-McDevitt <dominic@dp.lamailto:dominic@dp.la> wrote: For my part, I'd like to point out that these issues are recurring problems, and also that when it comes to BaGLAMa lag, the longer it goes, the more unrecoverable it becomes. Data errors, once introduced, are not repairable.
Dozens of the tracked categories in BaGLAMa are DPLA institutions, and I have shared these links numerous times over the years. So I frequently get questions from partners who check their data and find it months out of date. There is nothing I can tell them in these situations, except that I have regularly seen data get that lagged, and then eventually it reaches a point where (presumably after someone finally reached Magnus?) all the backlogged months come in at once.
This causes its own problems, I believe, because I have to assume in such situations where data is generated after the fact, that it is all corrupt to some degree. My understanding of BaGLAMa is that it counts page views of articles using images from a category. But there is no MediaWiki log of when images were added to a page (or to a category), so if you are counting page views that occurred three months ago based on images that are in a page today, you might be counting crediting three past months with views for an image that was added last week.
This issue causes massive data errors in the other direction too. Sometimes you'll have an unexplained spike, like the several herehttps://glamtools.toolforge.org/baglama2/#gid=50&month=201611&giu=enwiki&server=en.wikipedia.org (and by spike, I mean 700 million page views), and it's caused by the fact that an image that was on the main page for no more than hours caused BaGLAMa to count the entire month's page views of the main page. These errors are unrecoverable; they stay in the data and just increase the error of the overall total over time. There's never been a time where I could go to a maintainer and point out this massive data error and get that rerun or fixed. Instead, I am often in the embarrassing position of telling partners "Here is the analytics page, but there is a big overcount on one random month, so just always remember to mentally subtract 100 million from your total, and treat these numbers as very inexact."
So as long as we are talking about BaGLAMa at all, I do have to point out that it is an entirely flawed tool and the data is unreliable. And aside from all of those bugs, the methodology is very flawed, since it should not be using the Pageviews API in the first place. I consider the data essentially fictitious anyway— we know the images we are tracking are probably not even receiving half of the article views we are crediting to them, but we continue to report bad data, because our projects rely on having outcomes and reporting analytics. Glamorous and Glamorgan are based on the same flawed methodology.
And I haven't even started on the clunky UI, where an ever-growing list of 1000+ categories are all displayed on the landing page, many of which are typos or non-existent categories that can never be removed or cleaned up.
I guess my main point here is that no amount of band aids will ever resolve some of the issues, and we need to be thinking about entirely redoing the tool itself. Or we should have already done so as soon as the Mediarequests API was released—which was in 2019.
Thanks! Dominic
On Wed, Feb 8, 2023 at 6:50 AM Fiona Romeo <fromeo@wikimedia.orgmailto:fromeo@wikimedia.org> wrote: Dear Andrew,
Thanks for escalating these specific issues to us. Giovanna and I were both travelling in January so we haven't been as active in Telegram.
Are you aware of anyone else having issues with the GLAM Wiki Dashboard, or is it just The MET? I quickly sampled some of the institutions and only saw a "bad request" for The MET. We have been directly supporting Wikimedia Israel to optimise their service, so I will raise this issue with both Wikimedia Israel and the Foundation team that has some familiarity with their service.
I noted these two BaGLAMa2 issues in the Telegram chat: https://bitbucket.org/magnusmanske/magnustools/issues/49/baglama-not-up-to-d...
https://bitbucket.org/magnusmanske/magnustools/issues/50/baglama-not-adding-...
Are there other BaGLAMa2 reports we should be aware of?
Metrics are definitely understood to be a priority for the Foundation and I heard yesterday that metrics tools rose to the top in Wikimedia Sweden’s survey too. There will be opportunities to discuss this further in the context of annual planning but I will see what can be done in the short term.
More soon, Fiona
On Wed, 8 Feb 2023 at 10:57, Andrew Lih <andrew.lih@gmail.commailto:andrew.lih@gmail.com> wrote: Hi WREN and GLAM folks,
I need your insights into what could be a very problematic year for us in the GLAM wiki community, as our metrics tools to measure our impact are in crisis and disrepair. If you have any insights, please do share them here, or in the GLAM Wiki Telegram group where this conversation started happening recently.
I sent a "HELP!" message to the Wikimedia SE content partnerships help desk just the other day, included below, and hope this may be useful to start a conversation. If there is enough interest, we might want to start a wiki page to formally document our needs as a GLAM wiki community. Thanks.
-Andrew
---- To: help@wikimedia.semailto:help@wikimedia.se
I'd like to formally employ the Helpdesk's services in getting some care and attention to BaGLAMa2. It seems to have been failing since the end of last year, and even then, it was reporting extremely low figures for all categories. This is one of the few tools we have in the GLAM wiki community to measure impact and to make the case for sustaining our work.
https://glamtools.toolforge.org/baglama2/
Without these basic metrics, 2023 could prove to be a disastrous year for continuing efforts. So far, we have been unable to report good, reliable numbers to folks such as the Metropolitan Museum of Art or the Smithsonian Institution. Other on-demand tools such as Glamorgan usually cannot handle such large category trees, and also have their own problems with not being able to read the pageviews API numbers accurately, which is another issue in itself.
https://glamtools.toolforge.org/glamorgan.html
In short - help! How can we get this on the radar screen of people who can put more care, attention, and resources into this? Thanks.
-Andrew
--
Fiona Romeo (she/her)
Senior Manager, Culture and Heritage
Wikimedia Foundation https://wikimediafoundation.org/
_______________________________________________
Wren mailing list -- wren@lists.wikimedia.org
To unsubscribe send an email to wren-leave@lists.wikimedia.org
Thanks all for the feedback and conversation.
In the meantime, has anyone gotten GLAMorgan to report back any useful pageview data?
Regardless of small, medium, or large categories, I keep getting: "Data for ... pages could not be loaded from the WMF pageview API (404 error)."
https://glamtools.toolforge.org/glamorgan.html
-Andrew
On Thu, Feb 9, 2023 at 10:10 AM Mary Mark Ockerbloom <celebration.women@gmail.com> wrote:
Thanks for posting the Phabricator ticket; I too have subscribed. I concur with others: lack of support for reliable tools for GLAM institutions has been a major concern for GLAMs for many years.
Mary Mark Ockerbloom
On Wed, Feb 8, 2023 at 5:41 PM Fiona Romeo fromeo@wikimedia.org wrote:
Thanks for adding your perspective, Dominic.
Here is the Phabricator ticket that tracks work the Foundation has been doing with Wikimedia Israel to resolve storage issues for the GLAM Wiki Dashboard: https://phabricator.wikimedia.org/T321702
The conclusion was that it would be best for the service to use the Mediarequests API, as Dominic also recommended in his email. Further to this, the Foundation's Data Platform team is looking into a custom API endpoint for media requests by category, to reduce or remove the need for data transformation and storage. As an interim solution for the GLAM Wiki Dashboard, we advised Wikimedia Israel to migrate their project from Amazon Web Services to our own servers and made capacity available for that.
We don't know as much about the BaGLAMa2 issues at the moment.
I'm very sorry to see our GLAM wiki community struggling with tool instability again.
Fiona
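For readers weighing the switch Fiona describes, here is a minimal Python sketch of how a per-file query to the Mediarequests API might be built, and how per-file counts could be summed over a set of files (for example, the files in one category). It is an illustration under stated assumptions, not the GLAM Wiki Dashboard's or the Foundation's implementation. The endpoint layout (`per-file/{referer}/{agent}/{file_path}/{granularity}/{start}/{end}`), the `requests` field in each response item, and the URL-encoded upload path reflect our reading of the public Wikimedia REST API documentation and should be checked against the current docs. The helper names are hypothetical, and listing the files in a category is left out.

```python
import hashlib
from urllib.parse import quote

MEDIAREQUESTS_BASE = "https://wikimedia.org/api/rest_v1/metrics/mediarequests/per-file"


def commons_upload_path(filename):
    """Upload path of a Commons file, e.g. /wikipedia/commons/a/ab/Name.jpg.

    MediaWiki places uploads in directories named after the first one and two
    hex digits of the MD5 of the file name (spaces replaced by underscores).
    """
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"/wikipedia/commons/{digest[0]}/{digest[:2]}/{name}"


def mediarequests_url(filename, start, end, referer="all-referers",
                      agent="user", granularity="monthly"):
    """Per-file Mediarequests URL (hypothetical helper); dates are YYYYMMDD strings."""
    # The whole upload path is encoded as one path segment ("/" becomes %2F).
    file_path = quote(commons_upload_path(filename), safe="")
    return f"{MEDIAREQUESTS_BASE}/{referer}/{agent}/{file_path}/{granularity}/{start}/{end}"


def total_requests(responses):
    """Sum the `requests` field over the items of several parsed JSON responses."""
    return sum(item["requests"] for resp in responses for item in resp.get("items", []))


# Example for a single file; the JSON below is invented to show the summing step.
print(mediarequests_url("Example image.jpg", "20230101", "20230131"))
print(total_requests([{"items": [{"requests": 3}, {"requests": 4}]}, {"items": []}]))  # 7
```

Because each call covers a single file, a category with thousands of files implies thousands of requests, which is presumably part of why a by-category endpoint would help.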
On Wed, 8 Feb 2023 at 21:34, Dominic Byrd-McDevitt dominic@dp.la wrote:
For my part, I'd like to point out that these issues are recurring problems, and also that when it comes to BaGLAMa lag, the longer it goes, the more unrecoverable it becomes. Data errors, once introduced, are not repairable.
Dozens of the tracked categories in BaGLAMa are DPLA institutions, and I have shared these links numerous times over the years. So I frequently get questions from partners who check their data and find it months out of date. There is nothing I can tell them in these situations, except that I have regularly seen data get that lagged, and then eventually it reaches a point where (presumably after someone finally reached Magnus?) all the backlogged months come in at once.
This causes its own problems, I believe, because in such situations, where data is generated after the fact, I have to assume that it is all corrupt to some degree. My understanding of BaGLAMa is that it counts page views of articles using images from a category. But there is no MediaWiki log of when images were added to a page (or to a category), so if you are counting page views that occurred three months ago based on images that are in a page today, you might be crediting three past months with views for an image that was added last week.
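To make the backfill problem concrete, here is a toy Python model of the counting method as Dominic describes it: summing the monthly views of every article that uses a tracked image. All article names, dates, and view counts are invented, and this is not BaGLAMa2's code. The time-aware variant needs exactly the data Dominic says MediaWiki does not log (the month each article started using the image), so `usage_start` here is hypothetical input.

```python
# Invented monthly views for two articles that use images from a tracked category.
monthly_views = {
    "Article_A": {"2022-10": 1000, "2022-11": 1200, "2022-12": 900},
    "Article_B": {"2022-10": 5000, "2022-11": 4800, "2022-12": 5100},
}

# Hypothetical: the month each article started using a tracked image.
# MediaWiki keeps no direct log of this, which is the root of the problem.
usage_start = {"Article_A": "2022-01", "Article_B": "2022-12"}


def credited_from_snapshot(month):
    """Backfilled count: every article using an image *today* is credited for `month`."""
    return sum(views.get(month, 0) for views in monthly_views.values())


def credited_time_aware(month):
    """Credit only articles whose image usage began in or before `month`.

    Zero-padded "YYYY-MM" strings compare correctly as plain strings.
    """
    return sum(
        views.get(month, 0)
        for article, views in monthly_views.items()
        if usage_start[article] <= month
    )


for month in ("2022-10", "2022-11", "2022-12"):
    print(month, credited_from_snapshot(month), credited_time_aware(month))
```

In this toy, backfilling October using today's snapshot credits it with 6,000 views instead of 1,000. Even the time-aware version works at whole-month granularity, so it still credits Article_B for all of December; that residual error is the same mechanism behind the main-page spike.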
This issue causes massive data errors in the other direction too. Sometimes you'll have an unexplained spike, like the several here: https://glamtools.toolforge.org/baglama2/#gid=50&month=201611&giu=enwiki&server=en.wikipedia.org (and by spike, I mean 700 million page views). It's caused by the fact that an image that was on the main page for no more than a few hours caused BaGLAMa to count the entire month's page views of the main page. These errors are unrecoverable; they stay in the data and just increase the error of the overall total over time. There's never been a time when I could go to a maintainer, point out this massive data error, and get it rerun or fixed. Instead, I am often in the embarrassing position of telling partners "Here is the analytics page, but there is a big overcount on one random month, so just always remember to mentally subtract 100 million from your total, and treat these numbers as very inexact."
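The main-page spike can be sketched the same way. This Python toy assumes hourly view counts are available for the page and that the hours during which the image was displayed are known; both inputs are invented, and the flat traffic profile is made up purely to keep the arithmetic readable. It compares crediting the whole month (the behaviour Dominic describes) with crediting only the hours the image was actually shown.

```python
HOURS_IN_MONTH = 30 * 24  # a 30-day month, for simplicity


def credited_whole_month(hourly_views):
    """Credit the page's entire monthly traffic if the image appeared at any point."""
    return sum(hourly_views)


def credited_while_present(hourly_views, intervals):
    """Credit only the hours in which the image was on the page.

    `intervals` is a list of (start_hour, end_hour) pairs, end exclusive;
    overlapping intervals are merged so no hour is counted twice.
    """
    present = set()
    for start, end in intervals:
        present.update(range(start, end))
    return sum(views for hour, views in enumerate(hourly_views) if hour in present)


hourly_views = [1_000_000] * HOURS_IN_MONTH  # invented, flat traffic profile
print(credited_whole_month(hourly_views))                  # 720000000
print(credited_while_present(hourly_views, [(100, 106)]))  # 6000000
```

With six hours of exposure in a 720-hour month, whole-month crediting overstates that page's contribution 120-fold in this toy example.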
So as long as we are talking about BaGLAMa at all, I do have to point out that it is an entirely flawed tool and the data is unreliable. And aside from all of those bugs, the methodology is very flawed, since it should not be using the Pageviews API in the first place. I consider the data essentially fictitious anyway: we know the images we are tracking are probably not even receiving half of the article views we are crediting to them, but we continue to report bad data, because our projects rely on having outcomes and reporting analytics. GLAMorous and GLAMorgan are based on the same flawed methodology.
And I haven't even started on the clunky UI, where an ever-growing list of 1000+ categories is displayed on the landing page, many of which are typos or non-existent categories that can never be removed or cleaned up.
I guess my main point here is that no amount of band-aids will ever resolve some of these issues, and we need to be thinking about entirely redoing the tool itself. Or rather, we should have done so as soon as the Mediarequests API was released, which was in 2019.
Thanks! Dominic
On Mon, 13 Feb 2023 at 15:03, Andrew Lih andrew.lih@gmail.com wrote:
Thanks all for the feedback and conversation.
In the meantime, has anyone gotten GLAMorgan to report back any useful pageview data?
Regardless of small, medium, or large categories, I keep getting: "Data for ... pages could not be loaded from the WMF pageview API (404 error)."
What browser are you using? As I noted earlier, GLAMORGAN works (for me) in Edge, but not Firefox. It also reportedly works in Safari, but not Chrome.
Oh, thanks for that reminder - yes, it seems to work in Safari, but I'll test it in other browsers I can access. I've updated the meta-wiki page with a warning.
After some sleuthing, it may be because of ad blockers, as the JavaScript console in Chrome reads "net::ERR_BLOCKED_BY_CLIENT".
Each of the HTTP API requests is generated from within the user's browser and looks like this: https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/incubator.wi....
I'm wondering if this is a case of ad-block false positives, because the URL has "metrics" in it, which is a key thing these ad blockers aggressively filter. However, right now Toolforge is having a bad hair day, so I cannot test this.
-Andrew
On Mon, Feb 13, 2023 at 10:54 AM Andy Mabbett andy@pigsonthewing.org.uk wrote:
What browser are you using? As I noted earlier, GLAMORGAN works (for me) in Edge, but not Firefox. It also reportedly works in Safari, but not Chrome.
--
Andy Mabbett
@pigsonthewing
http://pigsonthewing.org.uk
EUREKA. This was it. PROBLEM SOLVED.
GLAMorgan is heavily affected by ad blockers. Specifically: most ad blockers aggressively block URLs that contain "metrics". GLAMorgan runs as JavaScript in the user's browser, so it is subject to ad blocking, because its Pageviews API calls go to "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/incubator.wikipedia/all-access/user/ ..."
Turn off ad blocking, or use a browser profile that doesn't use it, and you're good.
-Andrew
--
-Andrew Lih
Smithsonian Institution - Wikimedian at Large
Metropolitan Museum of Art - Wikimedia strategist
Previously: professor of journalism and communications, American University, Columbia University, University of Southern California
Email: andrew.lih@gmail.com, andrew@andrewlih.com
WEB: https://muckrack.com/fuzheado
PROJECT: Wikipedia Space: http://en.wikipedia.org/wiki/WP:WPSPACE
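Andrew's finding suggests a workaround when browser extensions get in the way: query the same per-article Pageviews endpoint from a script (for example on PAWS or a local machine), where browser ad blockers are not involved. A minimal Python sketch follows. It assumes the endpoint layout visible in the URL quoted in the thread (`per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}`) and a `views` field in each returned item; the helper names and the example User-Agent string are hypothetical. Wikimedia asks automated clients to send a descriptive User-Agent.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

PAGEVIEWS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project, article, start, end,
                    access="all-access", agent="user", granularity="monthly"):
    """Build a per-article Pageviews URL (hypothetical helper); dates are YYYYMMDD."""
    # Titles use underscores and are encoded as a single path segment,
    # so a "/" inside a title becomes %2F.
    title = quote(article.replace(" ", "_"), safe="")
    return f"{PAGEVIEWS_BASE}/{project}/{access}/{agent}/{title}/{granularity}/{start}/{end}"


def fetch_total_views(url, user_agent):
    """Fetch one URL outside the browser and sum the `views` of every returned item."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=30) as resp:
        data = json.load(resp)
    return sum(item["views"] for item in data.get("items", []))


url = per_article_url("en.wikipedia", "Albert Einstein", "20230101", "20230131")
print(url)
# The "/metrics/" path segment is what the thread identifies as the likely ad-blocker trigger.
print("/metrics/" in url)
# fetch_total_views(url, "ExampleGLAMReport/0.1 (contact: you@example.org)")  # network call
```

The network call is left commented out so the URL construction can be checked offline first.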
Thanks for sharing your discovery, Andrew. I'm glad that GLAMorgan is usable for those who rely on it.
But I heard the data reliability concerns shared in this thread and will continue exploring ways that we might improve metrics.
Fiona
On Mon, 13 Feb 2023 at 21:47, Andrew Lih andrew.lih@gmail.com wrote:
EUREKA. This was it. PROBLEM SOLVED.
Today, Wikimedia Cloud had an outage that highlights the fragile nature of our GLAM wiki ecosystem:
– All tools on wmcloud.org and toolforge.org were knocked out and unavailable for 4 hours. https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.o...
– Petscan needed an extra hour before it came back, because it is not set up to run automatically and needs a manual restart by logging in as Magnus and running a script by hand. This is a problematic situation for service deployment.
– Many tools rely on Petscan, such as GLAMorgan for expanding category trees and generating MediaWiki page titles, so this outage affected many more tools.
– BaGLAMa2 seems not to have come back successfully, as all the categories that should be tracked are missing. Likely, the data is all there somewhere, but it currently needs some loving care to be restored. It is unclear whether this is being worked on.
– PAWS, the visual Python environment on wmcloud that is a workhorse for bot work and scripts, is still down and needs some loving care to revive. https://phabricator.wikimedia.org/T329581
In short – we're trying to be scrappy and resourceful, but we're hurting.
-Andrew
Hi all, (Sent on behalf of the helpdesk.)
Andrew also sent a request to the Content Partnerships Hub helpdesk about this issue. We very much hear everyone’s concerns. Though the Helpdesk typically deals with content uploads, we do have another part of the hub initiative that is preparing for better (strategic) tools support in the upcoming year(s).[1]
Our current capacity is, however, very limited, and we are still not sure what funding we will receive for our future work. We also currently lack the staff and skills for this type of immediate firefighting, so if we were to work on this, it would be at the expense of other prioritized software development.
As a response to the Helpdesk request, we would therefore suggest setting up a meeting with all interested people on this thread, with the goal to share perspectives and to brainstorm an approach and capture your thoughts on priorities.
Please provide your availability in this Doodle: https://doodle.com/meeting/participate/id/dLZwmRWa (Apologies for favouring times that suit Europe and America over other time zones.)
Please note that Sandra Fauconnier (who works as Product Strategist) will be absent from February 15 for at least a month (due to surgery + recovery period). During her absence, André Costa (andre.costa@wikimedia.se) from WMSE will represent the Content Partnerships Hub on this topic.
[1] https://meta.wikimedia.org/wiki/Content_Partnerships_Hub/Software
Best regards, /axel
====================================
Axel Pettersson (he/him)
Project Manager, GLAM/Outreach
Wikimedia Sverige
+46 (0)733 96 55 65 axel.pettersson@wikimedia.se
Twitter: @Haxpett https://twitter.com/haxpett
Support free knowledge: become a member of Wikimedia Sverige. Read more at http://wikimedia.se/sv/blimedlem
On Tue, 14 Feb 2023 at 06:38, Andrew Lih andrew.lih@gmail.com wrote:
Today, Wikimedia Cloud had an outage that highlights the fragile nature of our GLAM wiki ecosystem:
– All tools on wmcloud.org and toolforge.org were knocked out and unavailable for 4 hours.
https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.o... – Petscan needed an extra hour before it came back, because it is not setup to run automatically, and needs a manual restart by logging in as Magnus and running a script by hand. This is a problematic situation for service deployment. – Many tools rely on Petscan, such as GLAMorgan for expanding category trees and generating Mediawiki page titles, so this outage affected many more tools – BaGLAMa2 seems to have not come back successfully, as all the categories that should be tracked are missing. Likely, the data is all there somewhere, but it currently needs some loving care to be restored. Unclear if this is being worked on. – PAWS, the visual Python environment on wmcloud that is a workhorse for bot work and scripts, is still down and needs some loving care to revive. https://phabricator.wikimedia.org/T329581
In short – we're trying to be scrappy and resourceful, but we're hurting.
-Andrew
On Mon, Feb 13, 2023 at 10:03 AM Andrew Lih andrew.lih@gmail.com wrote:
Thanks all for the feedback and conversation.
In the meantime, has anyone gotten GLAMorgan to report back any useful pageview data?
Regardless of small, medium, or large categories, I keep getting: "Data for ... pages could not be loaded from the WMF pageview API (404 error)."
https://glamtools.toolforge.org/glamorgan.html
-Andrew
On Thu, Feb 9, 2023 at 10:10 AM Mary Mark Ockerbloom < celebration.women@gmail.com> wrote:
Thanks for posting the fabricator ticket; I too have subscribed. I concur with others, lack of support for reliable tools for GLAM institutions has been a major concern for GLAMs for many years. Mary Mark Ockerbloom
On Wed, Feb 8, 2023 at 5:41 PM Fiona Romeo fromeo@wikimedia.org wrote:
Thanks for adding your perspective, Dominic.
Here is the Phabricator ticket that tracks work the Foundation has been doing with Wikimedia Israel to resolve storage issues for the GLAM Wiki Dashboard: https://phabricator.wikimedia.org/T321702
The conclusion was that it would be best for the service to use the MediaRequest API, as Dominic has also recommended in his email. Further to this, the Foundation's Data Platform team is looking into a custom API endpoint for media requests by category to reduce/remove the need for data transformation and storage. As an interim solution for the GLAM Wiki Dashboard, we advised Wikimedia Israel to migrate their project from Amazon Web Services to our own servers and made capacity available for that.
We don't know as much about the BaGLAMa2 issues at the moment.
I'm very sorry to see our GLAM wiki community struggling with tool instability again.
Fiona
On Wed, 8 Feb 2023 at 21:34, Dominic Byrd-McDevitt dominic@dp.la wrote:
For my part, I'd like to point out that these issues are recurring problems, and also that when it comes to BaGLAMa lag, the longer it goes, the more unrecoverable it becomes. Data errors, once introduced, are not repairable.
Dozens of the tracked categories in BaGLAMa are DPLA institutions, and I have shared these links numerous times over the years. So I frequently get questions from partners who check their data and find it months out of date. There is nothing I can tell them in these situations, except that I have regularly seen data get that lagged, and then eventually it reaches a point where (presumably after someone finally reached Magnus?) all the backlogged months come in at once.
This causes its own problems, I believe, because in such situations, where data is generated after the fact, I have to assume that it is all corrupt to some degree. My understanding of BaGLAMa is that it counts page views of articles using images from a category. But there is no MediaWiki log of when images were added to a page (or to a category), so if you are counting page views that occurred three months ago based on images that are in a page today, you might be crediting three past months with views for an image that was added last week.
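The retroactive-crediting problem can be made concrete with a toy model that compares two ways of attributing views to an image: crediting every view of each page that uses the image today, and crediting only the days on which the image was actually on the page. The second needs a usage history that, as noted above, MediaWiki does not log, so `usage_periods` here is purely hypothetical input, as are the page name and view counts.

```python
from datetime import date, timedelta


def naive_credit(pages_using_image_now, daily_views):
    """Credit all views of every page that uses the image today."""
    return sum(sum(daily_views[page].values()) for page in pages_using_image_now)


def dated_credit(usage_periods, daily_views):
    """Credit only views on days when the image was on the page.

    usage_periods maps page -> list of (first_day, last_day) inclusive ranges.
    """
    total = 0
    for page, periods in usage_periods.items():
        for day, views in daily_views[page].items():
            if any(first <= day <= last for first, last in periods):
                total += views
    return total


# Made-up data: 1,000 views a day in January; image added on January 25.
january = [date(2023, 1, 1) + timedelta(days=i) for i in range(31)]
views = {"Some_article": {day: 1000 for day in january}}
print(naive_credit(["Some_article"], views))  # 31000
print(dated_credit({"Some_article": [(date(2023, 1, 25), date(2023, 1, 31))]}, views))  # 7000
```

In this example the naive method credits the image with more than four times the views it could plausibly have received.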
This issue causes massive data errors in the other direction too. Sometimes you'll have an unexplained spike, like the several here https://glamtools.toolforge.org/baglama2/#gid=50&month=201611&giu=enwiki&server=en.wikipedia.org (and by spike, I mean 700 million page views), caused by an image that was on the main page for no more than a few hours, which led BaGLAMa to count the entire month's page views of the main page. These errors are unrecoverable; they stay in the data and just increase the error of the overall total over time. There's never been a time when I could go to a maintainer, point out this massive data error, and get it rerun or fixed. Instead, I am often in the embarrassing position of telling partners "Here is the analytics page, but there is a big overcount on one random month, so just always remember to mentally subtract 100 million from your total, and treat these numbers as very inexact."
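Errors of this kind cannot be undone after the fact, but a report could at least flag months that look implausible next to the rest. A minimal sketch, assuming a month counts as a spike when its total exceeds a chosen multiple of the median month. The factor of 5, the month labels and the figures below are made up for illustration. Flagging only marks the month for attention and does not correct the count.

```python
from statistics import median


def flag_spikes(monthly_totals, factor=5.0):
    """Return months whose total exceeds `factor` times the median month."""
    typical = median(monthly_totals.values())
    return sorted(month for month, total in monthly_totals.items()
                  if total > factor * typical)


# Made-up monthly totals with one outlier.
totals = {"m1": 3_000_000, "m2": 2_800_000, "m3": 700_000_000, "m4": 3_100_000}
print(flag_spikes(totals))  # ['m3']
```

The median is used rather than the mean so that a single huge month does not raise the baseline it is compared against.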
So as long as we are talking about BaGLAMa at all, I do have to point out that it is an entirely flawed tool and the data is unreliable. And aside from all of those bugs, the methodology is very flawed, since it should not be using the Pageviews API in the first place. I consider the data essentially fictitious anyway: we know the images we are tracking are probably not even receiving half of the article views we are crediting to them, but we continue to report bad data because our projects rely on having outcomes and reporting analytics. GLAMorous and GLAMorgan are based on the same flawed methodology.
And I haven't even started on the clunky UI, where an ever-growing list of 1000+ categories is displayed on the landing page, many of them typos or non-existent categories that can never be removed or cleaned up.
I guess my main point here is that no amount of band-aids will ever resolve some of these issues, and we need to be thinking about redoing the tool entirely. Or we should have already done so as soon as the Mediarequests API was released, in 2019.
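A first step toward cleaning up such a list could be asking the MediaWiki Action API which tracked category pages do not exist. The sketch assumes the categories live on Wikimedia Commons, that a `formatversion=2` query response marks absent pages with `"missing": true`, and that up to 50 titles can go in one request. The helper names and the sample response are hypothetical, and the HTTP call itself is left out. It catches only pages that do not exist, not misspellings that happen to match an existing category.

```python
import json
import urllib.parse

API = "https://commons.wikimedia.org/w/api.php"


def existence_query_urls(categories, batch_size=50):
    """Yield Action API URLs asking about up to `batch_size` category titles each."""
    titles = [c if c.startswith("Category:") else f"Category:{c}" for c in categories]
    for i in range(0, len(titles), batch_size):
        params = {
            "action": "query",
            "titles": "|".join(titles[i:i + batch_size]),
            "format": "json",
            "formatversion": "2",
        }
        yield f"{API}?{urllib.parse.urlencode(params)}"


def missing_titles(response_text):
    """Return titles flagged as missing in a formatversion=2 query response."""
    pages = json.loads(response_text)["query"]["pages"]
    return [page["title"] for page in pages if page.get("missing")]


# Hypothetical response for two titles, one of which does not exist.
sample_response = (
    '{"query": {"pages": ['
    '{"ns": 14, "title": "Category:Exists", "pageid": 1},'
    '{"ns": 14, "title": "Category:Typo", "missing": true}]}}'
)
print(missing_titles(sample_response))  # ['Category:Typo']
```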
Thanks! Dominic
On Wed, Feb 8, 2023 at 6:50 AM Fiona Romeo fromeo@wikimedia.org wrote:
Dear Andrew,
Thanks for escalating these specific issues to us. Giovanna and I were both travelling in January so we haven't been as active in Telegram.
Are you aware of anyone else having issues with the GLAM Wiki Dashboard, or is it just The MET? I quickly sampled some of the institutions and only saw a "bad request" for The MET. We have been directly supporting Wikimedia Israel to optimise their service, so I will raise this issue with both Wikimedia Israel and the Foundation team that has some familiarity with their service.
I noted these two BaGLAMa2 issues in the Telegram chat:
https://bitbucket.org/magnusmanske/magnustools/issues/49/baglama-not-up-to-d...
https://bitbucket.org/magnusmanske/magnustools/issues/50/baglama-not-adding-...
Are there other BaGLAMa2 reports we should be aware of?
Metrics are definitely understood to be a priority for the Foundation and I heard yesterday that metrics tools rose to the top in Wikimedia Sweden’s survey too. There will be opportunities to discuss this further in the context of annual planning but I will see what can be done in the short term.
More soon, Fiona
On Wed, 8 Feb 2023 at 10:57, Andrew Lih andrew.lih@gmail.com wrote:
Hi WREN and GLAM folks,
I need your insights into what could be a very problematic year for us in the GLAM wiki community, as our metrics tools to measure our impact are in crisis and disrepair. If you have any insights, please do share them here, or in the GLAM Wiki Telegram group where this conversation started happening recently.
I sent a "HELP!" message to the Wikimedia SE content partnerships help desk just the other day, included below, and hope this may be useful to start a conversation. If there is enough interest, we might want to start a wiki page to formally document our needs as a GLAM wiki community. Thanks.
-Andrew
---- To: help@wikimedia.se
I'd like to formally employ the Helpdesk's services in getting some care and attention to BaGLAMa2. It seems to have been failing since the end of last year, and even then, it was reporting extremely low figures for all categories. This is one of the few tools we have in the GLAM wiki community to measure impact and to make the case for sustaining our work.
https://glamtools.toolforge.org/baglama2/
Without these basic metrics, 2023 could prove to be a disastrous year for continuing efforts. So far, we have been unable to report good, reliable numbers to folks such as the Metropolitan Museum of Art or the Smithsonian Institution. Other on-demand tools such as Glamorgan usually cannot handle such large category trees, and also have their own problems with not being able to read the pageviews API numbers accurately, which is another issue in itself.
https://glamtools.toolforge.org/glamorgan.html
In short - help! How can we get this on the radar screen of people who can put more care, attention, and resources into this? Thanks.
-Andrew
-- *Fiona Romeo* (she/her) Senior Manager, Culture and Heritage Wikimedia Foundation https://wikimediafoundation.org/
Wren mailing list -- wren@lists.wikimedia.org To unsubscribe send an email to wren-leave@lists.wikimedia.org
-- -Andrew Lih Smithsonian Institution - Wikimedian at Large Metropolitan Museum of Art - Wikimedia strategist Previously: professor of journalism and communications, American University, Columbia University, University of Southern California
Email: andrew.lih@gmail.com, andrew@andrewlih.com WEB: https://muckrack.com/fuzheado PROJECT: Wikipedia Space: http://en.wikipedia.org/wiki/WP:WPSPACE
Hi again, Thanks for all the replies to the Doodle; the winner is Wednesday, February 22, 17.00-18.00 (GMT+1).
@Eric Luth eric.luth@wikimedia.se will send out a calendar invite with a meeting link and agenda tomorrow.
Best regards, /axel
==================================== Axel Pettersson (he/him) Project Manager GLAM/Outreach Wikimedia Sverige
+46 (0)733 96 55 65 axel.pettersson@wikimedia.se
Twitter: @Haxpett https://twitter.com/haxpett
Support free knowledge, become a member of Wikimedia Sverige. Read more at *wikimedia.se/sv/blimedlem http://wikimedia.se/sv/blimedlem*
On Thu, 16 Feb 2023 at 13:48, Axel Pettersson axel.pettersson@wikimedia.se wrote:
Hi all, (Sent on behalf of the helpdesk.)
Andrew also sent a request to the Content Partnerships Hub helpdesk about this issue. We very much hear everyone’s concerns. Though the Helpdesk typically deals with content uploads, we do have another part of the hub initiative that is preparing for better (strategic) tools support in the upcoming year(s).[1]
Our current capacity, however, is very limited, and we are still not sure what funding we will receive for our future work. We also currently lack the manpower and skills for this type of immediate fire-fighting, so if we were to work on this, it would be at the expense of other prioritized software development.
In response to the Helpdesk request, we would therefore suggest setting up a meeting with everyone interested on this thread, with the goals of sharing perspectives, brainstorming an approach, and capturing your thoughts on priorities.
Please provide your availability in this Doodle: https://doodle.com/meeting/participate/id/dLZwmRWa (Apologies for favoring Europe/America-friendly times over other time zones.)
Please note that Sandra Fauconnier (who works as Product Strategist) will be absent from February 15 for at least a month (due to surgery and a recovery period). During her absence, André Costa (andre.costa@wikimedia.se) from WMSE will represent the Content Partnerships Hub on this topic.
[1] https://meta.wikimedia.org/wiki/Content_Partnerships_Hub/Software
Best regards, /axel
Hi all,
The meeting will take place at 16:00 UTC tomorrow at this link: https://us02web.zoom.us/j/81455808411
André and I will join on behalf of Wikimedia Sverige and the Content Partnerships Hub initiative. Feel free to share the link with anyone you think would be good to bring to the call.
André and I will get back to you with an agenda as soon as possible, but the main part is of course to listen in and understand the situation, and discuss the priorities, as per Axel's previous email.
Best, *Eric Luth* Project Manager, Involvement and Advocacy Wikimedia Sverige eric.luth@wikimedia.se +46 (0) 765 55 50 95
Support free knowledge, become a member of Wikimedia Sverige. Read more at blimedlem.wikimedia.se
Den mån 20 feb. 2023 kl 17:23 skrev Axel Pettersson < axel.pettersson@wikimedia.se>:
Hi again, Thanks for all the replies to the Doodle, the winner is Wednesday February 22, 17.00-18.00 (GMT+1).
@Eric Luth eric.luth@wikimedia.se will send out a calendar invite with a meeting link and agenda tomorrow.
Bästa hälsningar, /axel
==================================== Axel Pettersson (han/honom) Projektledare GLAM/Outreach Wikimedia Sverige
+46 (0)733 96 55 65 axel.pettersson@wikimedia.se
Twitter: @Haxpett https://twitter.com/haxpett
Stöd fri kunskap, bli medlem i Wikimedia Sverige. Läs mer på *wikimedia.se/sv/blimedlem http://wikimedia.se/sv/blimedlem*
Den tors 16 feb. 2023 kl 13:48 skrev Axel Pettersson < axel.pettersson@wikimedia.se>:
Hi all, (Sent on behalf of the helpdesk.)
Andrew also sent a request to the Content Partnerships Hub helpdesk about this issue. We very much hear everyone’s concerns. Though the Helpdesk typically deals with content uploads, we do have another part of the hub initiative that is preparing for better (strategic) tools support in the upcoming year(s).[1]
Our current capacity is however very limited, and we are still not sure what funding we will receive for our future work. Also, we currently lack manpower and skills for this type of immediate fire-fighting, so if we were to work on this, it would be at the expense of other prioritized software development.
As a response to the Helpdesk request, we would therefore suggest setting up a meeting with all interested people on this thread, with the goal to share perspectives and to brainstorm an approach and capture your thoughts on priorities.
Please provide your availability in this Doodle: https://doodle.com/meeting/participate/id/dLZwmRWa (With excuses for being Europe/America friendly over other time zones.)
Please note that Sandra Fauconnier (who works as Product Strategist) will be absent from February 15 for at least a month (due to surgery + recovery period). During her absence, André Costa (andre.costa@wikimedia.se) from WMSE will represent the Content Partnerships Hub on this topic.
[1] https://meta.wikimedia.org/wiki/Content_Partnerships_Hub/Software
Bästa hälsningar, /axel
==================================== Axel Pettersson (han/honom) Projektledare GLAM/Outreach Wikimedia Sverige
+46 (0)733 96 55 65 axel.pettersson@wikimedia.se
Twitter: @Haxpett https://twitter.com/haxpett
Stöd fri kunskap, bli medlem i Wikimedia Sverige. Läs mer på *wikimedia.se/sv/blimedlem http://wikimedia.se/sv/blimedlem*
Den tis 14 feb. 2023 kl 06:38 skrev Andrew Lih andrew.lih@gmail.com:
Today, Wikimedia Cloud had an outage that highlights the fragile nature of our GLAM wiki ecosystem:
– All tools on wmcloud.org and toolforge.org were knocked out and unavailable for 4 hours.
https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.o... – Petscan needed an extra hour before it came back, because it is not setup to run automatically, and needs a manual restart by logging in as Magnus and running a script by hand. This is a problematic situation for service deployment. – Many tools rely on Petscan, such as GLAMorgan for expanding category trees and generating Mediawiki page titles, so this outage affected many more tools – BaGLAMa2 seems to have not come back successfully, as all the categories that should be tracked are missing. Likely, the data is all there somewhere, but it currently needs some loving care to be restored. Unclear if this is being worked on. – PAWS, the visual Python environment on wmcloud that is a workhorse for bot work and scripts, is still down and needs some loving care to revive. https://phabricator.wikimedia.org/T329581
In short – we're trying to be scrappy and resourceful, but we're hurting.
-Andrew
On Mon, Feb 13, 2023 at 10:03 AM Andrew Lih andrew.lih@gmail.com wrote:
Thanks all for the feedback and conversation.
In the meantime, has anyone gotten GLAMorgan to report back any useful pageview data?
Regardless of small, medium, or large categories, I keep getting: "Data for ... pages could not be loaded from the WMF pageview API (404 error)."
https://glamtools.toolforge.org/glamorgan.html
-Andrew
On Thu, Feb 9, 2023 at 10:10 AM Mary Mark Ockerbloom < celebration.women@gmail.com> wrote:
Thanks for posting the fabricator ticket; I too have subscribed. I concur with others, lack of support for reliable tools for GLAM institutions has been a major concern for GLAMs for many years. Mary Mark Ockerbloom
On Wed, Feb 8, 2023 at 5:41 PM Fiona Romeo fromeo@wikimedia.org wrote:
Thanks for adding your perspective, Dominic.
Here is the Phabricator ticket that tracks work the Foundation has been doing with Wikimedia Israel to resolve storage issues for the GLAM Wiki Dashboard: https://phabricator.wikimedia.org/T321702
The conclusion was that it would be best for the service to use the MediaRequest API, as Dominic has also recommended in his email. Further to this, the Foundation's Data Platform team is looking into a custom API endpoint for media requests by category to reduce/remove the need for data transformation and storage. As an interim solution for the GLAM Wiki Dashboard, we advised Wikimedia Israel to migrate their project from Amazon Web Services to our own servers and made capacity available for that.
We don't know as much about the BaGLAMa2 issues at the moment.
I'm very sorry to see our GLAM wiki community struggling with tool instability again.
Fiona
On Wed, 8 Feb 2023 at 21:34, Dominic Byrd-McDevitt dominic@dp.la wrote:
> For my part, I'd like to point out that these issues are recurring > problems, and also that when it comes to BaGLAMa lag, the longer it goes, > the more unrecoverable it becomes. Data errors, once introduced, are not > repairable. > > Dozens of the tracked categories in BaGLAMa are DPLA institutions, > and I have shared these links numerous times over the years. So I > frequently get questions from partners who check their data and find it > months out of date. There is nothing I can tell them in these situations, > except that I have regularly seen data get that lagged, and then eventually > it reaches a point where (presumably after someone finally reached Magnus?) > all the backlogged months come in at once. > > This causes its own problems, I believe, because I have to assume in > such situations where data is generated after the fact, that it is all > corrupt to some degree. My understanding of BaGLAMa is that it counts page > views of articles using images from a category. But there is no MediaWiki > log of when images were added to a page (or to a category), so if you are > counting page views that occurred three months ago based on images that are > in a page today, you might be counting crediting three past months with > views for an image that was added last week. > > This issue causes massive data errors in the other direction too. > Sometimes you'll have an unexplained spike, like the several here > https://glamtools.toolforge.org/baglama2/#gid=50&month=201611&giu=enwiki&server=en.wikipedia.org (and > by spike, I mean 700 million page views), and it's caused by the fact that > an image that was on the main page for no more than hours caused BaGLAMa to > count the entire month's page views of the main page. These errors are > unrecoverable; they stay in the data and just increase the error of the > overall total over time. There's never been a time where I could go to a > maintainer and point out this massive data error and get that rerun or > fixed. 
> Instead, I am often in the embarrassing position of telling partners, "Here is the analytics page, but there is a big overcount in one random month, so always remember to mentally subtract 100 million from your total, and treat these numbers as very inexact."
>
> So as long as we are talking about BaGLAMa at all, I do have to point out that it is an entirely flawed tool and the data is unreliable. And aside from all of those bugs, the methodology is very flawed, since it should not be using the Pageviews API in the first place. I consider the data essentially fictitious anyway: we know the images we are tracking are probably not even receiving half of the article views we are crediting to them, but we continue to report bad data, because our projects rely on having outcomes and reporting analytics. Glamorous and Glamorgan are based on the same flawed methodology.
>
> And I haven't even started on the clunky UI, where an ever-growing list of 1000+ categories is displayed on the landing page, many of which are typos or nonexistent categories that can never be removed or cleaned up.
>
> I guess my main point here is that no amount of band-aids will ever resolve some of the issues, and we need to be thinking about entirely redoing the tool itself. Or rather, we should already have done so as soon as the Mediarequests API was released, which was in 2019.
>
> Thanks!
> Dominic
Wren mailing list -- wren@lists.wikimedia.org To unsubscribe send an email to wren-leave@lists.wikimedia.org
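Dominic's point that the tools should count requests for the media files themselves, rather than crediting images with the page views of articles that use them, can be sketched against the Mediarequests API's per-file endpoint. This is a minimal sketch under my own assumptions, not how BaGLAMa2, Glamorous, or Glamorgan work: the URL pattern and the parameter values (`all-referers`, `user`, `monthly`) reflect my reading of the public REST API documentation and should be checked before use, and `commons_upload_path`, `mediarequests_url`, and `total_requests` are hypothetical helpers. The HTTP fetcher is passed in, so the summing logic can be tried without network access.

```python
import hashlib
from urllib.parse import quote

MR_BASE = "https://wikimedia.org/api/rest_v1/metrics/mediarequests/per-file"

def commons_upload_path(file_title):
    """Map 'File:Some image.jpg' to its hashed upload path on Commons.

    Assumes the title is already canonical (first letter capitalized).
    """
    name = file_title.removeprefix("File:").replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Originals live under /<first hex digit>/<first two hex digits>/<name>
    return f"/wikipedia/commons/{digest[0]}/{digest[:2]}/{name}"

def mediarequests_url(file_title, start, end,
                      referer="all-referers", agent="user", granularity="monthly"):
    """Build a per-file Mediarequests URL (dates as YYYYMMDD)."""
    path = quote(commons_upload_path(file_title), safe="")  # "/" becomes "%2F"
    return f"{MR_BASE}/{referer}/{agent}/{path}/{granularity}/{start}/{end}"

def total_requests(file_titles, start, end, fetch_json):
    """Sum requests across files; fetch_json(url) returns parsed JSON or None."""
    total = 0
    for title in file_titles:
        data = fetch_json(mediarequests_url(title, start, end))
        if data:  # None (for example, after a 404) contributes nothing
            total += sum(item["requests"] for item in data["items"])
    return total
```

The design choice here is that each file is counted on its own, so an image is never credited with views of a page it merely appeared on; a category-level total is then just a sum over the category's files.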
--
-Andrew Lih
Smithsonian Institution - Wikimedian at Large
Metropolitan Museum of Art - Wikimedia strategist
Previously: professor of journalism and communications, American University, Columbia University, University of Southern California
Email: andrew.lih@gmail.com, andrew@andrewlih.com
WEB: https://muckrack.com/fuzheado
PROJECT: Wikipedia Space: http://en.wikipedia.org/wiki/WP:WPSPACE
Thank you, Eric, Axel, Sandra F., and Wikimedia Sweden, for leading in this area, even though we recognize that it falls outside the scope of the formal Content Partnerships Hub project.
A few thoughts ahead of the meeting:
1. As mentioned previously, I have tried to put down some aggregated thoughts and experiences in this document on Meta, which anyone is welcome to edit, add to, or share: https://meta.wikimedia.org/wiki/Wikimedians_in_Residence_Exchange_Network/GL...
2. If nothing else, I encourage folks to add ideas (raw ideas are definitely fine) to the section "New approaches" https://meta.wikimedia.org/wiki/Wikimedians_in_Residence_Exchange_Network/GL...
3. Please don't read "manifesto" as more scary or threatening than it is meant to be. It was simply the word we had been using for a while, and the goal was to answer the question: how do we explain the current GLAM wiki state of affairs to someone new to this space, whether a new WMF employee or an outsider who doesn't know about cultural or heritage partnerships?
Thanks all, -Andrew
On Tue, Feb 21, 2023 at 9:54 AM Eric Luth eric.luth@wikimedia.se wrote:
Hi all,
The meeting will take place at 16:00 UTC tomorrow at this link: https://us02web.zoom.us/j/81455808411

André and I will join on behalf of Wikimedia Sverige and the Content Partnerships Hub initiative. Feel free to share the link with anyone you think would be good to bring to the call.
André and I will get back to you with an agenda as soon as possible, but the main purpose is, of course, to listen, understand the situation, and discuss priorities, as per Axel's previous email.
Best,
*Eric Luth*
Project Manager, Involvement and Advocacy
Wikimedia Sverige
eric.luth@wikimedia.se
+46 (0) 765 55 50 95
Support free knowledge: become a member of Wikimedia Sverige. Read more at blimedlem.wikimedia.se
On Mon, Feb 20, 2023 at 17:23 Axel Pettersson axel.pettersson@wikimedia.se wrote:
Hi again,

Thanks for all the replies to the Doodle. The winning slot is Wednesday, February 22, 17:00-18:00 (GMT+1).
@Eric Luth eric.luth@wikimedia.se will send out a calendar invite with a meeting link and agenda tomorrow.
Best regards, /axel
====================================
Axel Pettersson (he/him)
Project Manager, GLAM/Outreach
Wikimedia Sverige
+46 (0)733 96 55 65 axel.pettersson@wikimedia.se
Twitter: @Haxpett https://twitter.com/haxpett
On Thu, Feb 16, 2023 at 13:48 Axel Pettersson axel.pettersson@wikimedia.se wrote:
Hi all, (Sent on behalf of the helpdesk.)
Andrew also sent a request to the Content Partnerships Hub helpdesk about this issue. We very much hear everyone's concerns. Although the Helpdesk typically deals with content uploads, another part of the Hub initiative is preparing for better (strategic) tool support in the coming year(s).[1]
However, our current capacity is very limited, and we are still not sure what funding we will receive for our future work. We also currently lack the staff and skills for this type of immediate firefighting, so if we were to work on this, it would be at the expense of other prioritized software development.
In response to the Helpdesk request, we would therefore suggest setting up a meeting with everyone interested on this thread, with the goal of sharing perspectives, brainstorming an approach, and capturing your thoughts on priorities.
Please provide your availability in this Doodle: https://doodle.com/meeting/participate/id/dLZwmRWa (Apologies for favoring European and American time zones over others.)
Please note that Sandra Fauconnier (who works as Product Strategist) will be absent from February 15 for at least a month (due to surgery and recovery). During her absence, André Costa (andre.costa@wikimedia.se) from WMSE will represent the Content Partnerships Hub on this topic.
[1] https://meta.wikimedia.org/wiki/Content_Partnerships_Hub/Software
Best regards, /axel
On Tue, Feb 14, 2023 at 06:38 Andrew Lih andrew.lih@gmail.com wrote:
Today, Wikimedia Cloud had an outage that highlights the fragile nature of our GLAM wiki ecosystem:
– All tools on wmcloud.org and toolforge.org were knocked out and unavailable for 4 hours.
https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.o...
– Petscan needed an extra hour before it came back, because it is not set up to run automatically; it needs a manual restart by logging in as Magnus and running a script by hand. This is a problematic situation for service deployment.
– Many tools rely on Petscan, such as GLAMorgan for expanding category trees and generating MediaWiki page titles, so this outage affected many more tools.
– BaGLAMa2 seems not to have come back successfully, as all the categories that should be tracked are missing. The data is likely all there somewhere, but it currently needs some loving care to be restored. It is unclear whether this is being worked on.
– PAWS, the visual Python environment on wmcloud that is a workhorse for bot work and scripts, is still down and needs some loving care to revive. https://phabricator.wikimedia.org/T329581
In short – we're trying to be scrappy and resourceful, but we're hurting.
-Andrew
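Several of the tools above depend on Petscan to expand a category tree into a list of files. As a rough illustration of what that step involves, and not a description of Petscan's internals, here is a sketch that walks a Commons category tree breadth-first using the MediaWiki Action API's `list=categorymembers` query. The depth limit and the `seen` set guard against the very large and sometimes cyclic category trees discussed in this thread. `fetch_members`, `expand_category_tree`, and the User-Agent string are placeholders of my own; the fetcher can be swapped out, which also makes the walk testable offline.

```python
import json
import urllib.parse
import urllib.request
from collections import deque

API = "https://commons.wikimedia.org/w/api.php"

def fetch_members(category, user_agent="glam-metrics-sketch/0.1 (hypothetical contact)"):
    """Yield (namespace, title) for files (ns 6) and subcategories (ns 14)."""
    params = {
        "action": "query", "list": "categorymembers", "cmtitle": category,
        "cmtype": "file|subcat", "cmlimit": "500", "format": "json",
    }
    while True:
        url = API + "?" + urllib.parse.urlencode(params)
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(req, timeout=30) as resp:
            data = json.load(resp)
        for member in data["query"]["categorymembers"]:
            yield member["ns"], member["title"]
        if "continue" not in data:
            break
        params.update(data["continue"])  # carries cmcontinue to the next batch

def expand_category_tree(root, max_depth, fetch=fetch_members):
    """Breadth-first walk from root; returns the set of file titles found."""
    files, seen = set(), {root}
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        for ns, title in fetch(category):
            if ns == 6:
                files.add(title)
            elif ns == 14 and depth < max_depth and title not in seen:
                seen.add(title)  # skip categories already queued (cycles)
                queue.append((title, depth + 1))
    return files
```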
On Mon, Feb 13, 2023 at 10:03 AM Andrew Lih andrew.lih@gmail.com wrote:
Thanks all for the feedback and conversation.
In the meantime, has anyone gotten GLAMorgan to report back any useful pageview data?
Regardless of small, medium, or large categories, I keep getting: "Data for ... pages could not be loaded from the WMF pageview API (404 error)."
https://glamtools.toolforge.org/glamorgan.html
-Andrew
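For anyone trying to reproduce the 404 errors by hand, the Pageviews API's per-article endpoint can be queried directly. The sketch below is only a debugging aid under my own assumptions, not a diagnosis of GLAMorgan: it builds the per-article URL (titles use underscores and must be fully percent-encoded, including any `/`), and treats a 404 as "no data", since as far as I know the API answers 404 both when a title has no recorded views in the range and when the title is malformed. The function names, the `user` agent default, and the User-Agent string are hypothetical choices.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import quote

PV_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"

def pageviews_url(project, title, start, end,
                  access="all-access", agent="user", granularity="monthly"):
    """Build a per-article Pageviews API URL (dates as YYYYMMDD)."""
    # Spaces become underscores; safe="" also encodes "/" in titles like "AC/DC".
    article = quote(title.replace(" ", "_"), safe="")
    return f"{PV_BASE}/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}"

def fetch_views(url, user_agent="glam-metrics-sketch/0.1 (hypothetical contact)"):
    """Return total views for the range, or None if the API answers 404."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            data = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # no data for this title/range, or a malformed title
        raise
    return sum(item["views"] for item in data["items"])
```

Comparing the URL a tool requests against one built this way for the same title is a quick way to tell an encoding problem from a genuine absence of data.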
On Thu, Feb 9, 2023 at 10:10 AM Mary Mark Ockerbloom celebration.women@gmail.com wrote:
Thanks for posting the Phabricator ticket; I have subscribed too. I concur with others: the lack of support for reliable tools for GLAM institutions has been a major concern for GLAMs for many years.
Mary Mark Ockerbloom
On Wed, Feb 8, 2023 at 5:41 PM Fiona Romeo fromeo@wikimedia.org wrote:
> Thanks for adding your perspective, Dominic.
>
> Here is the Phabricator ticket that tracks work the Foundation has been doing with Wikimedia Israel to resolve storage issues for the GLAM Wiki Dashboard: https://phabricator.wikimedia.org/T321702
>
> The conclusion was that it would be best for the service to use the Mediarequests API, as Dominic has also recommended in his email. Further to this, the Foundation's Data Platform team is looking into a custom API endpoint for media requests by category to reduce or remove the need for data transformation and storage. As an interim solution for the GLAM Wiki Dashboard, we advised Wikimedia Israel to migrate their project from Amazon Web Services to our own servers and made capacity available for that.
>
> We don't know as much about the BaGLAMa2 issues at the moment.
>
> I'm very sorry to see our GLAM wiki community struggling with tool instability again.
>
> Fiona
>
> On Wed, 8 Feb 2023 at 21:34, Dominic Byrd-McDevitt dominic@dp.la wrote:
>
>> For my part, I'd like to point out that these issues are recurring problems, and also that when it comes to BaGLAMa lag, the longer it goes, the more unrecoverable it becomes. Data errors, once introduced, are not repairable.
>>
>> Dozens of the tracked categories in BaGLAMa are DPLA institutions, and I have shared these links numerous times over the years. So I frequently get questions from partners who check their data and find it months out of date. There is nothing I can tell them in these situations, except that I have regularly seen data get that lagged, and then eventually it reaches a point where (presumably after someone finally reached Magnus?) all the backlogged months come in at once.
>>
>> This causes its own problems, I believe, because in such situations, where data is generated after the fact, I have to assume that it is all corrupt to some degree. My understanding of BaGLAMa is that it counts page views of articles using images from a category. But there is no MediaWiki log of when images were added to a page (or to a category), so if you are counting page views that occurred three months ago based on images that are in a page today, you might be crediting three past months with views for an image that was added last week.
>>
>> This issue causes massive data errors in the other direction too. Sometimes you'll have an unexplained spike, like the several here https://glamtools.toolforge.org/baglama2/#gid=50&month=201611&giu=enwiki&server=en.wikipedia.org (and by spike, I mean 700 million page views), and it's caused by the fact that an image that was on the Main Page for no more than a few hours caused BaGLAMa to count the entire month's page views of the Main Page. These errors are unrecoverable; they stay in the data and just increase the error of the overall total over time. There's never been a time when I could go to a maintainer, point out this massive data error, and get it rerun or fixed. Instead, I am often in the embarrassing position of telling partners, "Here is the analytics page, but there is a big overcount in one random month, so always remember to mentally subtract 100 million from your total, and treat these numbers as very inexact."
>>
>> So as long as we are talking about BaGLAMa at all, I do have to point out that it is an entirely flawed tool and the data is unreliable. And aside from all of those bugs, the methodology is very flawed, since it should not be using the Pageviews API in the first place. I consider the data essentially fictitious anyway: we know the images we are tracking are probably not even receiving half of the article views we are crediting to them, but we continue to report bad data, because our projects rely on having outcomes and reporting analytics. Glamorous and Glamorgan are based on the same flawed methodology.
>>
>> And I haven't even started on the clunky UI, where an ever-growing list of 1000+ categories is displayed on the landing page, many of which are typos or nonexistent categories that can never be removed or cleaned up.
>>
>> I guess my main point here is that no amount of band-aids will ever resolve some of the issues, and we need to be thinking about entirely redoing the tool itself. Or rather, we should already have done so as soon as the Mediarequests API was released, which was in 2019.
>>
>> Thanks!
>> Dominic
_______________________________________________
Wren mailing list -- wren@lists.wikimedia.org
To unsubscribe send an email to wren-leave@lists.wikimedia.org
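To make Dominic's two failure modes concrete, here is a toy model with invented numbers. It is not BaGLAMa2's code, and `Usage`, `naive_credit`, and `bounded_credit` are hypothetical. The naive version credits every month with the full views of every page that uses the image at the time the numbers are computed; the bounded version only counts months after the image was added and prorates short placements such as a few hours on the Main Page. The bounded version needs exactly the information Dominic says is not logged (when an image was added), so it illustrates the size of the error rather than offering a ready fix.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Usage:
    """One page that uses the image (hypothetical record)."""
    article: str
    added_month: int                     # first month the image was on the page
    hours_shown: Optional[float] = None  # for short placements, e.g. the Main Page

def naive_credit(usages, monthly_views, months):
    """Credit each month with ALL views of every page using the image now."""
    return sum(monthly_views[u.article].get(m, 0) for u in usages for m in months)

def bounded_credit(usages, monthly_views, months, hours_in_month=720):
    """Skip months before the image was added; prorate short placements.

    hours_in_month=720 assumes a 30-day month.
    """
    total = 0.0
    for u in usages:
        for m in months:
            if m < u.added_month:
                continue  # image was not on the page yet
            views = monthly_views[u.article].get(m, 0)
            if u.hours_shown is not None:
                # credit only the fraction of the month the image was visible
                views = views * u.hours_shown / hours_in_month
            total += views
    return total

# Image added to an article in month 3, but counted over months 1-3.
painting = [Usage("Painting", added_month=3)]
painting_views = {"Painting": {1: 1000, 2: 1000, 3: 1000}}
print(naive_credit(painting, painting_views, [1, 2, 3]),
      bounded_credit(painting, painting_views, [1, 2, 3]))  # prints: 3000 1000.0

# Image on the Main Page for 6 hours in a month with 720,000 Main Page views.
main_page = [Usage("Main_Page", added_month=2, hours_shown=6)]
main_views = {"Main_Page": {2: 720_000}}
print(naive_credit(main_page, main_views, [2]),
      bounded_credit(main_page, main_views, [2]))  # prints: 720000 6000.0
```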
Hi again,
Here's a brief agenda:
Agenda
- Participants describe their perspectives around the issue of tool support for GLAM and content partnerships work, and when this fails
- WMSE describes the software and helpdesk tracks of the Hub
- Brainstorming approaches on how to deal with this, and especially how to handle priorities
I also prepared an Etherpad for the discussion, here: https://etherpad.wikimedia.org/p/Problems_with_Wikimedia_metrics
*The meeting will take place at 16:00 UTC at this link: https://us02web.zoom.us/j/81455808411*
See you soon,
*Eric Luth*
Project Manager, Involvement and Advocacy
Wikimedia Sverige
eric.luth@wikimedia.se
+46 (0) 765 55 50 95
On Tue, Feb 21, 2023 at 16:56 Andrew Lih andrew.lih@gmail.com wrote:
Thank you, Eric, Axel, Sandra F., and Wikimedia Sweden, for leading in this area, even though we recognize that it falls outside the scope of the formal Content Partnerships Hub project.
A few thoughts ahead of the meeting:
- As mentioned previously, I have tried to put down some aggregated thoughts and experiences in this document on Meta, which anyone is welcome to edit, add to, or share:
https://meta.wikimedia.org/wiki/Wikimedians_in_Residence_Exchange_Network/GL...
- If nothing else, I encourage folks to add ideas (raw ideas are definitely fine) to the section "New approaches"
https://meta.wikimedia.org/wiki/Wikimedians_in_Residence_Exchange_Network/GL...
- Please don't read "manifesto" as more scary or threatening than it is meant to be. It was simply the word we had been using for a while, and the goal was to answer the question: how do we explain the current GLAM wiki state of affairs to someone new to this space, whether a new WMF employee or an outsider who doesn't know about cultural or heritage partnerships?
Thanks all, -Andrew
On Tue, Feb 21, 2023 at 9:54 AM Eric Luth eric.luth@wikimedia.se wrote:
Hi all,
The meeting will take place at 16:00 UTC tomorrow on this link: https://us02web.zoom.us/j/81455808411 https://www.google.com/url?q=https://us02web.zoom.us/j/81455808411&sa=D&source=calendar&ust=1677422345273437&usg=AOvVaw2PGvR3NmIQDqmPoiv_h3mv André and I will join on behalf of Wikimedia Sverige and the Content Partnerships Hub initiative. Feel free to share the link with anyone you think might be good to bring to the call.
André and I will get back to you with an agenda as soon as possible, but the main part is of course to listen in and understand the situation, and discuss the priorities, as per Axel's previous email.
Best *Eric Luth* Projektledare engagemang och påverkan | Project Manager, Involvement and Advocacy Wikimedia Sverige eric.luth@wikimedia.se +46 (0) 765 55 50 95
Stöd fri kunskap, bli medlem i Wikimedia Sverige. Läs mer på blimedlem.wikimedia.se
Den mån 20 feb. 2023 kl 17:23 skrev Axel Pettersson < axel.pettersson@wikimedia.se>:
Hi again, Thanks for all the replies to the Doodle, the winner is Wednesday February 22, 17.00-18.00 (GMT+1).
@Eric Luth eric.luth@wikimedia.se will send out a calendar invite with a meeting link and agenda tomorrow.
Bästa hälsningar, /axel
==================================== Axel Pettersson (han/honom) Projektledare GLAM/Outreach Wikimedia Sverige
+46 (0)733 96 55 65 axel.pettersson@wikimedia.se
Twitter: @Haxpett https://twitter.com/haxpett
Stöd fri kunskap, bli medlem i Wikimedia Sverige. Läs mer på *wikimedia.se/sv/blimedlem http://wikimedia.se/sv/blimedlem*
Den tors 16 feb. 2023 kl 13:48 skrev Axel Pettersson < axel.pettersson@wikimedia.se>:
Hi all, (Sent on behalf of the helpdesk.)
Andrew also sent a request to the Content Partnerships Hub helpdesk about this issue. We very much hear everyone’s concerns. Though the Helpdesk typically deals with content uploads, we do have another part of the hub initiative that is preparing for better (strategic) tools support in the upcoming year(s).[1]
Our current capacity is however very limited, and we are still not sure what funding we will receive for our future work. Also, we currently lack manpower and skills for this type of immediate fire-fighting, so if we were to work on this, it would be at the expense of other prioritized software development.
As a response to the Helpdesk request, we would therefore suggest setting up a meeting with all interested people on this thread, with the goal to share perspectives and to brainstorm an approach and capture your thoughts on priorities.
Please provide your availability in this Doodle: https://doodle.com/meeting/participate/id/dLZwmRWa (With excuses for being Europe/America friendly over other time zones.)
Please note that Sandra Fauconnier (who works as Product Strategist) will be absent from February 15 for at least a month (due to surgery + recovery period). During her absence, André Costa ( andre.costa@wikimedia.se) from WMSE will represent the Content Partnerships Hub on this topic.
[1] https://meta.wikimedia.org/wiki/Content_Partnerships_Hub/Software
Bästa hälsningar, /axel
==================================== Axel Pettersson (han/honom) Projektledare GLAM/Outreach Wikimedia Sverige
+46 (0)733 96 55 65 axel.pettersson@wikimedia.se
Twitter: @Haxpett https://twitter.com/haxpett
Stöd fri kunskap, bli medlem i Wikimedia Sverige. Läs mer på *wikimedia.se/sv/blimedlem http://wikimedia.se/sv/blimedlem*
Den tis 14 feb. 2023 kl 06:38 skrev Andrew Lih andrew.lih@gmail.com:
Today, Wikimedia Cloud had an outage that highlights the fragile nature of our GLAM wiki ecosystem:
– All tools on wmcloud.org and toolforge.org were knocked out and unavailable for 4 hours.
https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.o... – Petscan needed an extra hour before it came back, because it is not setup to run automatically, and needs a manual restart by logging in as Magnus and running a script by hand. This is a problematic situation for service deployment. – Many tools rely on Petscan, such as GLAMorgan for expanding category trees and generating Mediawiki page titles, so this outage affected many more tools – BaGLAMa2 seems to have not come back successfully, as all the categories that should be tracked are missing. Likely, the data is all there somewhere, but it currently needs some loving care to be restored. Unclear if this is being worked on. – PAWS, the visual Python environment on wmcloud that is a workhorse for bot work and scripts, is still down and needs some loving care to revive. https://phabricator.wikimedia.org/T329581
In short – we're trying to be scrappy and resourceful, but we're hurting.
-Andrew
On Mon, Feb 13, 2023 at 10:03 AM Andrew Lih andrew.lih@gmail.com wrote:
Thanks all for the feedback and conversation.
In the meantime, has anyone gotten GLAMorgan to report back any useful pageview data?
Regardless of small, medium, or large categories, I keep getting: "Data for ... pages could not be loaded from the WMF pageview API (404 error)."
https://glamtools.toolforge.org/glamorgan.html
-Andrew
On Thu, Feb 9, 2023 at 10:10 AM Mary Mark Ockerbloom < celebration.women@gmail.com> wrote:
> Thanks for posting the fabricator ticket; I too have subscribed. > I concur with others, lack of support for reliable tools for GLAM > institutions has been a major concern for GLAMs for many years. > Mary Mark Ockerbloom > > On Wed, Feb 8, 2023 at 5:41 PM Fiona Romeo fromeo@wikimedia.org > wrote: > >> Thanks for adding your perspective, Dominic. >> >> Here is the Phabricator ticket that tracks work the Foundation has >> been doing with Wikimedia Israel to resolve storage issues for the GLAM >> Wiki Dashboard: https://phabricator.wikimedia.org/T321702 >> >> The conclusion was that it would be best for the service to use the >> MediaRequest API, as Dominic has also recommended in his email. Further to >> this, the Foundation's Data Platform team is looking into a custom API >> endpoint for media requests by category to reduce/remove the need for data >> transformation and storage. As an interim solution for the GLAM >> Wiki Dashboard, we advised Wikimedia Israel to migrate their project from >> Amazon Web Services to our own servers and made capacity available for >> that. >> >> We don't know as much about the BaGLAMa2 issues at the moment. >> >> I'm very sorry to see our GLAM wiki community struggling with tool >> instability again. >> >> Fiona >> >> On Wed, 8 Feb 2023 at 21:34, Dominic Byrd-McDevitt dominic@dp.la >> wrote: >> >>> For my part, I'd like to point out that these issues are recurring >>> problems, and also that when it comes to BaGLAMa lag, the longer it goes, >>> the more unrecoverable it becomes. Data errors, once introduced, are not >>> repairable. >>> >>> Dozens of the tracked categories in BaGLAMa are DPLA institutions, >>> and I have shared these links numerous times over the years. So I >>> frequently get questions from partners who check their data and find it >>> months out of date. 
There is nothing I can tell them in these situations, >>> except that I have regularly seen data get that lagged, and then eventually >>> it reaches a point where (presumably after someone finally reached Magnus?) >>> all the backlogged months come in at once. >>> >>> This causes its own problems, I believe, because I have to assume >>> in such situations where data is generated after the fact, that it is all >>> corrupt to some degree. My understanding of BaGLAMa is that it counts page >>> views of articles using images from a category. But there is no MediaWiki >>> log of when images were added to a page (or to a category), so if you are >>> counting page views that occurred three months ago based on images that are >>> in a page today, you might be counting crediting three past months with >>> views for an image that was added last week. >>> >>> This issue causes massive data errors in the other direction too. >>> Sometimes you'll have an unexplained spike, like the several here >>> https://glamtools.toolforge.org/baglama2/#gid=50&month=201611&giu=enwiki&server=en.wikipedia.org (and >>> by spike, I mean 700 million page views), and it's caused by the fact that >>> an image that was on the main page for no more than hours caused BaGLAMa to >>> count the entire month's page views of the main page. These errors are >>> unrecoverable; they stay in the data and just increase the error of the >>> overall total over time. There's never been a time where I could go to a >>> maintainer and point out this massive data error and get that rerun or >>> fixed. Instead, I am often in the embarrassing position of telling partners >>> "Here is the analytics page, but there is a big overcount on one random >>> month, so just always remember to mentally subtract 100 million from your >>> total, and treat these numbers as very inexact." 
>>>
>>> So as long as we are talking about BaGLAMa at all, I do have to point out that it is an entirely flawed tool and the data is unreliable. And aside from all of those bugs, the methodology is very flawed, since it should not be using the Pageviews API in the first place. I consider the data essentially fictitious anyway— we know the images we are tracking are probably not even receiving half of the article views we are crediting to them, but we continue to report bad data, because our projects rely on having outcomes and reporting analytics. Glamorous and Glamorgan are based on the same flawed methodology.
>>>
>>> And I haven't even started on the clunky UI, where an ever-growing list of 1000+ categories are all displayed on the landing page, many of which are typos or non-existent categories that can never be removed or cleaned up.
>>>
>>> I guess my main point here is that no amount of band aids will ever resolve some of the issues, and we need to be thinking about entirely redoing the tool itself. Or we should have already done so as soon as the Mediarequests API was released—which was in 2019.
>>>
>>> Thanks!
>>> Dominic
>>>
>>> On Wed, Feb 8, 2023 at 6:50 AM Fiona Romeo fromeo@wikimedia.org wrote:
>>>
>>>> Dear Andrew,
>>>>
>>>> Thanks for escalating these specific issues to us. Giovanna and I were both travelling in January so we haven't been as active in Telegram.
>>>>
>>>> Are you aware of anyone else having issues with the GLAM Wiki Dashboard, or is it just The MET? I quickly sampled some of the institutions and only saw a "bad request" for The MET. We have been directly supporting Wikimedia Israel to optimise their service, so I will raise this issue with both Wikimedia Israel and the Foundation team that has some familiarity with their service.
>>>>
>>>> I noted these two BaGLAMa2 issues in the Telegram chat:
>>>>
>>>> https://bitbucket.org/magnusmanske/magnustools/issues/49/baglama-not-up-to-d...
>>>>
>>>> https://bitbucket.org/magnusmanske/magnustools/issues/50/baglama-not-adding-...
>>>>
>>>> Are there other BaGLAMa2 reports we should be aware of?
>>>>
>>>> Metrics are definitely understood to be a priority for the Foundation and I heard yesterday that metrics tools rose to the top in Wikimedia Sweden’s survey too. There will be opportunities to discuss this further in the context of annual planning but I will see what can be done in the short term.
>>>>
>>>> More soon,
>>>> Fiona
>>>>
>>>> On Wed, 8 Feb 2023 at 10:57, Andrew Lih andrew.lih@gmail.com wrote:
>>>>
>>>>> Hi WREN and GLAM folks,
>>>>>
>>>>> I need your insights into what could be a very problematic year for us in the GLAM wiki community, as our metrics tools to measure our impact are in crisis and disrepair. If you have any insights, please do share them here, or in the GLAM Wiki Telegram group where this conversation started happening recently.
>>>>>
>>>>> I sent a "HELP!" message to the Wikimedia SE content partnerships help desk just the other day, included below, and hope this may be useful to start a conversation. If there is enough interest, we might want to start a wiki page to formally document our needs as a GLAM wiki community. Thanks.
>>>>>
>>>>> -Andrew
>>>>>
>>>>> ----
>>>>> To: help@wikimedia.se
>>>>>
>>>>> I'd like to formally employ the Helpdesk's services in getting some care and attention to BaGLAMa2. It seems to have been failing since the end of last year, and even then, it was reporting extremely low figures for all categories. This is one of the few tools we have in the GLAM wiki community to measure impact and to make the case for sustaining our work.
>>>>>
>>>>> https://glamtools.toolforge.org/baglama2/
>>>>>
>>>>> Without these basic metrics, 2023 could prove to be a disastrous year for continuing efforts. So far, we have been unable to report good, reliable numbers to folks such as the Metropolitan Museum of Art or the Smithsonian Institution. Other on-demand tools such as Glamorgan usually cannot handle such large category trees, and also have their own problems with not being able to read the pageviews API numbers accurately, which is another issue in itself.
>>>>>
>>>>> https://glamtools.toolforge.org/glamorgan.html
>>>>>
>>>>> In short - help! How can we get this on the radar screen of people who can put more care, attention, and resources into this? Thanks.
>>>>>
>>>>> -Andrew
>>>>>
>>>>> --
>>>> *Fiona Romeo* (she/her)
>>>> Senior Manager, Culture and Heritage
>>>> Wikimedia Foundation https://wikimediafoundation.org/
>>>>
>>>> _______________________________________________
>> Wren mailing list -- wren@lists.wikimedia.org
>> To unsubscribe send an email to wren-leave@lists.wikimedia.org
>>
> _______________________________________________
> Wren mailing list -- wren@lists.wikimedia.org
> To unsubscribe send an email to wren-leave@lists.wikimedia.org
>
--
-Andrew Lih
Smithsonian Institution - Wikimedian at Large
Metropolitan Museum of Art - Wikimedia strategist
Previously: professor of journalism and communications, American University, Columbia University, University of Southern California
Email: andrew.lih@gmail.com, andrew@andrewlih.com
WEB: https://muckrack.com/fuzheado
PROJECT: Wikipedia Space: http://en.wikipedia.org/wiki/WP:WPSPACE
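Dominic's argument for counting media requests instead of article pageviews can be illustrated with a minimal sketch against the Wikimedia REST API's Mediarequests per-file endpoint. It only builds request URLs and totals a decoded response. It assumes the `per-file/{referer}/{agent}/{file_path}/{granularity}/{start}/{end}` path layout and a `requests` count on each returned item; the function names and default parameter values are illustrative choices, not anything BaGLAMa2 or the GLAM Wiki Dashboard is known to use.

```python
from urllib.parse import quote

# Wikimedia REST API base for metrics endpoints.
METRICS_BASE = "https://wikimedia.org/api/rest_v1/metrics"


def mediarequests_per_file_url(file_path: str, start: str, end: str,
                               referer: str = "all-referers",
                               agent: str = "user",
                               granularity: str = "monthly") -> str:
    """Build a Mediarequests per-file URL (sketch; path layout assumed).

    file_path is the upload path of the file, for example
    "/wikipedia/commons/a/a9/Example.jpg". It is sent as a single
    path segment, so its slashes must be percent-encoded.
    start and end are YYYYMMDD strings.
    """
    encoded_path = quote(file_path, safe="")
    return (f"{METRICS_BASE}/mediarequests/per-file/{referer}/{agent}/"
            f"{encoded_path}/{granularity}/{start}/{end}")


def total_requests(response: dict) -> int:
    """Sum the per-period request counts in a decoded JSON response."""
    return sum(item["requests"] for item in response.get("items", []))
```

Because these counts are attached to the file itself, they do not depend on reconstructing which pages embedded an image in a given month, which is the weakness of the pageview-based approach described above. Totals for a whole institution would still require expanding its category to a list of files first.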
Thanks Eric for convening this week's meeting about the content partnerships hub and Andrew for the draft manifesto.
For the more specific discussion about GLAM metrics needs, the product team has created this page for you to document your requirements: https://commons.wikimedia.org/wiki/Commons:Product_and_technical_support_for...
I appreciate that you have already shared your needs in many different channels, at different times, but this page will have the right audience at the right time.
Thanks for your patience and collaboration.
Fiona
On Tue, 21 Feb 2023 at 16:06, Andrew Lih andrew.lih@gmail.com wrote:
Thank you, Eric, Axel, Sandra F., and Wikimedia Sweden, for leading in this area, even though we recognize it falls outside the scope of the formal Content Partnerships Hub project.
A few thoughts ahead of the meeting:
- As mentioned previously, I have tried to put down some aggregated thoughts and experiences in this document on Meta, which anyone is welcome to edit, add to, or share:
https://meta.wikimedia.org/wiki/Wikimedians_in_Residence_Exchange_Network/GL...
- If nothing else, I encourage folks to add ideas (raw ideas are definitely fine) to the "New approaches" section:
https://meta.wikimedia.org/wiki/Wikimedians_in_Residence_Exchange_Network/GL...
- Please don't read "manifesto" as more scary or threatening than it is meant to be. It was simply the word we had been using for a while, and the goal was to answer: how do we explain the current GLAM wiki state of affairs to someone new to this space, whether a new WMF employee or an outsider who doesn't know about cultural or heritage partnerships?
Thanks all, -Andrew
On Tue, Feb 21, 2023 at 9:54 AM Eric Luth eric.luth@wikimedia.se wrote:
Hi all,
The meeting will take place at 16:00 UTC tomorrow at this link: https://us02web.zoom.us/j/81455808411
André and I will join on behalf of Wikimedia Sverige and the Content Partnerships Hub initiative. Feel free to share the link with anyone you think would be good to bring to the call.
André and I will get back to you with an agenda as soon as possible, but the main part is of course to listen in and understand the situation, and discuss the priorities, as per Axel's previous email.
Best *Eric Luth* Projektledare engagemang och påverkan | Project Manager, Involvement and Advocacy Wikimedia Sverige eric.luth@wikimedia.se +46 (0) 765 55 50 95
Support free knowledge, become a member of Wikimedia Sverige. Read more at blimedlem.wikimedia.se
On Mon, 20 Feb 2023 at 17:23, Axel Pettersson <axel.pettersson@wikimedia.se> wrote:
Hi again, Thanks for all the replies to the Doodle. The winner is Wednesday, February 22, 17.00-18.00 (GMT+1).
@Eric Luth eric.luth@wikimedia.se will send out a calendar invite with a meeting link and agenda tomorrow.
Best regards, /axel
==================================== Axel Pettersson (he/him) Project Manager, GLAM/Outreach, Wikimedia Sverige
+46 (0)733 96 55 65 axel.pettersson@wikimedia.se
Twitter: @Haxpett https://twitter.com/haxpett
Support free knowledge, become a member of Wikimedia Sverige. Read more at wikimedia.se/sv/blimedlem http://wikimedia.se/sv/blimedlem
On Thu, 16 Feb 2023 at 13:48, Axel Pettersson <axel.pettersson@wikimedia.se> wrote:
Hi all, (Sent on behalf of the helpdesk.)
Andrew also sent a request to the Content Partnerships Hub helpdesk about this issue. We very much hear everyone’s concerns. Though the Helpdesk typically deals with content uploads, we do have another part of the hub initiative that is preparing for better (strategic) tools support in the upcoming year(s).[1]
However, our current capacity is very limited, and we are still not sure what funding we will receive for our future work. We also currently lack the staff and skills for this kind of immediate fire-fighting, so if we were to work on this, it would come at the expense of other prioritized software development.
In response to the Helpdesk request, we would therefore suggest setting up a meeting with everyone interested on this thread, with the goal of sharing perspectives, brainstorming an approach, and capturing your thoughts on priorities.
Please provide your availability in this Doodle: https://doodle.com/meeting/participate/id/dLZwmRWa (Apologies for favoring Europe/America-friendly times over other time zones.)
Please note that Sandra Fauconnier (who works as Product Strategist) will be absent from February 15 for at least a month (due to surgery and recovery). During her absence, André Costa (andre.costa@wikimedia.se) from WMSE will represent the Content Partnerships Hub on this topic.
[1] https://meta.wikimedia.org/wiki/Content_Partnerships_Hub/Software
Best regards, /axel
On Tue, 14 Feb 2023 at 06:38, Andrew Lih andrew.lih@gmail.com wrote:
Today, Wikimedia Cloud had an outage that highlights the fragile nature of our GLAM wiki ecosystem:
– All tools on wmcloud.org and toolforge.org were knocked out and unavailable for 4 hours.
https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.o...
– Petscan needed an extra hour before it came back, because it is not set up to run automatically and needs a manual restart by logging in as Magnus and running a script by hand. This is a problematic situation for service deployment.
– Many tools rely on Petscan, such as GLAMorgan for expanding category trees and generating MediaWiki page titles, so this outage affected many more tools.
– BaGLAMa2 seems not to have come back successfully, as all the categories that should be tracked are missing. The data is likely all there somewhere, but it currently needs some loving care to be restored. It is unclear whether this is being worked on.
– PAWS, the visual Python environment on wmcloud that is a workhorse for bot work and scripts, is still down and needs some loving care to revive. https://phabricator.wikimedia.org/T329581
In short – we're trying to be scrappy and resourceful, but we're hurting.
-Andrew
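Category-tree expansion, which GLAMorgan delegates to Petscan, is where very large GLAM category trees get expensive. A minimal breadth-first sketch shows the two guards such an expansion needs: a visited set, because category graphs can contain cycles, and a depth limit to bound the work. It assumes a hypothetical `get_subcats` callable returning a category's direct subcategories; in practice that might wrap the MediaWiki Action API's `list=categorymembers` query with `cmtype=subcat`, or a database replica query. This is not Petscan's implementation.

```python
from collections import deque
from typing import Callable, Iterable, Set


def expand_category_tree(root: str,
                         get_subcats: Callable[[str], Iterable[str]],
                         max_depth: int = 5) -> Set[str]:
    """Return root plus every subcategory reachable within max_depth levels."""
    seen = {root}
    queue = deque([(root, 0)])  # (category, depth below root)
    while queue:
        category, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not descend past the depth limit
        for sub in get_subcats(category):
            if sub not in seen:  # skip already-visited categories (handles cycles)
                seen.add(sub)
                queue.append((sub, depth + 1))
    return seen


# Toy graph standing in for real category data; note the C -> A cycle.
toy_graph = {"A": ["B", "C"], "B": ["D"], "C": ["A"], "D": []}
print(sorted(expand_category_tree("A", lambda c: toy_graph.get(c, []))))
# ['A', 'B', 'C', 'D']
```

Breadth-first order means that when the depth limit cuts the tree off, the categories closest to the root are the ones kept, which is usually what a partial GLAM report wants.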
On Mon, Feb 13, 2023 at 10:03 AM Andrew Lih andrew.lih@gmail.com wrote:
Thanks all for the feedback and conversation.
In the meantime, has anyone gotten GLAMorgan to report back any useful pageview data?
Regardless of small, medium, or large categories, I keep getting: "Data for ... pages could not be loaded from the WMF pageview API (404 error)."
https://glamtools.toolforge.org/glamorgan.html
-Andrew
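One plausible reading of the GLAMorgan error is that the per-article Pageviews endpoint answers with a 404 when it has no data for a given title and date range, instead of returning zero views. A minimal sketch of a client that keeps a batch going in that case, assuming the `per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}` path layout and a `views` count on each item. It percent-encodes titles (spaces become underscores, slashes are encoded) and reports a 404 as `None`, so "no data" stays distinguishable from zero. The `USER_AGENT` value is a placeholder. This is not GLAMorgan's code, and a 404 can also come from a malformed title.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional
from urllib.parse import quote

PAGEVIEWS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
# Placeholder; a real client should name the tool and give a contact address.
USER_AGENT = "GLAMMetricsSketch/0.1 (contact: you@example.org)"


def per_article_url(project: str, article: str, start: str, end: str,
                    access: str = "all-access", agent: str = "user",
                    granularity: str = "monthly") -> str:
    """Build a per-article Pageviews URL; start and end are YYYYMMDD strings."""
    title = quote(article.replace(" ", "_"), safe="")
    return (f"{PAGEVIEWS_BASE}/{project}/{access}/{agent}/"
            f"{title}/{granularity}/{start}/{end}")


def fetch_views(url: str,
                opener: Callable = urllib.request.urlopen) -> Optional[int]:
    """Return total views, or None if the API reports no data (HTTP 404)."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with opener(request, timeout=30) as response:
            data = json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # no data for this title/range; let the caller decide
        raise  # any other HTTP error is a real failure
    return sum(item["views"] for item in data.get("items", []))
```

The `opener` parameter exists so the function can be exercised without network access; a batch job could then log the titles that returned `None` rather than aborting the whole report.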
On Thu, Feb 9, 2023 at 10:10 AM Mary Mark Ockerbloom <celebration.women@gmail.com> wrote:
> Thanks for posting the Phabricator ticket; I too have subscribed.
> I concur with others: lack of support for reliable tools for GLAM institutions has been a major concern for GLAMs for many years.
> Mary Mark Ockerbloom
>
> On Wed, Feb 8, 2023 at 5:41 PM Fiona Romeo fromeo@wikimedia.org wrote:
>
>> Thanks for adding your perspective, Dominic.
>>
>> Here is the Phabricator ticket that tracks work the Foundation has been doing with Wikimedia Israel to resolve storage issues for the GLAM Wiki Dashboard: https://phabricator.wikimedia.org/T321702
>>
>> The conclusion was that it would be best for the service to use the MediaRequest API, as Dominic has also recommended in his email. Further to this, the Foundation's Data Platform team is looking into a custom API endpoint for media requests by category to reduce/remove the need for data transformation and storage. As an interim solution for the GLAM Wiki Dashboard, we advised Wikimedia Israel to migrate their project from Amazon Web Services to our own servers and made capacity available for that.
>>
>> We don't know as much about the BaGLAMa2 issues at the moment.
>>
>> I'm very sorry to see our GLAM wiki community struggling with tool instability again.
>>
>> Fiona
>>
>> On Wed, 8 Feb 2023 at 21:34, Dominic Byrd-McDevitt dominic@dp.la wrote:
>>
>>> For my part, I'd like to point out that these issues are recurring problems, and also that when it comes to BaGLAMa lag, the longer it goes, the more unrecoverable it becomes. Data errors, once introduced, are not repairable.
>>>
>>> Dozens of the tracked categories in BaGLAMa are DPLA institutions, and I have shared these links numerous times over the years. So I frequently get questions from partners who check their data and find it months out of date. There is nothing I can tell them in these situations, except that I have regularly seen data get that lagged, and then eventually it reaches a point where (presumably after someone finally reached Magnus?) all the backlogged months come in at once.
>>>
>>> This causes its own problems, I believe, because in such situations, where data is generated after the fact, I have to assume that it is all corrupt to some degree. My understanding of BaGLAMa is that it counts page views of articles using images from a category. But there is no MediaWiki log of when images were added to a page (or to a category), so if you are counting page views that occurred three months ago based on images that are on a page today, you might be crediting three past months with views for an image that was added last week.
>>>
>>> This issue causes massive data errors in the other direction too. Sometimes you'll have an unexplained spike, like the several here https://glamtools.toolforge.org/baglama2/#gid=50&month=201611&giu=enwiki&server=en.wikipedia.org (and by spike, I mean 700 million page views), and it's caused by the fact that an image that was on the main page for no more than a few hours caused BaGLAMa to count the entire month's page views of the main page. These errors are unrecoverable; they stay in the data and just increase the error of the overall total over time. There's never been a time where I could go to a maintainer and point out this massive data error and get that rerun or fixed. Instead, I am often in the embarrassing position of telling partners "Here is the analytics page, but there is a big overcount on one random month, so just always remember to mentally subtract 100 million from your total, and treat these numbers as very inexact."
Wren mailing list -- wren@lists.wikimedia.org To unsubscribe send an email to wren-leave@lists.wikimedia.org
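Dominic's description of how BaGLAMa credits views explains why the totals drift: as he describes it, a file is credited with every view of a page that uses it, including days before the file was added or after it was removed (the Main Page case). A minimal sketch of the difference, assuming hypothetical daily view counts and a known interval during which the image was on the page. As Dominic notes, MediaWiki keeps no direct log of when an image was added, so in practice that interval would have to be reconstructed, for example from revision history. The numbers below are made up for illustration, and this is not BaGLAMa's code.

```python
from datetime import date, timedelta
from typing import Dict


def naive_credit(daily_views: Dict[date, int]) -> int:
    """Credit the image with every view of the page in the period."""
    return sum(daily_views.values())


def presence_aware_credit(daily_views: Dict[date, int],
                          present_from: date, present_until: date) -> int:
    """Credit only views on days the image was on the page (inclusive range)."""
    return sum(views for day, views in daily_views.items()
               if present_from <= day <= present_until)


# Illustrative month: 1,000 views per day for 30 days (made-up numbers).
month_start = date(2023, 1, 1)
views = {month_start + timedelta(days=i): 1000 for i in range(30)}

# The image was on the page for only two days of that month.
print(naive_credit(views))  # 30000
print(presence_aware_credit(views, date(2023, 1, 10), date(2023, 1, 11)))  # 2000
```

Day-level presence still over-credits an image that was up for only a few hours, but the error is then bounded by those days rather than by the whole month.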
Hi all,
Thanks for a great call last week!
André and I have tried to gather the input from the meeting and the chat into one structured document, which we would like to add to a Meta page. It would be great to know whether you think we have captured the conversation well: https://docs.google.com/document/d/1_tKPkUzAlaCpOuyStygkKwTQDjlSkAjnYHkp7Pr5...
We were also considering linking to the Etherpad document from the Meta page, to save the clever insights from the conversation. But as it includes names, please tell us if you don't want us to add a link to the Etherpad document.
Best *Eric Luth* Projektledare engagemang och påverkan | Project Manager, Involvement and Advocacy Wikimedia Sverige eric.luth@wikimedia.se +46 (0) 765 55 50 95
On Fri, 24 Feb 2023 at 12:07, Fiona Romeo fromeo@wikimedia.org wrote:
Thanks Eric for convening this week's meeting about the content partnerships hub and Andrew for the draft manifesto.
For the more specific discussion about GLAM metrics needs, the product team has created this page for you to document your requirements:
https://commons.wikimedia.org/wiki/Commons:Product_and_technical_support_for...
I appreciate that you have already shared your needs in many different channels, at different times, but this page will have the right audience at the right time.
Thanks for your patience and collaboration.
Fiona
>>>>>> >>>>>> -Andrew >>>>>> >>>>>> -- >>>>> *Fiona Romeo* (she/her) >>>>> Senior Manager, Culture and Heritage >>>>> Wikimedia Foundation https://wikimediafoundation.org/ >>>>> >>>>> >>>>> _______________________________________________ >>> Wren mailing list -- wren@lists.wikimedia.org >>> To unsubscribe send an email to wren-leave@lists.wikimedia.org >>> >> _______________________________________________ >> Wren mailing list -- wren@lists.wikimedia.org >> To unsubscribe send an email to wren-leave@lists.wikimedia.org >> > > > -- > -Andrew Lih > Smithsonian Institution - Wikimedian at Large > Metropolitan Museum of Art - Wikimedia strategist > Previously: professor of journalism and communications, American > University, Columbia University, University of Southern California > --- > Email: andrew.lih@gmail.com, andrew@andrewlih.com > WEB: https://muckrack.com/fuzheado > PROJECT: Wikipedia Space: http://en.wikipedia.org/wiki/WP:WPSPACE > >
--
-Andrew Lih
Smithsonian Institution - Wikimedian at Large
Metropolitan Museum of Art - Wikimedia strategist
Previously: professor of journalism and communications, American University, Columbia University, University of Southern California
Email: andrew.lih@gmail.com, andrew@andrewlih.com
WEB: https://muckrack.com/fuzheado
PROJECT: Wikipedia Space: http://en.wikipedia.org/wiki/WP:WPSPACE
Wren mailing list -- wren@lists.wikimedia.org To unsubscribe send an email to wren-leave@lists.wikimedia.org
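Dominic's description of the main-page spike can be made concrete with a toy model. The sketch below assumes, as he describes, that BaGLAMa credits an image with a page's views for the whole month if the image is on that page when the data is generated, and contrasts that with a hypothetical alternative that only credits the days the image was actually present. The function names and the traffic figures are illustrative only, and, as Dominic notes, MediaWiki keeps no log of when an image was added to a page, so the `present_days` input would not actually be available from the wiki.

```python
from datetime import date, timedelta


def month_days(year: int, month: int) -> list[date]:
    """Return every calendar day in the given month."""
    first = date(year, month, 1)
    # Jumping 32 days ahead always lands in the next month; snap to its 1st.
    next_month = (first + timedelta(days=32)).replace(day=1)
    return [first + timedelta(days=i) for i in range((next_month - first).days)]


def credit_whole_month(daily_views: dict[date, int]) -> int:
    """Credit every view of the host page in the month to the image
    (the behaviour described in the thread)."""
    return sum(daily_views.values())


def credit_days_present(daily_views: dict[date, int], present_days: set[date]) -> int:
    """Hypothetical alternative: credit only views on days the image was on
    the page. Requires placement history that MediaWiki does not log."""
    return sum(views for day, views in daily_views.items() if day in present_days)


# Hypothetical host page with a flat 1,000,000 views per day in November 2016,
# where the tracked image appeared on a single day.
days = month_days(2016, 11)
views = {day: 1_000_000 for day in days}
featured = {date(2016, 11, 15)}

print(credit_whole_month(views))             # 30000000
print(credit_days_present(views, featured))  # 1000000
```

In this toy model the overcount scales with the host page's traffic, which is why one brief appearance on a very high-traffic page can dominate a category's monthly total.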
Thank you, Eric and André. This is a very useful synthesis of our conversation. I have added comments, and I see Andrew has also added comments.
What do you think the next steps should be?
Best,
João
On Mon, 27 Feb 2023 at 15:04, Eric Luth <eric.luth@wikimedia.se> wrote:
Hi all,
Thanks for a great call last week!
André and I have tried to gather the input from the meeting and from the chat into one structured document, which we would like to add to a Meta page. It would be great to know whether you think we captured the conversation well: https://docs.google.com/document/d/1_tKPkUzAlaCpOuyStygkKwTQDjlSkAjnYHkp7Pr5...
Also, we were considering linking to the Etherpad document from the Meta page, to preserve the clever insights from the conversation. However, since it includes names, please tell us if you don't want us to add a link to the Etherpad document.
Best,
Eric Luth
Projektledare engagemang och påverkan | Project Manager, Involvement and Advocacy
Wikimedia Sverige
eric.luth@wikimedia.se
+46 (0) 765 55 50 95
Support free knowledge: become a member of Wikimedia Sverige. Read more at blimedlem.wikimedia.se
On Fri, 24 Feb 2023 at 12:07, Fiona Romeo <fromeo@wikimedia.org> wrote:
Thanks Eric for convening this week's meeting about the content partnerships hub and Andrew for the draft manifesto.
For the more specific discussion about GLAM metrics needs, the product team has created this page for you to document your requirements:
https://commons.wikimedia.org/wiki/Commons:Product_and_technical_support_for...
I appreciate that you have already shared your needs in many different channels, at different times, but this page will have the right audience at the right time.
Thanks for your patience and collaboration.
Fiona
On Tue, 21 Feb 2023 at 16:06, Andrew Lih andrew.lih@gmail.com wrote:
Thank you Eric, Axel, Sandra F., and Wikimedia Sweden for leading in this area, even though we recognize it falls outside the scope of the formal Content Partnerships Hub project.
A few thoughts ahead of the meeting:
- As mentioned previously, I have tried to put down some aggregated thoughts and experiences in this document on Meta, which anyone is welcome to edit/add/share:
https://meta.wikimedia.org/wiki/Wikimedians_in_Residence_Exchange_Network/GL...
- If nothing else, I encourage folks to add ideas (raw ideas are definitely fine) to the section "New approaches":
https://meta.wikimedia.org/wiki/Wikimedians_in_Residence_Exchange_Network/GL...
- Please don't read "manifesto" as scarier or more threatening than it is meant to be. It was simply the word we had been using for a while, since the goal was to answer this question: how do we explain the current GLAM wiki state of affairs to someone new to this space, whether a new WMF employee or an outsider who doesn't know about cultural or heritage partnerships?
Thanks all, -Andrew
On Tue, Feb 21, 2023 at 9:54 AM Eric Luth eric.luth@wikimedia.se wrote:
Hi all,
The meeting will take place tomorrow at 16:00 UTC at this link: https://us02web.zoom.us/j/81455808411
André and I will join on behalf of Wikimedia Sverige and the Content Partnerships Hub initiative. Feel free to share the link with anyone you think would be good to bring to the call.
André and I will get back to you with an agenda as soon as possible, but the main part is of course to listen in and understand the situation, and discuss the priorities, as per Axel's previous email.
Best,
Eric Luth
On Mon, 20 Feb 2023 at 17:23, Axel Pettersson <axel.pettersson@wikimedia.se> wrote:
Hi again, thanks for all the replies to the Doodle. The winner is Wednesday, February 22, 17:00-18:00 (GMT+1).
@Eric Luth eric.luth@wikimedia.se will send out a calendar invite with a meeting link and agenda tomorrow.
Best regards,
/axel
====================================
Axel Pettersson (he/him)
Project Manager, GLAM/Outreach
Wikimedia Sverige
+46 (0)733 96 55 65
axel.pettersson@wikimedia.se
Twitter: @Haxpett https://twitter.com/haxpett
Support free knowledge: become a member of Wikimedia Sverige. Read more at http://wikimedia.se/sv/blimedlem
On Thu, 16 Feb 2023 at 13:48, Axel Pettersson <axel.pettersson@wikimedia.se> wrote:
Hi all, (Sent on behalf of the helpdesk.)
Andrew also sent a request to the Content Partnerships Hub helpdesk about this issue. We very much hear everyone’s concerns. Though the Helpdesk typically deals with content uploads, we do have another part of the hub initiative that is preparing for better (strategic) tools support in the upcoming year(s).[1]
Our current capacity, however, is very limited, and we are still not sure what funding we will receive for our future work. We also currently lack the staff and skills for this type of immediate firefighting, so if we were to work on this, it would come at the expense of other prioritized software development.
In response to the Helpdesk request, we would therefore suggest setting up a meeting with everyone interested on this thread, with the goal of sharing perspectives, brainstorming an approach, and capturing your thoughts on priorities.
Please provide your availability in this Doodle: https://doodle.com/meeting/participate/id/dLZwmRWa (Apologies for favoring European and American time zones over others.)
Please note that Sandra Fauconnier (who works as a Product Strategist) will be absent from February 15 for at least a month (due to surgery and a recovery period). During her absence, André Costa (andre.costa@wikimedia.se) from WMSE will represent the Content Partnerships Hub on this topic.
[1] https://meta.wikimedia.org/wiki/Content_Partnerships_Hub/Software
Best regards,
/axel
On Tue, 14 Feb 2023 at 06:38, Andrew Lih <andrew.lih@gmail.com> wrote:
Today, Wikimedia Cloud had an outage that highlights the fragile nature of our GLAM wiki ecosystem:
– All tools on wmcloud.org and toolforge.org were knocked out and unavailable for 4 hours. https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.o...
– Petscan needed an extra hour before it came back, because it is not set up to run automatically and needs a manual restart by logging in as Magnus and running a script by hand. This is a problematic situation for service deployment.
– Many tools rely on Petscan, such as GLAMorgan for expanding category trees and generating MediaWiki page titles, so this outage affected many more tools.
– BaGLAMa2 seems not to have come back successfully, as all the categories that should be tracked are missing. The data is likely all there somewhere, but it currently needs some loving care to be restored. It is unclear whether this is being worked on.
– PAWS, the visual Python environment on wmcloud that is a workhorse for bot work and scripts, is still down and needs some loving care to revive. https://phabricator.wikimedia.org/T329581
In short – we're trying to be scrappy and resourceful, but we're hurting.
-Andrew
On Mon, Feb 13, 2023 at 10:03 AM Andrew Lih <andrew.lih@gmail.com> wrote:
Thanks all for the feedback and conversation.
In the meantime, has anyone gotten GLAMorgan to report back any useful pageview data?
Regardless of small, medium, or large categories, I keep getting: "Data for ... pages could not be loaded from the WMF pageview API (404 error)."
https://glamtools.toolforge.org/glamorgan.html
-Andrew
On Thu, Feb 9, 2023 at 10:10 AM Mary Mark Ockerbloom <celebration.women@gmail.com> wrote:
Thanks for posting the Phabricator ticket; I too have subscribed. I concur with others: lack of support for reliable tools for GLAM institutions has been a major concern for GLAMs for many years.
Mary Mark Ockerbloom
On Wed, Feb 8, 2023 at 5:41 PM Fiona Romeo <fromeo@wikimedia.org> wrote:
Thanks for adding your perspective, Dominic.
Here is the Phabricator ticket that tracks work the Foundation has been doing with Wikimedia Israel to resolve storage issues for the GLAM Wiki Dashboard: https://phabricator.wikimedia.org/T321702
The conclusion was that it would be best for the service to use the MediaRequest API, as Dominic has also recommended in his email. Further to this, the Foundation's Data Platform team is looking into a custom API endpoint for media requests by category to reduce or remove the need for data transformation and storage. As an interim solution for the GLAM Wiki Dashboard, we advised Wikimedia Israel to migrate their project from Amazon Web Services to our own servers and made capacity available for that.
We don't know as much about the BaGLAMa2 issues at the moment.
I'm very sorry to see our GLAM wiki community struggling with tool instability again.
Fiona
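Andrew's GLAMorgan report mentions 404 errors from the WMF pageview API. As a point of reference, the sketch below builds a request URL for the per-article endpoint of the Wikimedia pageviews REST API and, as an explicit assumption, reads a 404 as "no data for this title and date range" rather than as a fatal error. To our understanding the API does answer 404 when it has no data for a request (for example, a misspelled title), but whether that explains the GLAMorgan failures in this thread is not established. The helper names are hypothetical, and no network call is made.

```python
from typing import Optional
from urllib.parse import quote

# Per-article endpoint of the Wikimedia pageviews REST API.
PAGEVIEWS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project: str, title: str, start: str, end: str,
                    access: str = "all-access", agent: str = "user",
                    granularity: str = "monthly") -> str:
    """Build a per-article pageviews URL; start and end are YYYYMMDD strings."""
    # Titles use underscores for spaces and must be fully percent-encoded,
    # including any "/" that is part of the title itself.
    article = quote(title.replace(" ", "_"), safe="")
    return f"{PAGEVIEWS_BASE}/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}"


def total_views(status: int, payload: Optional[dict]) -> int:
    """Sum the "views" field of every item in a pageviews response.

    Assumption: HTTP 404 means "no data" and counts as zero views;
    any other non-200 status is treated as a genuine failure.
    """
    if status == 404:
        return 0
    if status != 200 or payload is None:
        raise RuntimeError(f"pageviews request failed with HTTP {status}")
    return sum(item["views"] for item in payload.get("items", []))


print(per_article_url("en.wikipedia", "Metropolitan Museum of Art", "20230101", "20230131"))
```

Zeroing every 404 would also hide genuinely wrong titles, so a more careful client might record those titles separately instead of silently dropping them.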
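Fiona and Dominic both point to the Mediarequests API, which counts requests for the media files themselves rather than views of the articles that embed them. The minimal sketch below builds a request URL for a single Commons file. It assumes the per-file endpoint layout of the Wikimedia REST API as we understand it (`/metrics/mediarequests/per-file/{referer}/{agent}/{file_path}/{granularity}/{start}/{end}`) and MediaWiki's MD5-hashed upload directory layout for the file path. The helper names are hypothetical. The per-category endpoint Fiona mentions is still being looked into, so it is not modelled here, and no network call is made.

```python
import hashlib
from urllib.parse import quote

# Per-file endpoint of the Wikimedia mediarequests REST API (as we understand it).
MEDIAREQUESTS_BASE = "https://wikimedia.org/api/rest_v1/metrics/mediarequests/per-file"


def commons_file_path(file_name: str) -> str:
    """Return the upload path of a Commons file, e.g. /wikipedia/commons/a/a9/Example.jpg.

    The hashed upload layout uses the first one and first two hex digits of
    the MD5 of the file name, with spaces replaced by underscores.
    """
    name = file_name.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"/wikipedia/commons/{digest[0]}/{digest[:2]}/{name}"


def per_file_url(file_name: str, start: str, end: str,
                 referer: str = "all-referers", agent: str = "user",
                 granularity: str = "monthly") -> str:
    """Build a per-file mediarequests URL; start and end are YYYYMMDD strings."""
    # The whole file path, slashes included, goes into one URL segment.
    encoded_path = quote(commons_file_path(file_name), safe="")
    return f"{MEDIAREQUESTS_BASE}/{referer}/{agent}/{encoded_path}/{granularity}/{start}/{end}"


print(per_file_url("Example.jpg", "20230101", "20230131"))
```

Because a GLAM category can contain many thousands of files, a client built this way would need one request per file, which is part of why a per-category endpoint is attractive.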
Hi João, Thanks for your and Andrew's comments! André and I will look into them, and try to clarify accordingly.
As Axel described in his first response, we are still not sure what funding we will receive for the work ahead with the Content Partnerships Hub (see the application here: https://meta.wikimedia.org/wiki/Grants:Project/MSIG/WMSE/Content_Partnership...). Our ability to implement any of the outcomes from the conversation will of course depend on that. Put another way, I think our conversation laid a really good foundation for further steps, but it is still to be decided who should take those steps.
My suggestion would be to get back to you when we (WMSE) have more clarity, with either good or bad news, and from there start a conversation about next steps and who should take them. Would that work?
Best,
Eric Luth
On Tue, 28 Feb 2023 at 20:52, João Alexandre Peschanski <joalpe@wmnobrasil.org> wrote:
Thank you, Eric and André. This is a very useful synthesis of our conversation. I have added comments, and I see Andrew has also added comments.
What do you think the next steps should be?
Best,
João
Em seg., 27 de fev. de 2023 às 15:04, Eric Luth eric.luth@wikimedia.se escreveu:
Hi all,
Thanks for a great call last week!
André and I have tried to gather the input during the meeting and from the chat into one structured document, that we would like to add to a Meta page. It would be great to know if you think we capture the conversation well: https://docs.google.com/document/d/1_tKPkUzAlaCpOuyStygkKwTQDjlSkAjnYHkp7Pr5...
Also, we were considering linking to the Etherpad document from the Meta page, to save the clever insights from the conversation. But as it includes names – please tell us if you don't want us to add a link to the Etherpad document.
Best *Eric Luth* Projektledare engagemang och påverkan | Project Manager, Involvement and Advocacy Wikimedia Sverige eric.luth@wikimedia.se +46 (0) 765 55 50 95
Stöd fri kunskap, bli medlem i Wikimedia Sverige. Läs mer på blimedlem.wikimedia.se
Den fre 24 feb. 2023 kl 12:07 skrev Fiona Romeo fromeo@wikimedia.org:
Thanks Eric for convening this week's meeting about the content partnerships hub and Andrew for the draft manifesto.
For the more specific discussion about GLAM metrics needs, the product team has created this page for you to document your requirements:
https://commons.wikimedia.org/wiki/Commons:Product_and_technical_support_for...
I appreciate that you have already shared your needs in many different channels, at different times, but this page will have the right audience at the right time.
Thanks for your patience and collaboration.
Fiona
On Tue, 21 Feb 2023 at 16:06, Andrew Lih andrew.lih@gmail.com wrote:
Thank you Eric, Axel, Sandra F., and Wikimedia Sweden for leading in this area, even though we recognize it does fall out of the scope of the formal Content Partnerships Hub project.
A few thoughts ahead of the meeting:
- As mentioned previously, I have tried to put down some aggregated
thoughts and experiences in this document on meta which anyone is welcome to edit/add/share:
https://meta.wikimedia.org/wiki/Wikimedians_in_Residence_Exchange_Network/GL...
- If nothing else, I encourage folks to add ideas (raw ideas are
definitely fine) to the section "New approaches"
https://meta.wikimedia.org/wiki/Wikimedians_in_Residence_Exchange_Network/GL...
- Please consider "manifesto" as sounding more scary or threatening
than it should be - it was simply the word we had been talking about for a while, as the goal was: How do we explain the current GLAM wiki state of affairs to someone new to this space, whether it is a new WMF employee, an outsider who doesn't know about cultural or heritage partnerships?
Thanks all, -Andrew
On Tue, Feb 21, 2023 at 9:54 AM Eric Luth eric.luth@wikimedia.se wrote:
Hi all,
The meeting will take place at 16:00 UTC tomorrow on this link: https://us02web.zoom.us/j/81455808411 https://www.google.com/url?q=https://us02web.zoom.us/j/81455808411&sa=D&source=calendar&ust=1677422345273437&usg=AOvVaw2PGvR3NmIQDqmPoiv_h3mv André and I will join on behalf of Wikimedia Sverige and the Content Partnerships Hub initiative. Feel free to share the link with anyone you think might be good to bring to the call.
André and I will get back to you with an agenda as soon as possible, but the main part is of course to listen in and understand the situation, and discuss the priorities, as per Axel's previous email.
Best *Eric Luth* Projektledare engagemang och påverkan | Project Manager, Involvement and Advocacy Wikimedia Sverige eric.luth@wikimedia.se +46 (0) 765 55 50 95
Stöd fri kunskap, bli medlem i Wikimedia Sverige. Läs mer på blimedlem.wikimedia.se
Den mån 20 feb. 2023 kl 17:23 skrev Axel Pettersson < axel.pettersson@wikimedia.se>:
Hi again, Thanks for all the replies to the Doodle, the winner is Wednesday February 22, 17.00-18.00 (GMT+1).
@Eric Luth eric.luth@wikimedia.se will send out a calendar invite with a meeting link and agenda tomorrow.
Bästa hälsningar, /axel
==================================== Axel Pettersson (han/honom) Projektledare GLAM/Outreach Wikimedia Sverige
+46 (0)733 96 55 65 axel.pettersson@wikimedia.se
Twitter: @Haxpett https://twitter.com/haxpett
Stöd fri kunskap, bli medlem i Wikimedia Sverige. Läs mer på *wikimedia.se/sv/blimedlem http://wikimedia.se/sv/blimedlem*
Den tors 16 feb. 2023 kl 13:48 skrev Axel Pettersson < axel.pettersson@wikimedia.se>:
> Hi all, > (Sent on behalf of the helpdesk.) > > Andrew also sent a request to the Content Partnerships Hub helpdesk > about this issue. We very much hear everyone’s concerns. Though the > Helpdesk typically deals with content uploads, we do have another part of > the hub initiative that is preparing for better (strategic) tools support > in the upcoming year(s).[1] > > Our current capacity is however very limited, and we are still not > sure what funding we will receive for our future work. Also, we currently > lack manpower and skills for this type of immediate fire-fighting, so if we > were to work on this, it would be at the expense of other prioritized > software development. > > As a response to the Helpdesk request, we would therefore suggest > setting up a meeting with all interested people on this thread, with the > goal to share perspectives and to brainstorm an approach and capture your > thoughts on priorities. > > Please provide your availability in this Doodle: > https://doodle.com/meeting/participate/id/dLZwmRWa (With excuses > for being Europe/America friendly over other time zones.) > > Please note that Sandra Fauconnier (who works as Product Strategist) > will be absent from February 15 for at least a month (due to surgery + > recovery period). During her absence, André Costa ( > andre.costa@wikimedia.se) from WMSE will represent the Content > Partnerships Hub on this topic. > > > [1] > https://meta.wikimedia.org/wiki/Content_Partnerships_Hub/Software > > > Bästa hälsningar, > /axel > > ==================================== > Axel Pettersson (han/honom) > Projektledare GLAM/Outreach > Wikimedia Sverige > > +46 (0)733 96 55 65 > axel.pettersson@wikimedia.se > > Twitter: @Haxpett https://twitter.com/haxpett > > Stöd fri kunskap, bli medlem i Wikimedia Sverige. > Läs mer på *wikimedia.se/sv/blimedlem > http://wikimedia.se/sv/blimedlem* > > > Den tis 14 feb. 
2023 kl 06:38 skrev Andrew Lih <andrew.lih@gmail.com > >: > >> Today, Wikimedia Cloud had an outage that highlights the fragile >> nature of our GLAM wiki ecosystem: >> >> – All tools on wmcloud.org and toolforge.org were knocked out and >> unavailable for 4 hours. >> >> https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.o... >> – Petscan needed an extra hour before it came back, because it is >> not setup to run automatically, and needs a manual restart by logging in as >> Magnus and running a script by hand. This is a problematic situation for >> service deployment. >> – Many tools rely on Petscan, such as GLAMorgan for expanding >> category trees and generating Mediawiki page titles, so this outage >> affected many more tools >> – BaGLAMa2 seems to have not come back successfully, as all the >> categories that should be tracked are missing. Likely, the data is all >> there somewhere, but it currently needs some loving care to be restored. >> Unclear if this is being worked on. >> – PAWS, the visual Python environment on wmcloud that is a >> workhorse for bot work and scripts, is still down and needs some loving >> care to revive. https://phabricator.wikimedia.org/T329581 >> >> In short – we're trying to be scrappy and resourceful, but we're >> hurting. >> >> -Andrew >> >> >> On Mon, Feb 13, 2023 at 10:03 AM Andrew Lih andrew.lih@gmail.com >> wrote: >> >>> Thanks all for the feedback and conversation. >>> >>> In the meantime, has anyone gotten GLAMorgan to report back any >>> useful pageview data? >>> >>> Regardless of small, medium, or large categories, I keep getting: >>> "Data for ... pages could not be loaded from the WMF pageview API >>> (404 error)." >>> >>> https://glamtools.toolforge.org/glamorgan.html >>> >>> -Andrew >>> >>> >>> On Thu, Feb 9, 2023 at 10:10 AM Mary Mark Ockerbloom < >>> celebration.women@gmail.com> wrote: >>> >>>> Thanks for posting the fabricator ticket; I too have subscribed. 
I concur with others: lack of support for reliable tools for GLAM institutions has been a major concern for GLAMs for many years.
Mary Mark Ockerbloom

On Wed, Feb 8, 2023 at 5:41 PM Fiona Romeo <fromeo@wikimedia.org> wrote:

Thanks for adding your perspective, Dominic.

Here is the Phabricator ticket that tracks work the Foundation has been doing with Wikimedia Israel to resolve storage issues for the GLAM Wiki Dashboard: https://phabricator.wikimedia.org/T321702

The conclusion was that it would be best for the service to use the Mediarequests API, as Dominic also recommended in his email. Further to this, the Foundation's Data Platform team is looking into a custom API endpoint for media requests by category to reduce or remove the need for data transformation and storage. As an interim solution for the GLAM Wiki Dashboard, we advised Wikimedia Israel to migrate their project from Amazon Web Services to our own servers and made capacity available for that.

We don't know as much about the BaGLAMa2 issues at the moment.

I'm very sorry to see our GLAM wiki community struggling with tool instability again.

Fiona

On Wed, 8 Feb 2023 at 21:34, Dominic Byrd-McDevitt <dominic@dp.la> wrote:

For my part, I'd like to point out that these issues are recurring problems, and also that when it comes to BaGLAMa lag, the longer it goes, the more unrecoverable it becomes. Data errors, once introduced, are not repairable.

Dozens of the tracked categories in BaGLAMa are DPLA institutions, and I have shared these links numerous times over the years. So I frequently get questions from partners who check their data and find it months out of date. There is nothing I can tell them in these situations, except that I have regularly seen data get that lagged, and then eventually it reaches a point where (presumably after someone finally reached Magnus?) all the backlogged months come in at once.

This causes its own problems, I believe, because in situations where data is generated after the fact, I have to assume that it is all corrupt to some degree. My understanding of BaGLAMa is that it counts page views of articles using images from a category. But there is no MediaWiki log of when images were added to a page (or to a category), so if you are counting page views that occurred three months ago based on images that are in a page today, you might be crediting three past months with views for an image that was added last week.

This issue causes massive data errors in the other direction too. Sometimes you'll have an unexplained spike, like the several here https://glamtools.toolforge.org/baglama2/#gid=50&month=201611&giu=enwiki&server=en.wikipedia.org (and by spike, I mean 700 million page views), and it's caused by the fact that an image that was on the main page for no more than a few hours caused BaGLAMa to count the entire month's page views of the main page. These errors are unrecoverable; they stay in the data and just increase the error of the overall total over time. There's never been a time when I could go to a maintainer, point out this massive data error, and get it rerun or fixed. Instead, I am often in the embarrassing position of telling partners "Here is the analytics page, but there is a big overcount on one random month, so just always remember to mentally subtract 100 million from your total, and treat these numbers as very inexact."
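To make the attribution problem Dominic describes concrete, here is a minimal Python sketch. It assumes, as he describes, that a pageview-based tool credits an image with every view of every page that uses it at the moment counting runs, for the whole month. The page names, view counts, and hours are all hypothetical, chosen only for illustration, and this is not BaGLAMa's actual code.

```python
"""Toy model of pageview-based image attribution (all data hypothetical)."""

HOURS_IN_MONTH = 30 * 24  # assume a 30-day month

# Hypothetical monthly pageviews for two pages that used a tracked image.
MONTHLY_VIEWS = {
    "Main_Page": 720_000_000,
    "Some_Article": 30_000,
}

# Hypothetical number of hours in the month each page actually showed the image.
HOURS_WITH_IMAGE = {
    "Main_Page": 6,                  # featured on the main page for about 6 hours
    "Some_Article": HOURS_IN_MONTH,  # used for the whole month
}


def snapshot_attribution(pages):
    """Credit the image with every view of every page that uses it when counting runs."""
    return sum(MONTHLY_VIEWS[page] for page in pages)


def exposure_weighted_attribution(pages):
    """Credit only the share of views that fell while the image was on the page,
    assuming views are spread evenly across the month (a simplification)."""
    return sum(MONTHLY_VIEWS[page] * HOURS_WITH_IMAGE[page] // HOURS_IN_MONTH
               for page in pages)


if __name__ == "__main__":
    pages = ["Main_Page", "Some_Article"]
    print(snapshot_attribution(pages))           # 720030000
    print(exposure_weighted_attribution(pages))  # 6030000
```

Even the weighted figure still counts article views rather than views of the image itself, which is the deeper problem Dominic raises about relying on the Pageviews API at all.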
So as long as we are talking about BaGLAMa at all, I do have to point out that it is an entirely flawed tool and the data is unreliable. And aside from all of those bugs, the methodology is very flawed, since it should not be using the Pageviews API in the first place. I consider the data essentially fictitious anyway: we know the images we are tracking are probably not even receiving half of the article views we are crediting to them, but we continue to report bad data, because our projects rely on having outcomes and reporting analytics. Glamorous and Glamorgan are based on the same flawed methodology.

And I haven't even started on the clunky UI, where an ever-growing list of 1000+ categories is displayed on the landing page, many of which are typos or non-existent categories that can never be removed or cleaned up.

I guess my main point here is that no amount of band-aids will ever resolve some of these issues, and we need to be thinking about entirely redoing the tool itself. Or rather, we should have already done so as soon as the Mediarequests API was released, which was in 2019.

Thanks!
Dominic
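Both Fiona and Dominic point to the Mediarequests API, which counts requests for the media file itself rather than views of the articles that embed it. As a rough sketch only, the Python below builds a per-file request URL. It assumes the REST route `/metrics/mediarequests/per-file/{referer}/{agent}/{file_path}/{granularity}/{start}/{end}`, that `file_path` is the fully URL-encoded upload path, and that Commons uses MediaWiki's MD5-hashed upload directories; all of these should be checked against the current Wikimedia Analytics API documentation, and the example filename is hypothetical.

```python
"""Sketch: build a Wikimedia Mediarequests per-file URL for a Commons file.

Assumptions (verify against the Wikimedia Analytics API docs):
- route: /metrics/mediarequests/per-file/{referer}/{agent}/{file_path}/{granularity}/{start}/{end}
- file_path is the upload.wikimedia.org path, fully URL-encoded
"""
import hashlib
from urllib.parse import quote

API_BASE = "https://wikimedia.org/api/rest_v1/metrics/mediarequests/per-file"


def commons_upload_path(filename: str) -> str:
    """Return the hashed upload path for a Commons file.

    Assumes the filename is already in canonical form (first letter capitalized).
    """
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"/wikipedia/commons/{digest[0]}/{digest[:2]}/{name}"


def mediarequests_url(filename, start, end, referer="all-referers",
                      agent="user", granularity="monthly"):
    """Build a per-file mediarequests URL; start and end are YYYYMMDD strings."""
    # safe="" also encodes "/" as %2F so the path fits into one URL segment.
    file_path = quote(commons_upload_path(filename), safe="")
    return "/".join([API_BASE, referer, agent, file_path, granularity, start, end])


if __name__ == "__main__":
    # Hypothetical filename, for illustration only.
    print(mediarequests_url("Example file.jpg", "20230101", "20230131"))
```

Fetching the resulting URL (with any HTTP client and a descriptive User-Agent) would return JSON counts per month; aggregating over a whole category would still require expanding the category tree first, which is what the custom by-category endpoint Fiona mentions would avoid.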
--
-Andrew Lih
Smithsonian Institution - Wikimedian at Large
Metropolitan Museum of Art - Wikimedia strategist
Previously: professor of journalism and communications, American University, Columbia University, University of Southern California
---
Email: andrew.lih@gmail.com, andrew@andrewlih.com
WEB: https://muckrack.com/fuzheado
PROJECT: Wikipedia Space: http://en.wikipedia.org/wiki/WP:WPSPACE
Wren mailing list -- wren@lists.wikimedia.org To unsubscribe send an email to wren-leave@lists.wikimedia.org
--
João Alexandre Peschanski | Usuário:JPeschanski (WMB)
Executive Director | Wiki Movimento Brasil
wmnobrasil.org http://wmnobrasil.org/
Hi all,
André and I have tried to adjust the document according to the comments we received. I have added the information to this page https://meta.wikimedia.org/wiki/Content_Partnerships_Hub/Software/Wikimedia_Metric_Tools, and also uploaded the document as a PDF to Commons. I hope that we have succeeded in capturing all the input; feel free to add any comments (or edit directly!) on the Meta page.
Best,
Eric Luth
Project Manager, Involvement and Advocacy
Wikimedia Sverige
eric.luth@wikimedia.se
+46 (0) 765 55 50 95

Support free knowledge, become a member of Wikimedia Sverige. Read more at blimedlem.wikimedia.se
On Wed, 1 Mar 2023 at 10:53, Eric Luth <eric.luth@wikimedia.se> wrote:
Hi João, Thanks for your and Andrew's comments! André and I will look into them, and try to clarify accordingly.
As Axel described in his first response, we are still not sure what funding we will receive going forward for the work with the Content Partnerships Hub (see application here). Our ability to implement any of the outcomes from the conversation will of course depend on that. Put another way, I think that our conversation laid a really good foundation for further steps, but it is still to be decided who should take those steps.
My idea would be to get back to you again when we (WMSE) have more clarity, with either good or bad news, and from that initiate a conversation on next steps and who should take them?
Best,
Eric Luth
On Tue, 28 Feb 2023 at 20:52, João Alexandre Peschanski <joalpe@wmnobrasil.org> wrote:
Thank you, Eric and André. This is a very useful synthesis of our conversation. I have added comments, and I see Andrew has also added comments.
What do you think the next steps should be?
Best,
João
On Mon, 27 Feb 2023 at 15:04, Eric Luth <eric.luth@wikimedia.se> wrote:
Hi all,
Thanks for a great call last week!
André and I have tried to gather the input from the meeting and the chat into one structured document that we would like to add to a Meta page. It would be great to know whether you think we have captured the conversation well: https://docs.google.com/document/d/1_tKPkUzAlaCpOuyStygkKwTQDjlSkAjnYHkp7Pr5...
Also, we were considering linking to the Etherpad document from the Meta page, to save the clever insights from the conversation. But as it includes names, please tell us if you don't want us to add a link to the Etherpad document.
Best,
Eric Luth
On Fri, 24 Feb 2023 at 12:07, Fiona Romeo <fromeo@wikimedia.org> wrote:
Thanks, Eric, for convening this week's meeting about the Content Partnerships Hub, and Andrew for the draft manifesto.
For the more specific discussion about GLAM metrics needs, the product team has created this page for you to document your requirements:
https://commons.wikimedia.org/wiki/Commons:Product_and_technical_support_for...
I appreciate that you have already shared your needs in many different channels, at different times, but this page will have the right audience at the right time.
Thanks for your patience and collaboration.
Fiona
On Tue, 21 Feb 2023 at 16:06, Andrew Lih andrew.lih@gmail.com wrote:
Thank you Eric, Axel, Sandra F., and Wikimedia Sweden for leading in this area, even though we recognize it falls outside the scope of the formal Content Partnerships Hub project.
A few thoughts ahead of the meeting:
- As mentioned previously, I have tried to put down some aggregated thoughts and experiences in this document on Meta, which anyone is welcome to edit/add/share:
https://meta.wikimedia.org/wiki/Wikimedians_in_Residence_Exchange_Network/GL...
- If nothing else, I encourage folks to add ideas (raw ideas are definitely fine) to the section "New approaches"
https://meta.wikimedia.org/wiki/Wikimedians_in_Residence_Exchange_Network/GL...
- Please keep in mind that "manifesto" may sound more scary or threatening than it is meant to be - it was simply the word we had been using for a while, and the goal was: how do we explain the current GLAM wiki state of affairs to someone new to this space, whether a new WMF employee or an outsider who doesn't know about cultural or heritage partnerships?
Thanks all, -Andrew
On Tue, Feb 21, 2023 at 9:54 AM Eric Luth eric.luth@wikimedia.se wrote:
Hi all,
The meeting will take place at 16:00 UTC tomorrow at this link: https://us02web.zoom.us/j/81455808411
André and I will join on behalf of Wikimedia Sverige and the Content Partnerships Hub initiative. Feel free to share the link with anyone you think would be good to bring to the call.
André and I will get back to you with an agenda as soon as possible, but the main purpose is of course to listen, understand the situation, and discuss priorities, as per Axel's previous email.
Best,
Eric Luth
On Mon, 20 Feb 2023 at 17:23, Axel Pettersson <axel.pettersson@wikimedia.se> wrote:
Hi again,
Thanks for all the replies to the Doodle; the winner is Wednesday, February 22, 17.00-18.00 (GMT+1).

@Eric Luth eric.luth@wikimedia.se will send out a calendar invite with a meeting link and agenda tomorrow.

Best regards,
/axel