Amir
you raise a good point: how do things get into the next budget? The simple
answer is to first have people/teams responsible for each of the projects.
Having someone accountable stops the ball from being dropped as easily, and it
means the WMF starts looking at needs on longer timetables. We've seen this with
everything else the WMF does, but not where it matters most: the points
on which each community relies.
In the end, we shouldn't have to go begging the WMF for platforms to be
maintained, nor should we have to fight against wishlist gadgets just to get
heard, even when that list rejects us because it's too big an issue.
On Tue, 11 Jan 2022 at 16:51, Amir Sarabadani <ladsgroup(a)gmail.com> wrote:
> (Speaking in my volunteer capacity)
> I doubt there is any malicious intent by WMF. I personally think the
> underlying problem is time. Let me explain.
>
> Fixing a big issue in software takes time (I wrote a long essay about it
> in this thread), so it makes sense for WMF annual planning to focus on issues
> before they get to a level that hinders the community's work. The problem is
> that an issue doesn't get enough attention if it's not severe enough to
> affect users, so the cycle of frustration continues. For example, I sent an
> email in February 2021, at the start of annual planning, to one of the
> directors in Product outlining all of the issues with the multimedia stack.
> Because it wasn't this bad at that point, it didn't make it into the FY21-22
> plans. Now I feel like a Cassandra. We have similar issues in lots of other
> places that will lead to frustration: load balancers (pybal), dumps, the beta
> cluster, flagged revs, patrolling tools, etc.
>
> On Tue, Jan 11, 2022 at 8:21 AM bawolff <bawolff+wn(a)gmail.com> wrote:
>
>> Honestly, I find the "not in the annual plan" thing more damning than the
>> actual issue at hand.
>>
>> The core competency of WMF is supposed to be keeping the site running.
>> WMF does a lot of things, some of them very useful, others less so, but at
>> its core its mission is to keep the site going. Everything else should be
>> secondary to that.
>>
>> It should be obvious that running a 300 TB+ media store servicing 70
>> billion requests a month requires occasional investment and maintenance.
>>
>> And yet, this was not only not in this year's annual plan, it has been
>> ignored in the annual plan for many, many years. We didn't get to this state
>> by just one year of neglect.
>>
>> Which raises the question: if the WMF is not in the business of keeping the
>> Wikimedia sites going, what is it in the business of?
>>
>> On Tue, Jan 11, 2022 at 6:01 AM Kunal Mehta <legoktm(a)debian.org> wrote:
>>
>>> Hi,
>>>
>>> On 1/1/22 12:10, Asaf Bartov wrote:
>>> > It seems to me there are *very few* people who could change the status
>>> > quo, not much more than a handful: the Foundation's executive leadership
>>> > (in its annual planning work, coming up this first quarter of 2022), and
>>> > the Board of Trustees.
>>>
>>> If the goal is to get paid WMF staff to fix the issues, then you're
>>> correct. However, I do not believe that is a healthy long-term solution.
>>> The WMF isn't perfect, and I don't think it's desirable to have a huge
>>> WMF that tries to do everything and has a monopoly on technical
>>> prioritization.
>>>
>>> The technical stack must be co-owned by volunteers and paid staff from
>>> different orgs at all levels. It's significantly more straightforward
>>> now for trusted volunteers to get NDA/deployment access than it used to
>>> be; there are dedicated training sessions, etc.
>>>
>>> Given that the multimedia stack is neglected and the WMF has given no
>>> indication it intends to work on/fix the problem, we should be
>>> recruiting people outside the WMF's paid staff who are interested in
>>> working on this and give them the necessary access/mentorship to get it
>>> done. Given the amount of work on e.g. T40010[1] to develop an
>>> alternative SVG renderer, I'm sure those people exist.
>>>
>>> Take moving Thumbor to Buster[2], for example. That requires
>>> forward-porting some Debian packages written in Python, and then testing in
>>> WMCS that there are no horrible regressions in newer imagemagick, librsvg,
>>> etc. I'm always happy to mentor people with regard to Debian packaging (and
>>> have done so in the past), and there are a decent number of people in our
>>> community who know Python, and likely others from the Commons community
>>> who would be willing to help with testing and dealing with whatever
>>> fallout arises.
>>>
>>> So I think the status quo can be changed by just about anyone who is
>>> motivated to do so, not by trying to convince the WMF to change its
>>> prioritization, but just by doing the work. We should be empowering
>>> those people rather than continuing to further entrench a WMF technical
>>> monopoly.
>>>
>>> [1] https://phabricator.wikimedia.org/T40010
>>> [2] https://phabricator.wikimedia.org/T216815
>>>
>>> -- Legoktm
>>> _______________________________________________
>>> Wikitech-l mailing list -- wikitech-l(a)lists.wikimedia.org
>>> To unsubscribe send an email to wikitech-l-leave(a)lists.wikimedia.org
>>>
>>> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
>>>
>
>
>
> --
> Amir (he/him)
>
> _______________________________________________
> Wikimedia-l mailing list -- wikimedia-l(a)lists.wikimedia.org, guidelines
> at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> https://meta.wikimedia.org/wiki/Wikimedia-l
> Public archives at
> https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org…
> To unsubscribe send an email to wikimedia-l-leave(a)lists.wikimedia.org
--
GN.
Separate thread. I'm not sure which list is appropriate.
*... but not all the way to sentience
<https://en.wikipedia.org/wiki/The_Uplift_War>.*
The annual Community Wishlist Survey (implemented by a small team, possibly
in isolation?) may not be the mechanism for prioritizing large changes, but
those also deserve a community-curated priority queue, to complement
the staff-maintained priorities in phab.
For core challenges (like Commons stability and capacity), I'd be surprised
if the bottleneck were people or budget. We do need a shared understanding
of what issues are most important and most urgent, and how to solve them.
For instance, we need a way to turn Amir's recent email about the problem (and
related phab tickets) into a family of persistent, implementable specs and
proposals, along with their articulated obstacles.
An issue tracker like phab is good for tracking the progress and
dependencies of agreed-upon tasks, but weak for discussing what is
important, what we know about it, how to address it. And weak for
discussing ecosystem-design issues that are important and need persistent
updating but don't have a simple checklist of steps.
So where is the best current place to discuss scaling Commons, and all that
entails? Some examples from recent discussions (most from the wm-l thread
below):
- *Uploads*: Support for large file uploads / Keeping bulk upload tools
online
- *Video*: Debugging + rolling out the videojs
<https://phabricator.wikimedia.org/T248418> player
- *Formats*: Adding support for CML
<https://phabricator.wikimedia.org/T18491> and dozens of other
<https://phabricator.wikimedia.org/T297514> common high-demand file formats
- *Thumbs*: Updating thumbor <https://phabricator.wikimedia.org/T216815>
and librsvg <https://phabricator.wikimedia.org/T193352>
- *Search*: WCQS still down <https://phabricator.wikimedia.org/T297454>,
noauth option <https://phabricator.wikimedia.org/T297995> wanted for tools
- *General*: Finish implementing redesign
<https://phabricator.wikimedia.org/T28741> of the image table
SJ
On Wed, Dec 29, 2021 at 6:26 AM Amir Sarabadani <ladsgroup(a)gmail.com> wrote:
> I'm not debating your note. It is very valid that we lack proper support
> for multimedia stack. I myself wrote a detailed rant on how broken it is
> [1] but three notes:
> - Fixing something like this takes time: you need to assign the budget
> for it (which means it has to be done during annual planning), and if it
> gets approved, you need to start it with the fiscal year (meaning July
> 2022) and then hire (meaning write a JD, do recruitment, interview lots of
> people, get them hired), which can take from several months to years. Once
> they are hired, you need to onboard them and let them learn about our
> technical infrastructure, which takes at least two good months. Software
> engineering is not magic; it takes time, blood and sweat. [2]
> - Making another team focus on multimedia requires changes in planning,
> budget, OKRs, etc. Are we sure shifting the focus of teams is a good
> idea? Most teams are already focusing on vital parts of Wikimedia, and
> changing the focus would turn this into a whack-a-mole game where we fix
> multimedia but then have critical issues in security or performance.
> - Voting in the Wishlist survey is a good band-aid in the meantime, to at
> least address the worst parts for now.
>
> I don't understand your point, tbh: either you think it's a good idea to
> make requests for improvements in multimedia in the wishlist survey or you
> think it's not. If you think it's not, then it's off-topic for this thread.
>
> [1]
> https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org…
> [2] There is a classic book on this topic called "The Mythical Man-Month"
>
> On Wed, Dec 29, 2021 at 11:41 AM Gnangarra <gnangarra(a)gmail.com> wrote:
>
>> we have to vote for regular maintenance and support for
>> essential functions like uploading files, which is the core mission of
>> Wikimedia Commons
>>
>
Hi Asaf,
That's a good response, but I'm not sure it provides a practical way
forward. How can volunteers bring this issue to the attention of the WMF
leadership, so that time is allocated for Wikimedia staff who can
take ownership and implement changes here?
Presumably emails on these lists have relatively little impact at the
most senior levels, so they aren't a good way forward, and the same goes for
Phabricator.
The Wishlist provides a way of showcasing issues and a relatively clear
way forward to get them implemented, but with really limited capacity.
How would a case for technical support be made apart from that? It's not
clear if a simple survey would be sufficient. Would an RfC and
discussion on meta help? Does it need the media to be involved to make
it a public crisis? Or should it be proposed as a grant request, perhaps
for a Wikimedia affiliate to implement? Or is there another avenue that
could be pursued? Bear in mind that, for multiple years now, there has been
no practical way for community members to propose changes to the WMF annual
plan.
Sorry to defocus things and express more frustration, but I think there
should be a clear way forward with this type of issue, which isn't
obvious right now. Personally, my hopes are on the Wishlist, although
I'll be reposting a 14-year-old issue there for the fifth time when that
process opens on the 10th January...
Thanks,
Mike
On 1/1/22 20:10:43, Asaf Bartov wrote:
> Writing in my volunteer capacity:
>
> On Sat, 1 Jan 2022, 08:43 Amir Sarabadani <ladsgroup(a)gmail.com
> <mailto:ladsgroup@gmail.com>> wrote:
>
>
>     Honestly, the situation is more dire than you think. For example,
>     until a couple of months ago, we didn't have backups for the media
>     files. There was a live copy in the secondary datacenter, but if, for
>     example, we lost some files due to a software issue, they were
>     gone. I would like to thank Jaime Crespo for pushing for it and
>     implementing the backups.
>
>     But to beat my drum again, it's not something you can fix overnight.
> I'm sure people are monitoring this mailing list and are aware of
> the problem.
>
>
> [My goal in this post is to focus effort and reduce frustration.]
>
> Yes, people reading here are aware, and absolutely none of them expects
> this (i.e. multimedia technical debt and missing features) to be fixed
> overnight.
>
> What's lacking, as you pointed out, is ownership of the problem. To own
> the problem, one must have *both* technical understanding of the issues
> *and* a mandate to devote resources to addressing them.
>
> It is this *combination* that we don't have at the moment. Lots of
> technical people are aware, and some of them quite willing to work
> toward addressing the issues, but they are not empowered to set
> priorities and commit resources for an effort of that scale, and the
> problems, for the most part, don't easily lend themselves to volunteer
> development.
>
> It seems to me there are *very few* people who could change the status quo,
> not much more than a handful: the Foundation's executive leadership (in
> its annual planning work, coming up this first quarter of 2022), and the
> Board of Trustees.
>
> Therefore, the greatest contribution the rest of us could make toward
> seeing this work get resourced is to help make the case to the
> executives (including the new CEO, just entering the role) with clear
> and compelling illustrations of the *mission impact* of such investment.
> In parallel, interested engineers and middle managers could help by
> offering rough effort estimates for some components, a roadmap, an
> overview of alternatives considered and a rationale for a recommended
> approach, etc.
>
> But this would all be preparatory and supporting work toward *a
> resourcing decision* being made. So long as such a decision isn't made,
> no significant work on this can happen.
>
> Finally, while it is easy to agree that *this* is necessary and useful
> on its own, to actually resource it in the coming annual plan it would be
> necessary to argue that it is *more* useful and necessary than some
> *other* work, itself also necessary and useful.
>
> Another thing that may help is being explicit about just how important
> this is, even literally saying things like "this would have far more
> impact on our X goal than initiative A, B, or C", naming actual
> resourced or potentially resourced things. It is sometimes difficult for
> managers who aren't practicing Wikimedia volunteers to assess just how
> necessary different necessary things are, from different community
> perspectives.
>
> And of course, one such opinion, or a handful, would not be a solid base
> for resourcing decisions, so perhaps a large-scale ranking survey of
> some sort would be helpful, as SJ implicitly suggested in the original post.
>
> Cheers,
>
> A.
> (In my volunteer capacity)
>
This season of giving, consider giving the truest gift of all: file format
compatibility.
Considering how much time we spend with document and data file formats, I'd
like to see those supported at least well enough to have their own place in
the search filters.
I've compiled an umbrella ticket <https://phabricator.wikimedia.org/T297514>
for related open issues; and those that have been discussed but perhaps
never filed as tickets before. Please weigh in + add those that I've
missed.
*Pro forma*t,
SJ
--
w:user:sj +1 617 529 4266
Dear ones,
Where might I get or mirror a dump of Commons media files?
> It seems worth mentioning on the front page of https://dumps.wikimedia.org/
> It looks like the compressed XML of the ~50M description pages is ~25GB.
> It looks like wiki-team set up a dump script that posted monthly dumps to
> the Internet Archive; in 2013 it stopped including the month+year in the
> title; in 2016 it stopped altogether.
https://archive.org/details/wikimediacommons
Sorry I missed this :) Excited about CQS and wanted to play with it today,
but it seems to be down?
Is it being updated // are there mirrors // what's the current plan for
federation of endpoints like this?
SJ
On Tue, Nov 30, 2021 at 11:13 AM Trey Jones <tjones(a)wikimedia.org> wrote:
> Hi Everyone,
>
>
> The Search Platform Team
> <https://www.mediawiki.org/wiki/Wikimedia_Search_Platform> usually holds
> office hours the first Wednesday of each month. Come talk to us about
> anything related to Wikimedia search, Wikidata Query Service, Wikimedia
> Commons Query Service, etc.!
>
>
> Feel free to add your items to the Etherpad Agenda for the next meeting.
>
>
> Details for our next meeting:
>
> Date: Wednesday, December 1st, 2021
>
> Time: 16:00-17:00 GMT / 08:00-09:00 PST / 11:00-12:00 EST / 17:00-18:00 CET & WAT
>
> Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
>
> Google Meet link: https://meet.google.com/vgj-bbeb-uyi
>
> Join by phone: https://tel.meet/vgj-bbeb-uyi?pin=8118110806927
>
Hi, Samuel,
Sorry for not communicating earlier; all my work happened in the open[0], but
I didn't want to make any public announcements until there was a 100%
completed run! :-)
> I have the feeling the bulk of Commons media (~300 TB in all) is not mirrored anywhere right now
> I saw something related mentioned on phab? within the last year, but can't find it now.
So this was/is the state of multimedia storage at the moment:
* There are 3 copies of each file on the live OpenStack Swift cluster in
WMF's eqiad datacenter in Virginia
* There is an almost-real time replication of eqiad's multimedia cluster
into the codfw datacenter in Texas, with its own 3 separate copies
* Images can be, and regularly are, served from both datacenters, protecting
against local disasters like floods or earthquakes
That has been the case for a few years already; the following is new! :-)
I (with the assistance of many other WMF engineers) started working on an
offline/offsite backup solution for all multimedia files at the end of
2020, one that would protect against application bugs, operator mistakes or
potential ill-intentioned unauthorized users. Due to the nature and size of
multimedia files (a large append-only store), the system required a
completely different backup workflow from that of our regular (wikitext or
otherwise) backups. We were also hit with long hardware delays due to
supplier shortages for a while.
*I advocated at first to solve multimedia backups and dumps at the same
time, but this was not possible*, because of how wiki file permissions are
currently handled in the MediaWiki software: it is not just a question of
"creating bundles of images". MediaWiki image storage lacks basic
features like a unique identifier for each uploaded file, and still uses
SHA-1 hashing, which is known to be vulnerable to collisions. This doesn't
impact full backups, which just "copy everything" privately (although
I had to reimplement some of that functionality myself), but it makes it
hard to identify individual files in order to update the status of already
publicly available files.
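To make the two missing pieces concrete (a unique identifier per upload, and a modern content hash), here is a minimal Python sketch under stated assumptions. It is not MediaWiki's schema or code: `UploadRecord`, `record_upload` and the field names are hypothetical, and SHA-256 is used only because it is the modern hashing option mentioned in this thread. The file is hashed in chunks so that multi-gigabyte media files never need to fit in memory.

```python
import hashlib
import uuid
from dataclasses import dataclass, field

# Read 1 MiB at a time so large media files are never loaded whole into memory.
CHUNK_SIZE = 1 << 20


def sha256_of_file(path: str) -> str:
    """Return the hex SHA-256 digest of a file, streaming it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass(frozen=True)
class UploadRecord:
    """Hypothetical metadata row: one per upload event, not one per file title."""
    title: str
    sha256: str
    # A random identifier that stays unique even if the same bytes are uploaded twice.
    upload_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def record_upload(title: str, path: str) -> UploadRecord:
    """Hypothetical helper: hash the uploaded file and mint a new upload identifier."""
    return UploadRecord(title=title, sha256=sha256_of_file(path))
```

Two uploads of identical bytes get the same `sha256` but distinct `upload_id`s, so a backup, restore or dump job could refer to one specific upload instead of relying on a title or a hash alone.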
Because of that, we (the Data Persistence team) decided to solve the backups
first; it will then be possible to use the backup metadata to generate
dumps in the future (reusing much of the work already done). My team is not
in charge of xmldumps, so maybe a colleague will be able to update you more
accurately on the priority of that, but I really think the work I've done
will speed up dump production by a lot: e.g. dumps could (maybe?) be
generated more easily from the backup data.
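As an illustration of how backup metadata could feed incremental dumps, here is a hypothetical Python sketch that compares two backup manifests keyed by a per-upload identifier. The manifest shape (`FileEntry` with a `sha256` and a `public` flag) and the publish/withdraw rule are assumptions made for illustration only, not a description of how the WMF backup or dump tooling actually works.

```python
from typing import Dict, NamedTuple, Set


class FileEntry(NamedTuple):
    """Hypothetical per-upload metadata that a backup run might record."""
    sha256: str
    public: bool  # assumed to become False once a file is deleted or suppressed


# Hypothetical manifest: stable per-upload identifier -> metadata.
Manifest = Dict[str, FileEntry]


class ManifestDiff(NamedTuple):
    added: Set[str]
    removed: Set[str]
    changed: Set[str]


def diff_manifests(previous: Manifest, current: Manifest) -> ManifestDiff:
    """Compare two snapshots by identifier; dict key views support set operations."""
    prev_ids, curr_ids = previous.keys(), current.keys()
    return ManifestDiff(
        added=set(curr_ids - prev_ids),
        removed=set(prev_ids - curr_ids),
        changed={uid for uid in curr_ids & prev_ids if current[uid] != previous[uid]},
    )


def public_dump_actions(diff: ManifestDiff, current: Manifest) -> Dict[str, Set[str]]:
    """Decide what an incremental public dump would publish or withdraw."""
    publish = {uid for uid in diff.added | diff.changed if current[uid].public}
    withdraw = diff.removed | {uid for uid in diff.changed if not current[uid].public}
    return {"publish": publish, "withdraw": withdraw}
```

Keying on an identifier rather than on the title or the hash is the design choice that matters here: it lets a change in visibility (for example a file being deleted) be detected for one exact upload, which is the kind of per-file status update described as hard with the current storage model.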
So I can announce that *the first full (non-public) offline backup of
Commons in the eqiad datacenter finished in September* (it took around 20 days
to run), and *a second offline and remote copy is happening right now in the
codfw datacenter* and will likely finish before the end of this year. You
can see the hosts containing the backup here: [4] [5] [6] [7] These hosts
are not connected to the wikis/Internet, so if a vulnerability caused data
loss on Swift, we will be able to recover from the backups.
For privacy and latency (fast recovery) reasons, those copies are
hosted within WMF infrastructure (but geographically separate from each
other), but *an extra offsite copy, not hosted on WMF hardware, is also
planned for the near future*. More work will be needed on fast recovery
tooling, as well as on incremental/streaming backups. More information
about this will be documented on a wiki soon.
Those copies cannot be shared "as is", as they have been optimized for fast
recovery to production, not for separation of public and private files
(like the rest of our backups).
So if the question is what the main blocker for faster image dumps is, I
would say it is the lack of a modern metadata storage model for images [1],
one where there is a unique identifier for each uploaded image or a modern
hashing method (SHA-256) is used. There are also some additional legal and
technical considerations in making regular public image datasets; those are
not impossible to solve but require some work. I am also personally
heavily delayed by the lack of a dedicated Multimedia Team (I am a system
administrator/Site Reliability Engineer in charge of data recovery, not a
MediaWiki developer) that can fix all the bugs [2] and corruption [3] I
find along the way. It is my understanding that, at the moment, there is
no MediaWiki developer in charge of file management code maintenance.
[0] https://phabricator.wikimedia.org/T262668
[1] https://phabricator.wikimedia.org/T28741
[2] https://phabricator.wikimedia.org/T290462#7405740
[3] https://phabricator.wikimedia.org/T289996
[4] https://grafana.wikimedia.org/d/000000377/host-overview?viewPanel=28&orgId=…
[5] https://grafana.wikimedia.org/d/000000377/host-overview?viewPanel=28&orgId=…
[6] https://grafana.wikimedia.org/d/000000377/host-overview?viewPanel=28&orgId=…
[7] https://grafana.wikimedia.org/d/000000377/host-overview?viewPanel=28&orgId=…
--
Jaime Crespo
<http://wikimedia.org>
Hi everyone,
I'm here to announce an important project that the GLAM and Culture team at
the Wikimedia Foundation is taking part in during the next few weeks.
Because Outreach has limited readership and visibility within the
movement, our community newsletters don’t always receive the attention they
deserve. To address this, we’re working with our colleagues in the Movement
Communications team to *migrate the This Month in GLAM newsletter
<https://outreach.wikimedia.org/wiki/GLAM/Newsletter> from Outreach to
Meta-Wiki*.
Both teams are working on this task in the next few weeks in order to:
1. Increase visibility and participation in the GLAM newsletter.
2. Ensure the GLAM community has a place (Meta-Wiki) where they feel
seen, engaged, and supported by the Wikimedia community, partners, and
Foundation.
3. Increase the amount of multilingual (or translatable) content to
engage contributors from other languages and more regions.
This activity already has the support of the newsletter’s main editors. It
was also announced in the October report in the newsletter
<https://outreach.wikimedia.org/wiki/GLAM/Newsletter/October_2021/Contents/W…>.
The migration of the report pages, talk pages, categories, and templates
will happen from *November 19th to 30th*, so that the migration is complete
before next month's reports arrive. Any other
modifications or corrections will be made before *December 15th*.
If you have any questions or ideas about the migration, please contact the
GLAM & Culture team at glam(a)wikimedia.org and the community editors at
thismonthinglam(a)gmail.com.
Best,
Giovanna Fontenelle (she/her)
Program Officer; GLAM and Culture
Wikimedia Foundation <https://wikimediafoundation.org/>