I'd be careful about using numbers in triage right now. The numbers are a little misleading, as the error logging is only enabled on smaller wikis. Also, if an error results in data loss but only impacts a small number of people, I would say that's worse than a benign error that occurs for many.
We rolled out to Spanish, German and Japanese Wikipedia yesterday, so these numbers will start becoming more useful, but English Wikipedia will severely skew them when we finally enable it.
On Tue, Sep 22, 2020, 9:59 AM Ed Sanders esanders@wikimedia.org wrote:
Speaking specifically about the new JavaScript error logging, and specifically to Alex's point about triaging these tasks, it would be very helpful if the reports included some indication of how often the error is occurring.
For example, VisualEditor is loaded several hundred thousand times per day. If an error has occurred 4 times in the last 30 days (based on a recent example) then it is probably very low priority.
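One rough way to put the example above on a comparable scale is to normalize the error count by the number of loads. The sketch below is illustrative only: the function name and the figure of 300,000 loads per day (standing in for "several hundred thousand") are assumptions, not values taken from the error logging.

```python
def error_rate_per_million(error_count: int, days: int, loads_per_day: int) -> float:
    """Return errors per million loads over a reporting window.

    Hypothetical helper for illustration; not part of any existing tool.
    """
    total_loads = days * loads_per_day
    if total_loads <= 0:
        raise ValueError("total loads must be positive")
    return error_count / total_loads * 1_000_000


# 4 errors in 30 days; 300,000 loads/day is an assumed stand-in value.
rate = error_rate_per_million(error_count=4, days=30, loads_per_day=300_000)
print(f"{rate:.2f} errors per million loads")  # prints "0.44 errors per million loads"
```

A rate like this says nothing about severity, so it could only ever be one input to triage.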
On Thu, 17 Sep 2020 at 16:40, C. Scott Ananian cananian@wikimedia.org wrote:
ACN -- for what it's worth, I've been working for the foundation for a while now, and I can report from the inside that the trend is definitely in a positive direction. There is a lot more internal focus on addressing code debt and giving maintenance a fair spot at the table. (In fact, my entire team now sits inside 'maintenance', apparently; we used to be 'platform evolution'.) This email thread is one visible aspect of that focus on code quality, not just features.
That said, the one aspect which hasn't improved much in my time at the foundation has been the tendency of teams to work in silos. This thread also seems to be a symptom of that: a bunch of production issues are being dropped on the floor ('not resolved in over a month') because they are falling between the silos and nobody knows who is best able to fix them. There are knowledge/expertise gaps among the silos as well: someone qualified to fix a DB issue might be at sea trying to track down a front-end bug, and vice versa. A number of generalists in the org could technically tackle a bug no matter where it lies, but it will take them much longer to grok an unfamiliar codebase than it would for someone more familiar with that silo. So bug triage is an increasingly technical task in its own right.
This thread, as I read it sitting inside the org, isn't so much asking for more attention to be paid to maintenance -- we're winning that battle, internally -- as it is a plea for those folks on the edges of their silos to keep an eye out for these things which are currently falling between them and help with the triage.
--scott, speaking only for myself and my view here
On Wed, Sep 16, 2020 at 11:25 PM AntiCompositeNumber <anticompositenumber@gmail.com> wrote:
There is an impression among many community members, myself included, that Foundation development generally prioritizes new features over fixing existing problems. Foundation teams will sprint for a few months to put together a minimum viable product, release it, then move on to the new hotness, leaving user requests, bugfixes, and the like behind. It often seems that the only way to get a bug fixed is to get a volunteer developer to look at it. This is likely unintentional, but it happens nonetheless.
Putting a higher priority within the Foundation on cleaning up old toys before taking out new ones is necessary for the long-term stability of the projects.
ACN
On Wed, Sep 16, 2020 at 9:05 PM Dan Andreescu <dandreescu@wikimedia.org> wrote:
For example, of the 30-odd backend errors reported in June, 14 were still open a month later in July [1], and 12 were still open, three months later, in September. The majority of these haven't even yet been triaged, assigned or otherwise acknowledged. And meanwhile we've got more (non-JavaScript) stuff from July, August and September adding pressure. We have to do better.
-- Timo
This feels like it needs some higher level coordination. Like perhaps managers getting together and deciding production issues are a priority and diverting resources dynamically to address them. Building an awesome new feature will have a lot less impact if the users are hurting from growing disrepair. It seems to me like if individual contributors and maintainers could have solved this problem, they would have by now. I'm a little worried that the only viable solution right now seems like heroes stepping up to fix these bugs.
Concretely, I think expanding something like the Core Platform Team's clinic duty might work. Does anyone have a very rough idea of the time it would take to tackle 293 (wow we went up by a dozen since this thread started) tasks?
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
-- (http://cscott.net)