Hi everyone,
Aaron has been working on bug 37225 for a while: https://bugzilla.wikimedia.org/show_bug.cgi?id=37225
...but has hit a bit of a brick wall with the bug. I encouraged him to send mail to the list about it, but he's skeptical that more than a very small set of people can be helpful on this, which is part of why I suspect he hasn't sent mail. Please prove him wrong :)
The bug cropped up a few weeks ago, reported May 30. It's hard to say if this was a bug that was caused by a core change deployed around that time, an extension deployed around then, or if it was a latent bug that people just got fed up enough to finally report around that time. At any rate, it's been a tough one to diagnose.
I think one area where a broad search could be helpful is narrowing down when this first started happening, and which deployment must have introduced the issue. Since we're on a biweekly cycle, it shouldn't be a huge amount of code that contributes to this bug. For example, someone with enough git-fu who has read the bug carefully enough might be able to point to a likely culprit revision.
Thanks,
Rob
p.s. actually, knowing Aaron and having read the last comment on the bug, I suspect the reason he didn't send a mail is that he seems to be close to figuring this out. Still, if someone narrows this down, that'd really kick butt.
I am going to hijack this thread -- hence the change in subject -- since this is related to the other thread I started about git commit history.
If this is a newly introduced bug (which seems to be the case from what I can gather from this email), then git bisect would have been the ideal way to nail the culprit commit. git bisect basically lets you binary search your way to a bad commit starting from a known good commit and a known bad commit. It is an absolutely awesome tool that git gives you. I have used this more than once on other projects to nail a bug.
But, git bisect will work best with a linear commit history, and it is not going to really work well with the current totally non-linear commit history on core. So, one very good reason to examine how to minimize merge commits with gerrit.
That said, with some patience, git bisect might still be a useful thing to try.
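For readers who have not used it, the binary search that git bisect performs can be sketched in a few lines of Python. This is only an illustration: it assumes a linear history with exactly one transition from good to bad, and the commit names and the `is_bad` callback are hypothetical stand-ins for checking out a revision and testing it. It is not how git itself is implemented.

```python
from typing import Callable, List, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in a linear history.

    Assumes commits[0] is known good, commits[-1] is known bad, and
    every commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the culprit is at mid or earlier
        else:
            good = mid  # the culprit is after mid
    return commits[bad]


# Hypothetical history c0..c7 in which the bug is introduced by "c5".
history = [f"c{i}" for i in range(8)]
tested: List[str] = []


def is_bad(commit: str) -> bool:
    tested.append(commit)  # record every revision we had to test
    return int(commit[1:]) >= 5


culprit = first_bad_commit(history, is_bad)
print(culprit, "found after testing", len(tested), "commits")
```

The number of revisions to test grows with the logarithm of the range, which is why a two-week deployment window is small enough to search this way, provided each revision can actually be tested.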
Subbu.
Subramanya Sastry wrote:
But, git bisect will work best with a linear commit history, and it is not going to really work well with the current totally non-linear commit history on core. So, one very good reason to examine how to minimize merge commits with gerrit.
I would have to carefully check, but I am almost sure git bisect works on all ancestors and will happily use the merged commits Gerrit generates for us.
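As a rough illustration of why merges need not be a blocker in principle: on a non-linear history the suspects are all commits reachable from the bad commit but not from the good one, and each test can still discard a large share of them. The Python sketch below uses a hypothetical toy graph and picks the suspect that splits that set most evenly, which is roughly the idea described in git's own bisect documentation. It is a simplification under those assumptions, not git's actual code.

```python
from typing import Dict, List, Set

# Hypothetical toy history: commit -> list of parents. "m" is a merge commit.
PARENTS: Dict[str, List[str]] = {
    "a": [],
    "b": ["a"],
    "c": ["b"],
    "d": ["b"],
    "e": ["d"],
    "m": ["c", "e"],
    "f": ["m"],
}


def ancestors(commit: str) -> Set[str]:
    """Return the commit itself plus everything reachable through its parents."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def next_to_test(good: str, bad: str) -> str:
    """Pick the suspect whose test result rules out the most commits in the worst case."""
    suspects = ancestors(bad) - ancestors(good)

    def score(commit: str) -> int:
        # If the commit tests bad, only its own suspect ancestors remain;
        # if it tests good, those same commits are cleared instead.
        below = len(ancestors(commit) & suspects)
        return min(below, len(suspects) - below)

    return max(sorted(suspects), key=score)


print(next_to_test(good="b", bad="f"))
```

In this toy graph the suspects are c, d, e, m and f, and the sketch chooses e, which sits on one side of the merge: a good or bad result there removes two or three of the five suspects.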
The problem is that we can't reproduce it. Otherwise we would already have found the problematic commit (yes, git bisect would be able to do that).
It probably only happens with lagged slaves and many people editing at the same time.
I have already studied the commits in that range, but none of them seems likely to have produced the bug. :(
As we can detect it easily on big wikis (such as enwiki), what we could do is (a) roll back enwiki to an earlier version, from before the bug appeared, (b) verify that the bug is gone (i.e. it's a MediaWiki core/extension bug), (c) continue bisecting by advancing to the next wmf release. Yep, bisecting in the live cluster. That's far from ideal, but since we can't find it any other way, that _should_ reveal it.
Thanks for the additional explanation, Platonides.
git bisect can handle merge commits, but I have run into problems with it, though that may also have been my weak git-fu. FWIW, I was able to linearize history using git rebase on a local branch -- I had to resolve a couple of very minor merge conflicts, but the rebased repo is identical to the current repo -- so that is an option to consider if/when someone runs into git bisect issues.
Anyway, apologies for the temporary hijacking of the thread. I am done with the non-linear history bit for now.
Back to the more important issue of the bug itself.
Subbu.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l