On refactorings and huge changes

List overview All Threads
Download

newer

older

mysql_* functions and MediaWiki

Please don't commit broken code

Chad

27 Jul 2011 27 Jul '11

3:27 a.m.

All,

While spending the past few days/weeks in CodeReview, it has become abundantly clear to me that we absolutely must get away from this idea of doing huge refactorings in our working copies and landing them in trunk without any warning. The examples I'm going to use here are the RequestContext, Action and Blocking refactors.

We've gotten into a very bad habit recently of doing a whole lot of work in secret in our respective working copies, and then pushing to trunk without first talking to the community to discuss our plans. This is a bad idea for a bunch of reasons.

Firstly, it skips the community feedback process until after your code is already in trunk. By skipping this process--whether it's a formal RfC, or just chatting with your peers on IRC--you miss out on the chance to get valuable feedback on your architectural decisions before they land in trunk. Once code has landed in trunk it is almost always easier to follow up and continue to "fix" the code that should've been fully spec'd out before checkin.

Also, the community *must* have the chance to call you crazy and say "don't check that in, please." Going back to my examples of Actions, had the community been consulted first I would've raised objections about the decisions made with Actions (I think they should be moved to special pages and the old action urls made to redirect for back-compat...rather than solidifying the old and crappy action interface with a new coat of paint). Looking at RequestContexts, had we talked about this in an RfC first...we probably could've skipped the whole member variable vs. accessor debate and the several months of minor cleanups that's entailed (__get() magic is evil, IMHO)

Secondly, this increases the load on reviewers, myself included. When you land a huge commit in trunk (like the Block rewrite), it takes *forever* to review the original commit + half a dozen or more followups. This drains reviewer time and leads to longer release cycles. I think I speak for everyone when I say this is bad. Small incremental changes are infinitely easier to review than large batch changes.

If you need to make huge changes: do them in a branch. It's what I did with the installer and maintenance rewrites, what Roan and Trevor did with ResourceLoader and what Brian Wolff did with his metadata improvements. Of course after landing your branches in trunk there will inevitably be some cleanup required, but it keeps trunk more stable until the branch merge and makes it easier to back out if we decide to scrap the feature/rewrite.

I know SVN branches suck. But the alternative is having a constantly unstable trunk due to alpha code that was committed haphazardly. Nobody wins in that scenario.

So please...I beg everyone. Discuss your changes first. It doesn't have to be formal (although formal spec'ing is always useful too!), but even having a second set of eyes to glance over your ideas before committing never hurts anyone.

-Chad

Show replies by date

Aaron Schulz

27 Jul 27 Jul

3:47 a.m.

Also, with some coordination, branches could be merged at a time when: * Reviewers will looking at it and testing it *right* after the merge (in addition to any branch review) * The author is around to make fixes as it gets final review

People should try to be available after any large changes (from a branch or from their local code). However, it may be the case that no one has the time to specifically focus on reviewing a certain change set right after commit. Maybe, for example, changes will only get seriously looked at a month after the commit. If the author was available in the weeks after the commit, but not when it gets serious review, then we still have a problem. This is also were some advance notification and coordination could help (especially looking at the examples Chad gave).

^demon wrote:

...

All,

While spending the past few days/weeks in CodeReview, it has become abundantly clear to me that we absolutely must get away from this idea of doing huge refactorings in our working copies and landing them in trunk without any warning. The examples I'm going to use here are the RequestContext, Action and Blocking refactors.

We've gotten into a very bad habit recently of doing a whole lot of work in secret in our respective working copies, and then pushing to trunk without first talking to the community to discuss our plans. This is a bad idea for a bunch of reasons.

Firstly, it skips the community feedback process until after your code is already in trunk. By skipping this process--whether it's a formal RfC, or just chatting with your peers on IRC--you miss out on the chance to get valuable feedback on your architectural decisions before they land in trunk. Once code has landed in trunk it is almost always easier to follow up and continue to "fix" the code that should've been fully spec'd out before checkin.

Also, the community *must* have the chance to call you crazy and say "don't check that in, please." Going back to my examples of Actions, had the community been consulted first I would've raised objections about the decisions made with Actions (I think they should be moved to special pages and the old action urls made to redirect for back-compat...rather than solidifying the old and crappy action interface with a new coat of paint). Looking at RequestContexts, had we talked about this in an RfC first...we probably could've skipped the whole member variable vs. accessor debate and the several months of minor cleanups that's entailed (__get() magic is evil, IMHO)

Secondly, this increases the load on reviewers, myself included. When you land a huge commit in trunk (like the Block rewrite), it takes *forever* to review the original commit + half a dozen or more followups. This drains reviewer time and leads to longer release cycles. I think I speak for everyone when I say this is bad. Small incremental changes are infinitely easier to review than large batch changes.

If you need to make huge changes: do them in a branch. It's what I did with the installer and maintenance rewrites, what Roan and Trevor did with ResourceLoader and what Brian Wolff did with his metadata improvements. Of course after landing your branches in trunk there will inevitably be some cleanup required, but it keeps trunk more stable until the branch merge and makes it easier to back out if we decide to scrap the feature/rewrite.

I know SVN branches suck. But the alternative is having a constantly unstable trunk due to alpha code that was committed haphazardly. Nobody wins in that scenario.

So please...I beg everyone. Discuss your changes first. It doesn't have to be formal (although formal spec'ing is always useful too!), but even having a second set of eyes to glance over your ideas before committing never hurts anyone.

-Chad

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

-- View this message in context: http://old.nabble.com/On-refactorings-and-huge-changes-tp32143260p32143408.h... Sent from the Wikipedia Developers mailing list archive at Nabble.com.

Daniel Friesen

4:08 a.m.

On 11-07-26 01:27 PM, Chad wrote:

...

All,

While spending the past few days/weeks in CodeReview, it has become abundantly clear to me that we absolutely must get away from this idea of doing huge refactorings in our working copies and landing them in trunk without any warning. The examples I'm going to use here are the RequestContext, Action and Blocking refactors.

We've gotten into a very bad habit recently of doing a whole lot of work in secret in our respective working copies, and then pushing to trunk without first talking to the community to discuss our plans. This is a bad idea for a bunch of reasons.

Firstly, it skips the community feedback process until after your code is already in trunk. By skipping this process--whether it's a formal RfC, or just chatting with your peers on IRC--you miss out on the chance to get valuable feedback on your architectural decisions before they land in trunk. Once code has landed in trunk it is almost always easier to follow up and continue to "fix" the code that should've been fully spec'd out before checkin.

Also, the community *must* have the chance to call you crazy and say "don't check that in, please." Going back to my examples of Actions, had the community been consulted first I would've raised objections about the decisions made with Actions (I think they should be moved to special pages and the old action urls made to redirect for back-compat...rather than solidifying the old and crappy action interface with a new coat of paint). Looking at RequestContexts, had we talked about this in an RfC first...we probably could've skipped the whole member variable vs. accessor debate and the several months of minor cleanups that's entailed (__get() magic is evil, IMHO)

Secondly, this increases the load on reviewers, myself included. When you land a huge commit in trunk (like the Block rewrite), it takes *forever* to review the original commit + half a dozen or more followups. This drains reviewer time and leads to longer release cycles. I think I speak for everyone when I say this is bad. Small incremental changes are infinitely easier to review than large batch changes.

If you need to make huge changes: do them in a branch. It's what I did with the installer and maintenance rewrites, what Roan and Trevor did with ResourceLoader and what Brian Wolff did with his metadata improvements. Of course after landing your branches in trunk there will inevitably be some cleanup required, but it keeps trunk more stable until the branch merge and makes it easier to back out if we decide to scrap the feature/rewrite.

I know SVN branches suck. But the alternative is having a constantly unstable trunk due to alpha code that was committed haphazardly. Nobody wins in that scenario.

So please...I beg everyone. Discuss your changes first. It doesn't have to be formal (although formal spec'ing is always useful too!), but even having a second set of eyes to glance over your ideas before committing never hurts anyone.

-Chad

RequestContext WAS an RFC. No one commented on it...

When it was originally committed it was a small change. It just grew as people improved it. It's not a large change like ResourceLoader and Installer that gets everyone's attention. Frankly if RequestContext and features of similar size had been committed to a branch instead of core, they would have bit-rotted there and never made it into core. Not to mention the relative size compared to things like ResourceLoader and Installer makes the negatives of svn branches relatively worse. Dozens of med-small changes getting shoved into branches and atempted to be merged into core?

I'd love to do work in branches, have everyone actually try the branches out and add their improvements there, then get them reviewed into trunk. But that workflow... that's really git, not this svn mess we have now.

Even if we kept this in the environment of svn and tried to use RFCs... people would have to actually discuss those RFCs... looking over the RFCs we have it looks like only 10% of them even get anyone discussing them. The rest just bit rot... So we're supposed to move from having a volunteer willing to code a feature and putting it into trunk, to having them putting up an RFC or putting it into a branch, and letting the code bit rot or RFC just sit there forever, and neither ever get into core?

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

Ryan Kaldari

4:56 a.m.

New subject: redesigning Wikipedia's search interface

http://design.greplin.com/

Maybe we should add a judge to the panel. Jorm?

Ryan Kaldari

Chad

7:01 a.m.

On Tue, Jul 26, 2011 at 2:08 PM, Daniel Friesen lists@nadir-seen-fire.com wrote:

...

RequestContext WAS an RFC. No one commented on it...

An e-mail to the list announcing the RfC would've been nice ;-) Also waiting for more than 1 other person to edit it would've been nice too. If you're not getting responses: pester people for them.

...

When it was originally committed it was a small change. It just grew as people improved it.

It started out small enough, yes...but it quickly grew to permeate a good chunk of core code. And then the architecture was changed halfway through, leading to the crap with __get(). Having more eyes on the RfC would've avoided this, IMHO.

...

Frankly if RequestContext and features of similar size had been committed to a branch instead of core, they would have bit-rotted there and never made it into core. Not to mention the relative size compared to things like ResourceLoader and Installer makes the negatives of svn branches relatively worse. Dozens of med-small changes getting shoved into branches and atempted to be merged into core?

Medium to small changes should still be done in trunk, and at no point in my original e-mail did I say otherwise. What I'm asking for here is for large refactors/rearchitecturings to be done in branches. RequestContext may not have *started out* as a large refactor, but the architecture implications are huge in the long run.

...

I'd love to do work in branches, have everyone actually try the branches out and add their improvements there, then get them reviewed into trunk. But that workflow... that's really git, not this svn mess we have now.

Let's please not turn this into a Git v. SVN debate again. Git is cool, and will perhaps solve some of our problems, but it's not a magic bullet to review/merging. I believe first and foremost the debt of currently unreviewed code should be near-zero before we even consider a migration. It would be very easy to work ourselves into a very similar situation using Git (tons of unreviewed pull requests--waiting in their branches for some love), and anyone who thinks we couldn't get backlogged in Git is kidding themselves.

...

Even if we kept this in the environment of svn and tried to use RFCs... people would have to actually discuss those RFCs... looking over the RFCs we have it looks like only 10% of them even get anyone discussing them. The rest just bit rot... So we're supposed to move from having a volunteer willing to code a feature and putting it into trunk, to having them putting up an RFC or putting it into a branch, and letting the code bit rot or RFC just sit there forever, and neither ever get into core?

I would rather people take the time to do it right than have a constantly unstable trunk that makes backporting damn near impossible. For example, you landing your Skin overhauls a day or two after branching REL1_17 was a poor decision on your part...and made backporting stuff and releasing 1.17 a huge pain.

This kind of comes back to the idea of code freezes (boo, I know...). I don't want to ever completely freeze trunk; that discourages contributions as we've discussed before. I'm just asking for a little bit of common sense from all around so we can keep trunk stable and push for more regular deploys and releases.

-Chad

Happy-melon

29 Jul 29 Jul

10:30 a.m.

I have mixed reactions to what's been said here, although some points are certainly good. I don't think the "size" of a change is really the important factor for determining both how easy it is to review, and how easy it would be to branch and merge; a much better metric is "complexity", how tightly it integrates with the rest of the codebase, and how many 'external' codepoints must be updated to complete the change.

ResourceLoader, the Installer, the Metadata, are all examples of changes which may be quite large, but which are quite isolated within the codebase and do not really sprawl across it in a particularly complex way. the Blocking and Skin rewrites and (particularly) RequestContext are examples of the opposite, changes which are *not* actually particularly large in terms of number-of-lines (the blocking rewrite rewrote four files, RequestContext two, compared to about forty for ResourceLoader and 25 for Installer). Having code together in a few files totally-rewritten or added afresh is both easier to review, *and* easier to branch and merge.

A branch merge of a complicated change (like the Login rewrite I did in summer 2009, which bounced three times and is now bitrotted into oblivion) looks incredibly scary and is essentially impossible to review, you have to take it on faith assuming that the review *of the code while it was in the branch* was good enough. Yes, it will contain iterated fixes for simple errors, but the sequence being compressed into one revision is a double-edged sword: rewrites are inevitably done progressively, and the easiest path for the reviewer to take is the same one as the coder. When everything is compressed into one branch-merge-revision, there is no context from which to reconstruct that path. So, and I think most people are saying the same thing, the review will still have to happen in the branch, on the individual revisions, with all the false-starts and follow-ups, and merged in one (probably very messy) branch merge.

But making complicated changes in a branch doesn't make them any less complicated, or any easier to review in the branch. Complicated changes are difficult *because* of their extensive interaction with other code elements, not because of a particularly grand scope. There is no such thing as a "small incremental change" when changing one interface in the actual subject of the rewrite requires changing fifty callers, the other interfaces on the target also called by the fifty callers, and the 200 other places that call *them*.

The list of actual *changes* to the blocking code is fairly minimal: internalise variables, rewrite the way blocks are constructed from a target or targets, separate and modernise the frontend interfaces for blocking, unblocking and listing blocks. It is complicated, difficult to review and with a huge number of follow-up commits, because of the large number of *other* places that blocks are referenced and manipulated. Those other places still exist in a branch, are still just as difficult to *find* in a branch, and still just as time-consuming to fix and review in a branch.

Thus I don't agree that doing rewrites like the Blocking stuff in a branch would have made it any easier, or any less time-consuming, to review. On the other hand, you make an excellent point about *discussion* of changes before implementation; that needs to happen, and it needs to be thorough. I again don't think that a branch is particularly useful for that purpose because to 'discuss' something in code is to go through exactly the same process of writing, rewriting and discarding as if you just went ahead and did it. An RfC, on the other hand, allows (and requires) a spec to be developed in natural language, with great saving of time and effort.

*If* it gets the participation it deserves. The people who would be spending their time calling "crazy" on checked-in code *must* take the attitude that participating in RfCs *is* the alternative and is time well spent, that they are not making problems magically go away, only moving them to another venue where they can be nipped in the bud. There are only three options: those two, and dismissing the proposal altogether *regardless of its merits*.

We currently have ten RfCs listed; three are over a year old, five are between three and six months, and two are younger than a month. Five of them, including *all* of the old ones, have no substantive contributions by *anyone* apart from the author. Two more have exactly one other contributor, and two (Math and Configuration) are swarming with commenters. This is not a healthy system. It could *become* a healthy system, but only if *everyone* is prepared to make it an integral part of their development work, not just the people who are being diligent enough to be, or being pressured into, writing proposals.

I don't think that anyone really thinks RFC is the magic bullet, but it's even less than what you think it is: it's *another* broken system that's in need of a lot of TLC and structural changes to get working again. We have plenty enough of those on our plate already. You could say quite legitimately that it's part of the bigger problem and the fix will fit naturally into the bigger solution. But now you're getting dangerously close to the non sequitur than when nothing is broken, everything will be fixed.

--HM

PS: sorry for TLDR-ness... :-(

"Chad" innocentkiller@gmail.com wrote in message news:CADn73rP=5Uiaa878mHaZjdjdciTmhYvG4+nR_0EnsdYxjsZ3nA@mail.gmail.com...

...

On Tue, Jul 26, 2011 at 2:08 PM, Daniel Friesen lists@nadir-seen-fire.com wrote:

...
RequestContext WAS an RFC. No one commented on it...

An e-mail to the list announcing the RfC would've been nice ;-) Also waiting for more than 1 other person to edit it would've been nice too. If you're not getting responses: pester people for them.

...
When it was originally committed it was a small change. It just grew as people improved it.

It started out small enough, yes...but it quickly grew to permeate a good chunk of core code. And then the architecture was changed halfway through, leading to the crap with __get(). Having more eyes on the RfC would've avoided this, IMHO.

...
Frankly if RequestContext and features of similar size had been committed to a branch instead of core, they would have bit-rotted there and never made it into core. Not to mention the relative size compared to things like ResourceLoader and Installer makes the negatives of svn branches relatively worse. Dozens of med-small changes getting shoved into branches and atempted to be merged into core?

Medium to small changes should still be done in trunk, and at no point in my original e-mail did I say otherwise. What I'm asking for here is for large refactors/rearchitecturings to be done in branches. RequestContext may not have *started out* as a large refactor, but the architecture implications are huge in the long run.

...
I'd love to do work in branches, have everyone actually try the branches out and add their improvements there, then get them reviewed into trunk. But that workflow... that's really git, not this svn mess we have now.

Let's please not turn this into a Git v. SVN debate again. Git is cool, and will perhaps solve some of our problems, but it's not a magic bullet to review/merging. I believe first and foremost the debt of currently unreviewed code should be near-zero before we even consider a migration. It would be very easy to work ourselves into a very similar situation using Git (tons of unreviewed pull requests--waiting in their branches for some love), and anyone who thinks we couldn't get backlogged in Git is kidding themselves.

...
Even if we kept this in the environment of svn and tried to use RFCs... people would have to actually discuss those RFCs... looking over the RFCs we have it looks like only 10% of them even get anyone discussing them. The rest just bit rot... So we're supposed to move from having a volunteer willing to code a feature and putting it into trunk, to having them putting up an RFC or putting it into a branch, and letting the code bit rot or RFC just sit there forever, and neither ever get into core?

I would rather people take the time to do it right than have a constantly unstable trunk that makes backporting damn near impossible. For example, you landing your Skin overhauls a day or two after branching REL1_17 was a poor decision on your part...and made backporting stuff and releasing 1.17 a huge pain.

This kind of comes back to the idea of code freezes (boo, I know...). I don't want to ever completely freeze trunk; that discourages contributions as we've discussed before. I'm just asking for a little bit of common sense from all around so we can keep trunk stable and push for more regular deploys and releases.

-Chad

Thomas Gries

1:44 p.m.

Am 29.07.2011 06:30, schrieb Happy-melon:

...

We currently have ten RfCs listed; three are over a year old, five are between three and six months, and two are younger than a month.... This is not a healthy system. It could *become* a healthy system, but only if *everyone* is prepared to make it an integral part of their development work, not just the people who are being diligent enough to be, or being pressured into, writing proposals.

I suggest to add a new bugzilla category "Request for Comments". Filed as https://bugzilla.wikimedia.org/show_bug.cgi?id=30111

Platonides

30 Jul 30 Jul

10:43 p.m.

...

PS: sorry for TLDR-ness... :-(

It was worth reading! You make very good points.

Platonides

27 Jul 27 Jul

6:42 a.m.

+1 I completely agree.

Dantman is right in that such has to be accompanied by promptly reviews.

First problem is, people has to read the RFC. Searching the archive for 'RFC' I don't find it announced to the mailing list. A RFC could be just text, have some sample code, or a full implementation in a branch. The time from rfc status to having code for it could be [0, +∞). If the author considers it ready to merge and there has been no opposition in a week, then merge it. That would solve the rotting problem. I don't think it would be hard to keep a branch up to date for a week (at least without other merges).

If the code had problems, and took 6 days to fix, or people praised it from day 1, you are getting feedback, so there are no hard limits.

And please inform us. If you create a new RFC, email the list stating so. And again a few days before you are going to merge it (specially if it's an old branch nobody has cared about for months).

Chad

6:49 a.m.

On Tue, Jul 26, 2011 at 4:42 PM, Platonides Platonides@gmail.com wrote:

...

If the author considers it ready to merge and there has been no opposition in a week, then merge it. That would solve the rotting problem. I don't think it would be hard to keep a branch up to date for a week (at least without other merges).

I disagree here. Silence should not be taken for assent. Perhaps we should formalize the "approval" process for RfCs. I agree with the rest of your message in that we absolutely need to get more eyes on RfCs...and I'm open to suggestions on how to do so.

-Chad

Platonides

3:21 p.m.

Chad wrote:

...

I disagree here. Silence should not be taken for assent. Perhaps we should formalize the "approval" process for RfCs. I agree with the rest of your message in that we absolutely need to get more eyes on RfCs...and I'm open to suggestions on how to do so.

-Chad

They are core committers. I'm not talking about bugzilla patches which might need babysitting. The timeout is important because it ensures that it will be reviewed, falling back to the current model. Otherwise, projects [not sponsored by WMF] will end up delayed until After Wikimania/After fundraising/After next scap... So people would probably continue committing to trunk "because it's the only way to know it will eventually be reviewed". Plus, I'm sure there will be mucho more eyes on it after the "If nobody opposes, I'm merging it tomorrow" than the week before.

Besides, you can give the author feedback: I'm busy right now, but I'll review it next week, requesting for more time.

Jeroen De Dauw

8:56 p.m.

Hey,

...

Otherwise, projects [not sponsored by WMF] will end up delayed until After

Wikimania/After fundraising/After next scap...

...

So people would probably continue committing to trunk "because it's the

only way to know it will eventually be reviewed".

After being a non-core, non-WMF community member for over 2 years, I'd say this is definitely the case. It's the logical outcome of the WMF giving priority to it's mission over MediaWiki as a platform, which is perfectly reasonable, so this is not criticism on anyone. Reason for this post: I've seen the above argument downplayed or simply rejected by several people before, while it's definitely important in getting contributions by people not on the WMF-payroll.

Cheers

-- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. --

Rob Lanphier

28 Jul 28 Jul

7:12 a.m.

On Tue, Jul 26, 2011 at 4:49 PM, Chad innocentkiller@gmail.com wrote:

...

On Tue, Jul 26, 2011 at 4:42 PM, Platonides Platonides@gmail.com wrote:

...
If the author considers it ready to merge and there has been no opposition in a week, then merge it. That would solve the rotting problem. I don't think it would be hard to keep a branch up to date for a week (at least without other merges).

I disagree here. Silence should not be taken for assent. Perhaps we should formalize the "approval" process for RfCs. I agree with the rest of your message in that we absolutely need to get more eyes on RfCs...and I'm open to suggestions on how to do so.

Well, I think checking into trunk after a week could work once we get caught up with code review on trunk, though email to wikitech-l needs to be a required part of the process. I'd ask that everyone hold off on being this aggressive before we get caught up, since this will only exacerbate the review backlog. As of today, we're closer than we've been for a very long time, with 728 "new" revisions, and we're trending in the right direction.

The RfC process can work if people make an earnest effort to get their RfCs reviewed.

Rob

Bryan Tong Minh

1:09 a.m.

Hey,

On Tue, Jul 26, 2011 at 10:27 PM, Chad innocentkiller@gmail.com wrote:

...

All,

While spending the past few days/weeks in CodeReview, it has become abundantly clear to me that we absolutely must get away from this idea of doing huge refactorings in our working copies and landing them in trunk without any warning. The examples I'm going to use here are the RequestContext, Action and Blocking refactors.

I mostly agree with your observations. Another problem with the current approach is that it is very hard to change architectural decisions. For example with the Action rewrite I'm not very happy with the coupling between interface and back end, which I stated on CR, but really this is a fundamental issue that is very hard to modify.

I agree however also with the other commentators on this tread. There will be no review on things done in branches, they'll just bitrot away.

Unfortunately, I don't see a satisfactory solution to this. Maybe others?

Bryan

4906

Age (days ago)

4910

Last active (days ago)

wikitech-l@lists.wikimedia.org

13 comments

10 participants

tags (0)

participants (10)

Aaron Schulz
Bryan Tong Minh
Chad
Daniel Friesen
Happy-melon
Jeroen De Dauw
Platonides
Rob Lanphier
Ryan Kaldari
Thomas Gries