On Sun, Mar 20, 2011 at 14:21, Ævar Arnfjörð Bjarmason avarab@gmail.com wrote:
On Wed, Mar 16, 2011 at 19:26, Yuvi Panda yuvipanda@gmail.com wrote:
I noticed that there's a github mirror of the svn repository at https://github.com/mediawiki, but it is rather out of date. Any idea if/when it could be made up-to-date again?
I've been busy / out of town so I haven't fixed the MediaWiki mirror in GitHub yet.
I'll do so soon.
I'm now re-running git-svn clone on the relevant paths (I lost the original); once that's complete I can update the mirror again.
I'm cloning from svn+ssh:// this time instead of from my pushmi file:// mirror, so people cloning this should be able to push upstream with git-svn without having to rewrite the history with git-filter-branch first.
But actually the reason I did this mirror was as a proof of concept for a (still incomplete) conversion to Git.
Is there still interest in that? I don't have a lot of time for it, but I could help with that if people want to go that way.
I don't care much myself since I don't do MediaWiki development anymore, but I'd be happy to help with it.
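(To illustrate the git-svn round trip Ævar describes, a rough, untested sketch; the GitHub and Subversion URLs below are approximations rather than the exact mirror paths:)

    git clone git://github.com/mediawiki/phase3.git
    cd phase3
    # attach git-svn to the Subversion history the mirror was converted from,
    # then push local commits back upstream
    git svn init svn+ssh://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3
    git svn fetch
    git svn dcommit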
On Sun, Mar 20, 2011 at 9:25 PM, Ævar Arnfjörð Bjarmason avarab@gmail.com wrote:
But actually the reason I did this mirror was as a proof of concept for a (still incomplete) conversion to Git.
Is there still interest in that? I don't have a lot of time for it, but I could help with that if people want to go that way.
If lack of people dedicated to this is why a migration isn't being considered (I guess not), I volunteer myself.
I was waiting for RobLa to jump in here... as far as I know we are still trying to find ways to move to Git, some time after the dust settles on 1.17.
Rob?
On Tue, Mar 22, 2011 at 08:27, Yuvi Panda yuvipanda@gmail.com wrote:
On Sun, Mar 20, 2011 at 9:25 PM, Ævar Arnfjörð Bjarmason avarab@gmail.com wrote:
But actually the reason I did this mirror was as a proof of concept for a (still incomplete) conversion to Git.
Is there still interest in that? I don't have a lot of time for it, but I could help with that if people want to go that way.
If lack of people dedicated to this is why a migration isn't being considered (I guess not), I volunteer myself.
Lack of time and people is indeed a factor. The import we have now isn't a proper Git conversion.
I still have some vague notes here detailing approximately what we need; some of them are out of date. The "Split up and convert" section is somewhat accurate though:
http://www.mediawiki.org/wiki/Git_conversion
No SVN to Git tool does exactly what we need due to our messy history. I came to the conclusion that it was probably easiest to filter the SVN dump (to e.g. fix up branch paths) before feeding the history to one of these tools.
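(A rough, untested sketch of that dump-filtering approach; svnadmin, svndumpfilter and git svn are the standard tools, but the paths and the rewrite rule below are placeholders for whatever branch-path fixes are actually needed:)

    # start from a full dump of the Subversion repository
    svnadmin dump /var/svn/mediawiki > mediawiki.dump
    # keep only the paths that should end up in one Git repository
    svndumpfilter include trunk/phase3 branches tags < mediawiki.dump > phase3.dump
    # placeholder: fix up odd branch paths before conversion
    sed -e 's|^Node-path: branches/REL1_16_0|Node-path: branches/REL1_16|' phase3.dump > phase3-fixed.dump
    # load into a scratch repository and convert with git-svn
    svnadmin create /tmp/phase3-svn
    svnadmin load /tmp/phase3-svn < phase3-fixed.dump
    git svn clone --trunk=trunk/phase3 --branches=branches --tags=tags file:///tmp/phase3-svn phase3-git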
Of course even if we come up with a perfect conversion it's pretty much useless if Wikimedia doesn't want to use it for its main repositories. So getting a yes/no on whether this is wanted by WM before you proceed with something would prevent you/others from wasting their time on this.
From what I understand, common thought is that phase3 and all individual extensions, as well as directories in trunk/ aside from extensions and phase3, will be their own repos. Possibly there will be meta collections that allow cloning things in one go, but that does not allow committing to multiple repos in one go without scripting. This is a workflow that is used *a lot* by L10n committers and others. I think this is bad.
I am again raising my objections against GIT as a replacement VCS for MediaWiki's svn.wikimedia.org, and against the way people are talking about implementing it, from an i18n perspective and also from a community/product stability perspective.
I raised this in the thread "Migrating to GIT (extensions)"[1,2] in mid-February. My concerns have not been addressed. i18n/L10n maintenance will become a lot harder and more distributed. In my opinion the MediaWiki development community is not harmed by the continued use of Subversion.
In fact, the global maintenance that many active developers are involved in now - I define this as fixing backward incompatibilities introduced in core in the 400+ extensions in Subversion, as well as updating extensions to current coding standards - will likely decrease IMO, because having to commit to multiple repos will make these activities more cumbersome. Things that require extra work from a developer without any obvious benefit just get discontinued, in my experience. As a consequence, the number of unmaintained and crappy extensions will increase, which is bad for the product image and in the end for the community - not caring about that single extension repo is too easy, and many [devs] not caring about hundreds [of extensions] is even worse.
Please convince me that things will not be as hard as I describe above, or will most definitely not turn out as I fear. I am open to improvements, but moving to GIT without addressing these concerns for the sake of having this great DVCS is not justified IMO.
Siebrand
M: +31 6 50 69 1239 Skype: siebrand
[1] http://lists.wikimedia.org/pipermail/wikitech-l/2011-February/thread.html#51812
[2] http://lists.wikimedia.org/pipermail/wikitech-l/2011-February/051817.html
On Tue, Mar 22, 2011 at 10:25 AM, Siebrand Mazeland s.mazeland@xs4all.nl wrote:
Please convince me that things will not be as hard as I describe above, or will most definitely not turn out as I fear. I am open to improvements, but moving to GIT without addressing these concerns for the sake of having this great DVCS is not justified IMO.
I've actually come to partially agree with you since the last time we discussed this. Really, the extension repository *should* remain in Subversion as it is. I would, however, like to move phase3 to git. Then i18n can just be two commits, instead of 400+
-Chad
2011/3/22 Chad innocentkiller@gmail.com:
I've actually come to partially agree with you since the last time we discussed this. Really, the extension repository *should* remain in Subversion as it is. I would, however, like to move phase3 to git. Then i18n can just be two commits, instead of 400+
Extensions in SVN but phase3 in git? That doesn't really make sense to me, TBH. Would it really be the end of the world if we had phase3 in one repo and all extensions in another repo?
Roan Kattouw (Catrope)
On Tue, Mar 22, 2011 at 10:44 AM, Roan Kattouw roan.kattouw@gmail.com wrote:
2011/3/22 Chad innocentkiller@gmail.com:
I've actually come to partially agree with you since the last time we discussed this. Really, the extension repository *should* remain in Subversion as it is. I would, however, like to move phase3 to git. Then i18n can just be two commits, instead of 400+
Extensions in SVN but phase3 in git? That doesn't really make sense to me, TBH. Would it really be the end of the world if we had phase3 in one repo and all extensions in another repo?
Perhaps in the long run. I think in the short-run it'd be more pain-free and perhaps a useful experiment to just move phase3 to git. Then we can see how we feel about moving the rest over (or if we hate it and want to go back)
-Chad
2011/3/22 Chad innocentkiller@gmail.com:
Perhaps in the long run. I think in the short-run it'd be more pain-free and perhaps a useful experiment to just move phase3 to git. Then we can see how we feel about moving the rest over (or if we hate it and want to go back)
Hmm, that's a good point, let's not bite off too much the first time.
Roan Kattouw (Catrope)
Your objections seem to be based on the assumption that you would need to have push access to all repositories, but I think that's the point of a DVCS: you can just fork them, and then people can pull your changes in themselves (or using a tool). Pull requests could even be generated when things are out of sync.
I think it's quite possible this could make i18n/L10n work easier, not more difficult.
- Trevor
On Tue, Mar 22, 2011 at 11:08 AM, Trevor Parscal tparscal@wikimedia.org wrote:
Your objections seem to be based on the assumption that you would need to have push access to all repositories, but I think that's the point of a DVCS: you can just fork them, and then people can pull your changes in themselves (or using a tool).
So now the Translatewiki guys will need to maintain a fork just for i18n updates, then have to wait for them to be pulled into main? That sounds like more work for both Translatewiki and the developers--daily pull requests for i18n updates, seriously?
-Chad
On 22.03.2011, 18:08 Trevor wrote:
Your objections seem to be based on the assumption that you would need to have push access to all repositories, but I think that's the point of a DVCS: you can just fork them, and then people can pull your changes in themselves (or using a tool). Pull requests could even be generated when things are out of sync.
I think it's quite possible this could make i18n/L10n work easier, not more difficult.
You seem to miss Siebrand's point: currently, all localisation updates take one commit per day. Splitting stuff into separate repos will result in up to 400 commits per day that will also need to be pushed and reintegrated - an epic waste of time and common sense. Or localisation will simply languish in forks, and people will miss it when checking out from the "official" source.
My suggestion is that all of this "busy" work is highly automatable, but I'm sure he has a greater ability to assess the complexities of this work than I do.
In general I feel that we should be thinking about "how would we make this work" instead of "why should we not do this".
- Trevor
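(For what it's worth, a hypothetical sketch of what such automation could look like if every extension lived in its own Git repository; the directory layout, file names and commit message are invented for illustration:)

    # push today's localisation export into each per-extension repository
    for repo in /srv/git/extensions/*; do
        (
            cd "$repo" &&
            cp "/srv/l10n-export/$(basename "$repo")/"*.i18n.php . &&
            git add -- *.i18n.php &&
            git commit -m "Localisation updates from http://translatewiki.net" &&
            git push origin master
        )
    done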
On Mar 22, 2011, at 8:33 AM, Max Semenik wrote:
On 22.03.2011, 18:08 Trevor wrote:
Your objections seem to be based on the assumption that you would need to have push access to all repositories, but I think that's the point of a DVCS: you can just fork them, and then people can pull your changes in themselves (or using a tool). Pull requests could even be generated when things are out of sync.
I think it's quite possible this could make i18n/L10n work easier, not more difficult.
You seem to miss Siebrand's point: currently, all localisation updates take one commit per day. Splitting stuff into separate repos will result in up to 400 commits per day that will also need to be pushed and reintegrated - an epic waste of time and common sense. Or localisation will simply languish in forks, and people will miss it when checking out from the "official" source.
-- Best regards, Max Semenik ([[User:MaxSem]])
On 22-03-11 16:38 Trevor Parscal tparscal@wikimedia.org wrote:
My suggestion is that all of this "busy" work is highly automatable, but I'm sure he has a greater ability to assess the complexities of this work than I do.
In general I feel that we should be thinking about "how would we make this work" instead of "why should we not do this".
IMO that's a bridge too far. My question is "Why should we make this happen?", and more specifically, what do our various stakeholders (which groups?) gain or lose if MediaWiki development shifts from Subversion to Git? Only if the gain in that analysis is greater than the loss does it make sense to me to look further into a move to Git.
On 22-03-11 16:08 Trevor Parscal tparscal@wikimedia.org wrote:
Your objections seem to be based on the assumption that you would need to have push access to all repositories, but I think that's the point of a DVCS: you can just fork them, and then people can pull your changes in themselves (or using a tool). Pull requests could even be generated when things are out of sync.
Yes, sir, indeed. Getting the L10n updates (as well as the i18n updates) into the code as soon as possible is of paramount importance to the success MediaWiki has in its i18n and L10n efforts. Having to wait until repo maintainers - who may or may not be active - pull updates is unacceptable. This would kill translator motivation, and take us back years.
Siebrand
On Tue, Mar 22, 2011 at 12:11 PM, Siebrand Mazeland s.mazeland@xs4all.nl wrote:
On 22-03-11 16:38 Trevor Parscal tparscal@wikimedia.org wrote:
My suggestion is that all of this "busy" work is highly automatable, but I'm sure he has a greater ability to assess the complexities of this work than I do.
In general I feel that we should be thinking about "how would we make this work" instead of "why should we not do this".
IMO that's a bridge too far. My question is "Why should we make this happen?", and more specifically, what do our various stakeholders (which groups?) gain or lose if MediaWiki development shifts from Subversion to Git? Only if the gain in that analysis is greater than the loss does it make sense to me to look further into a move to Git.
I'd like to really underline this point here. I am *not* sold on using Git, and I think it's premature to assume that we have to make Git work. Siebrand has raised some valid concerns that I think bear considering, even if the process is (semi)automated. The comment about code review is also a fair point. Right now, CodeReview does not support Git, and the implementation was never built with Git in mind. I think we could hack it in, but it wouldn't be pretty, and if Git's the answer then I think we'll be leaving this tool in favor of something else (I and many others like Gerrit quite a bit). I think migrating our repository is a huge task--one that should be done slowly, with caution, and with a very clear exit path if things go wrong. The status quo isn't great, but we can live with it if Git doesn't pan out how we'd like.
That all being said, I'd like to propose again that we only seek to move phase3 at this time. It's a much smaller chunk than the rest of the repo and is pretty self-contained. It'd give us a chance to work with a smaller dataset, both for the metadata rewriting and for getting *used* to the workflow. We've all used Git before, but every organization's workflow is a little different. Once we spend a while doing that, I think we'll be in a much better position to evaluate whether we'd like to move extensions over or not.
-Chad
"Chad" innocentkiller@gmail.com wrote in message news:AANLkTikrRE_3O+pYCjdX2+QiL6zT3tPohQuCodwE_SrU@mail.gmail.com...
The comment about code review is also a fair point. Right now, CodeReview does not support Git, and the implementation was never built with Git in mind. I think we could hack it in, but it wouldn't be pretty, and if Git's the answer then I think we'll be leaving this tool in favor of something else (I and many others like Gerrit quite a bit).
To my mind, this is one of the most important points. We have built up a very comprehensive infrastructure for code review in SVN; there are a lot of man-hours behind that work, and just as many hours associated with setting up a replacement system for git. Who is going to put that infrastructure in place, in amongst all the many other priorities we have at the moment? Moving to a VCS which makes it "easier to review stuff" **in principle** is going to be of no use whatsoever if it sends the **practical implementation** of that review process back to the stone age.
--HM
On Tue, Mar 22, 2011 at 5:27 PM, Happy-melon happy-melon@live.com wrote:
To my mind, this is one of the most important points. We have built up a very comprehensive infrastructure for code review in SVN; there are a lot of man-hours behind that work, and just as many hours associated with setting up a replacement system for git. Who is going to put that infrastructure in place, in amongst all the many other priorities we have at the moment? Moving to a VCS which makes it "easier to review stuff" **in principle** is going to be of no use whatsoever if it sends the **practical implementation** of that review process back to the stone age.
I've been thinking about this problem quite a bit, and I agree that it's a large problem. However, there's one thing that I think is very important for us to keep in mind. Let's say that we were starting from scratch building a new project, and we had, on the one hand, Subversion+Code Review, and on the other, Git+some other alternative. I'm going to bet that most people would recommend Git+some other alternative. Our code review tool is pretty nice, but we can't let it be the tail that wags the dog.
If our code review system was working smoothly, I wouldn't mind delaying this. However, it's pretty clear that code reviews aren't keeping pace (be sure to look at revisions marked "new" in trunk): http://toolserver.org/~robla/crstats/crstats.trunkall.html
I believe that once the reviewers get the hang of Git, they'll be more efficient, and be more capable of keeping up. I think that, paired with Neil's proposal[1] that we switch to pre-commit reviews, we might actually be able to get back on a regular release cycle.
Rob
[1] Neil's proposal: http://lists.wikimedia.org/pipermail/wikitech-l/2011-March/052037.html
On 23/03/11 12:05, Rob Lanphier wrote:
If our code review system was working smoothly, I wouldn't mind delaying this. However, it's pretty clear that code reviews aren't keeping pace (be sure to look at revisions marked "new" in trunk): http://toolserver.org/~robla/crstats/crstats.trunkall.html
I believe that once the reviewers get the hang of Git, they'll be more efficient, and be more capable of keeping up. I think that, paired with Neil's proposal[1] that we switch to pre-commit reviews, we might actually be able to get back on a regular release cycle.
What proportion of a reviewer's time do you suppose is spent battling with Subversion? I thought most of it was just spent reading code.
If you want someone to dig a hole faster, you don't buy them a nicer-looking shovel. I think we have to look at the benefits of Git carefully, and to weigh it against the costs, both of conversion and ongoing.
I think our focus at the moment should be on deployment of extensions and core features from the 1.17 branch to Wikimedia. We have heard on several occasions that it is the delay between code commit and deployment, and the difficulty in getting things deployed, which is disheartening for developers who come to us from the Wikimedia community. I'm not so concerned about the backlog of trunk reviews. We cleared it before, so we can clear it again.
-- Tim Starling
+1
2011/3/23 Tim Starling tstarling@wikimedia.org:
I think our focus at the moment should be on deployment of extensions and core features from the 1.17 branch to Wikimedia. We have heard on several occasions that it is the delay between code commit and deployment, and the difficulty in getting things deployed, which is disheartening for developers who come to us from the Wikimedia community. I'm not so concerned about the backlog of trunk reviews. We cleared it before, so we can clear it again.
This post has convinced me that planning a move to Git at this time would be premature. I still believe it would be better than SVN, but as Tim points out we have much more serious issues to address first.
I disagree, however, that the backlog of trunk reviews is not concerning. It means that we still haven't come up with a good process for reliable and quick (as in time between commit and review) code review. I have briefly told the list about my ideas for this in the past; maybe I should revive that thread some time. I also believe that, once we have a process where every commit is reviewed within a reasonable timespan (ideally at most a week), getting deployment closer to trunk and getting it to stay there will actually be fairly easy to do.
We used to have such a process once: a few years ago, Brion used to spend every Monday catching up on code review, reverting broken things, and deploying something resembling HEAD. At some point this broke down due to Brion's lack of scalability, and in the years that have passed our SVN activity has grown to a level where I don't believe code review is a one-person, one-day-a-week job any more. But that's not necessarily a problem: we can scale it by adding people and giving them the time and management they need. But that wasn't really ever done in a permanent fashion.
Roan Kattouw (Catrope)
P.S.: The final paragraph is not meant to suggest that Brion was the one and only code reviewer back in those days. Other people also reviewed code, and I don't mean to marginalize their contributions, I just wanted to point out that Brion was the driving force behind regular review and deployment happening.
Roan Kattouw roan.kattouw@gmail.com wrote:
I disagree, however, that the backlog of trunk reviews is not concerning. It means that we still haven't come up with a good process for reliable and quick (as in time between commit and review) code review. I have briefly told the list about my ideas for this in the past, maybe I should revive that thread some time. I also believe that, once we have a process where every commit is reviewed within a reasonable timespan (ideally at most a week), getting deployment closer to trunk and getting it to stay there will actually be fairly easy to do.
One of the ways to improve this is to assign mentors to new committers. Mentors don't necessarily have to be related to the particular area of the committer's work. The question is how many current developers would have time to accommodate "newbies", but maybe we can work towards this idea?
//Marcin (saper)
2011/3/23 Marcin Cieslak saper@saper.info:
One of the ways to improve this is to assign mentors to new committers. Mentors don't necessarily have to be related to the particular area of the committer's work. The question is how many current developers would have time to accommodate "newbies", but maybe we can work towards this idea?
I was personally going for something more like assigning reviewers to MW components, paths, time slots, or some combination thereof. But if we're gonna have that discussion for real, let's have it in its own thread :)
Roan Kattouw (Catrope)
Roan Kattouw roan.kattouw@gmail.com wrote:
I was personally going for something more like assigning reviewers to MW components, paths, time slots, or some combination thereof. But if we're gonna have that discussion for real, let's have it in its own thread :)
Yea, right, sorry for the hijack. Changed the subject for now. I am not sure that any formal assignment will work, esp. taking volunteers into account.
I prefer to do "svn log" and see who made the last few meaningful commits. In some areas it will be easy - in the API I am most likely to see "catrope" or "reedy" - but there are some forgotten swamps (like some extensions) where only raymond and friends push localization updates.
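(For example, something along these lines; the path is only an illustration:)

    svn log --limit 5 http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/includes/api/ApiQuery.php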
Obviously, I consider it a good practice to talk to someone who touched this code earlier (as I did with CentralNotice and didn't with the API). Maybe we lack some forum for this pre-commit exchange (I think some open forum is better than a private exchange by mail or even on an IRC channel). But sometimes things are too trivial to bother *the* mailing list with (and then, wikimedia-tech seems the more likely place, since it has become more developer-oriented than mediawiki-l).
Just to give a not-so-hypothetical example, since I don't like discussing in vain, what about this:
Is this okay to fix https://bugzilla.wikimedia.org/show_bug.cgi?id=16260 by adding a new [[Message:qsidebar]] that is the same as [[Message:Sidebar]], except that it accepts EDIT, THISPAGE, CONTEXT, MYPAGES, SPECIALPAGES, TOOLBOX boxes?
I see that hartman and dartman did some work there recently, and ashley did one cleanup about a year ago.
//Marcin
On 11-03-23 06:36 AM, Marcin Cieslak wrote:
[...] Just to give a not-so-hypothetical example, since I don't like discussing in vain, what about this:
Is this okay to fix https://bugzilla.wikimedia.org/show_bug.cgi?id=16260 by adding a new [[Message:qsidebar]] that is the same as [[Message:Sidebar]], except that it accepts EDIT, THISPAGE, CONTEXT, MYPAGES, SPECIALPAGES, TOOLBOX boxes?
I see that hartman and dartman did some work there recently, and ashley did one cleanup about a year ago.
//Marcin
I'd actually like to eliminate legacy skins altogether. They repeatedly throw a wrench into skin improvements. Perhaps after coming up with some replacements that fit the requirements of the people who actually use them; I haven't got any statistics yet on how many active users are actually using these skins.
The fact that we'd discuss a bug like that not because CologneBlue and the other legacy skins offer a unique feature or layout that we're trying to improve, but because they lay out the UI in a pre-SkinTemplate way that we keep maintaining and backporting modern improvements into, is a good example of why I want to get rid of them. It's like trying to backport modern features to REL1_11.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
On Wed, Mar 23, 2011 at 4:14 PM, Daniel Friesen lists@nadir-seen-fire.com wrote:
I'd actually like to eliminate legacy skins altogether. They repeatedly throw a wrench into skin improvements.
+1
Thanks and regards, -- Jack Phoenix MediaWiki developer
On 3/22/11 6:05 PM, Rob Lanphier wrote:
Our code review tool is pretty nice, but we can't let it be the tail that wags the dog.
At the risk of being impolite -- our code review tool is not that nice. (I don't expect that anyone who worked on it would even disagree with me here.)
It happens to be our home grown tool, and it uses a framework that more of us are familiar with. But it's not such an overwhelming asset that we should consider staying on SVN because of it. In 2011 there are lots of code review frameworks out there to choose from.
I believe that once the reviewers get the hang of Git, they'll be more efficient, and be more capable of keeping up. I think that, paired with Neil's proposal[1] that we switch to pre-commit reviews, we might actually be able to get back on a regular release cycle.
I have to confess this is my main interest in at least re-examining our source control situation. Git doesn't necessarily make pre-commit code review easier, but as a side effect it will allow us to consider other options.
If you don't believe me about pre-commit code review, ask any of your friends who work for (or who have worked for) Google. Even people who were very skeptical will usually say that it has been a huge benefit.
On 2011-03-23, Neil Kandalgaonkar wrote:
On 3/22/11 6:05 PM, Rob Lanphier wrote:
Our code review tool is pretty nice, but we can't let it be the tail that wags the dog.
At the risk of being impolite -- our code review tool is not that nice. (I don't expect that anyone who worked on it would even disagree with me here.)
It happens to be our home grown tool, and it uses a framework that more of us are familiar with. But it's not such an overwhelming asset that we should consider staying on SVN because of it. In 2011 there are lots of code review frameworks out there to choose from.
There's nothing stopping a Git backend being created for the code review extension.
Robert
On 3/23/11 2:13 PM, Robert Leverington wrote:
There's nothing stopping a Git backend being created for the code review extension.
Perhaps I was unclear. My point was that I personally would like to *not* use our current code review system.
By that I mean both the general paradigm we have, and the tools we are using.
I'm asserting that:
- There are other software frameworks for code review with more and better features.
- There are other code review paradigms that are better for team health, other than "let's spend several months on the backlog every time we want to release".
2011/3/23 Neil Kandalgaonkar neilk@wikimedia.org:
- There are other software frameworks for code review with more and better features.
Yes, we should look at existing stuff for Git.
- There are other code review paradigms that are better for team health, other than "let's spend several months on the backlog every time we want to release".
Without meaning to put words in your mouth, I would also like to point out that some of these better paradigms can be done with SVN just fine. Maybe we will end up with an even better paradigm thanks to Git (I personally think we will), but SVN is perfectly capable of supporting something better than the ad-hoc non-paradigm we have now.
But more on that in its own thread, in its own time. I'd start a discussion about this if I weren't so busy right now.
Roan Kattouw (Catrope)
On Thu, Mar 24, 2011 at 7:07 AM, Neil Kandalgaonkar neilk@wikimedia.org wrote:
It happens to be our home grown tool, and it uses a framework that more of us are familiar with. But it's not such an overwhelming asset that we should consider staying on SVN because of it. In 2011 there are lots of code review frameworks out there to choose from.
I don't believe anyone said it had to be a home-grown solution, but rather that we need a solution that works before any transfer could really take place. Perhaps some of these CR frameworks could be listed on the Git pages on the wiki.
On Thu, Mar 24, 2011 at 7:13 AM, Robert Leverington robert@rhl.me.uk wrote:
There's nothing stopping a Git backend being created for the code review extension.
There is a bug for that already in bugzilla!
On Thu, Mar 24, 2011 at 7:19 AM, Neil Kandalgaonkar neilk@wikimedia.org wrote:
- There are other code review paradigms that are better for team health, other than "let's spend several months on the backlog every time we want to release".
I'm sure that discussion is always welcome, but perhaps that should be discussed in a non "SVN vs <insert (D)VCS of choice>" thread. No matter what, we will always end up with a backlog of unreviewed code, because so few people actually spend their time doing it unless they are dragged over from their projects to do it.
I'd prefer if those superb review tools were named, instead of vague references to greener pastures and how wonderful it will be reviewing code with git.
And no, nobody wants our review paradigm to be "let's spend several months on the backlog every time we want to release". It was just the best we managed to afford.
The only one I know and like is Gerrit.
-Chad
Platonides Platonides@gmail.com writes:
And no, nobody wants our review paradigm to be "let's spend several months on the backlog every time we want to release". It was just the best we managed to afford.
We've been doing a little better for the past month, but Robla's chart[1] is still looking ugly.
So, other than switching to the mythical GIT, where all is rainbows and roses, what can we do to improve code review now?
As far as I can see, the main reason that people think code review works better under GIT is because the committer is responsible for getting xyr[2] code reviewed *before* it is merged. The committer is motivated to get xyr code reviewed because if xe doesn't, the code won't be used, and others will not experience its beauty.
Subversion doesn't support this review-first model. Not unless we set up another branch that only took reviewed code. But then, we're back in the same boat. Most people would run from trunk, and the committer knows that xyr code will be used by a lot of people and extensions will probably be developed that depend upon it, etc.
So, while Subversion doesn't support review-first, it can incorporate revert-later. We can even use our current CodeReview tool. We just need to be more aggressive reverting unreviewed code.
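(Mechanically, reverting a single unreviewed revision in Subversion is a two-liner; the revision number here is made up:)

    svn merge -c -85000 .
    svn commit -m "Revert r85000: unreviewed and unfixed after one week"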
And just to be clear, there would be a not-too-distant “later”. I propose a week.
If code is to survive past a week in the repository, it has to be reviewed.
If you want to make a commit that depends on un-reviewed code, you have to find someone to review it. Otherwise, your commit will break trunk when that code is reverted.
FIXMEs would disappear: they would be up for reversion almost immediately. Give the committer a day to fix the code, but if it survives 24 hours as a FIXME, it gets reverted.
I suggest we implement this ASAP. If we start this policy on April 4th, we would be doing the first round of reverts April 11th. We should grandfather in the current code, of course. It would be exempt from grim reversion reaper, but it should still be reviewed.
This solution would mean pain, but I think it would be manageable pain. And it would be more workable than changing the VCS that the twn people have to work with.
Thoughts?
Mark.
Footnotes: [1] http://toolserver.org/~robla/crstats/crstats.html — the problem is easiest to see if you unclick “ok”. Then you'll see the red “new” line is creeping up again.
[2] http://en.wikipedia.org/wiki/Gender-neutral_pronoun, equivalent to “his or her” but only jarring (till you get used to it) and not cumbersome.
On Fri, Mar 25, 2011 at 11:56 PM, Mark A. Hershberger mhershberger@wikimedia.org wrote:
I suggest we implement this ASAP. If we start this policy on April 4th, we would be doing the first round of reverts April 11th. We should grandfather in the current code, of course. It would be exempt from grim reversion reaper, but it should still be reviewed.
I see no reason to grandfather in current code. TBH, the list of fixmes is appalling and we should make a sprint at saying "fix this or it'll be reverted in 24 hours" for every last one of them.
-Chad
On 11-03-25 09:23 PM, Chad wrote:
On Fri, Mar 25, 2011 at 11:56 PM, Mark A. Hershberger mhershberger@wikimedia.org wrote:
I suggest we implement this ASAP. If we start this policy on April 4th, we would be doing the first round of reverts April 11th. We should grandfather in the current code, of course. It would be exempt from grim reversion reaper, but it should still be reviewed.
I see no reason to grandfather in current code. TBH, the list of fixmes is appalling and we should make a sprint at saying "fix this or it'll be reverted in 24 hours" for every last one of them.
-Chad
"All" the fixmes?
What about the fixmes left open since it's not clear if anything is even still broken currently? The fixmes for extra things, like new tests that should be added, where the actual commit in question isn't broken in any way? The fixmes for things which are perfectly functional, but need a minor bit of tweaking since they work perfectly fine but don't use the best-practice methods to do it?
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
On 26/03/11 05:48, Daniel Friesen wrote:
What about the fixmes left open since it's not clear if anything is even still broken currently.
If it is unclear, it either needs clarification or deserves a reversion. We already have enough lions hiding in the fog, ready to jump at you when you stray off the path.
The fixmes for extra things, like new tests that should be added, where the actual commit in question isn't broken in any way?
The fixmes for things which are perfectly functional, but need a minor bit of tweaking since they work perfectly fine but don't use the best-practice methods to do it?
Do we even have fixmes for the last two cases? Anyway, for tests, they might be required just to make sure other developers using the feature will use it as intended. There are always funny corner cases to handle, especially with PHP.
On 11-03-25 11:57 PM, Ashar Voultoiz wrote:
On 26/03/11 05:48, Daniel Friesen wrote:
What about the fixmes left open since it's not clear if anything is even still broken currently.
If it is unclear: it either need a clarification or deserve a reversion. We already have enough lines hiding in the fog, read to jump at you when you get out of the path.
The fixmes for things like extra things like new tests should be added, but the actual commit in question isn't broken in any way.
The fixmes for things which are perfectly functional, but need a minor bit of tweaking since they work perfectly find, but don't use the best practice methods to do it.
Do we even have fixmes for the last two cases? Anyway, for tests, they might be required just to make sure other developers using the feature will use it as intended. There are always funny corner cases to handle, especially with PHP.
That pretty much describes all of my commits that have a fixme tagged on them:
http://www.mediawiki.org/wiki/Special:Code/MediaWiki/81928 Waiting for me to have some time to turn uses of echo into $this->output so that the built-in --quiet will work, instead of my own custom implementation of --quiet (I didn't know ->output existed).
http://www.mediawiki.org/wiki/Special:Code/MediaWiki/80248 Comment gives a Tesla link saying something broke. However the Tesla link does not identify that commit as the guaranteed commit that actually broke code. The commit was followed up with several fixmes already and it's unknown if the breakage is still present. The commit is potentially perfectly functional, hit by Tesla catching a completely unrelated commit, or marking a bug that's already fixed.
http://www.mediawiki.org/wiki/Special:Code/MediaWiki/79639 Perfectly functional, just waiting for me to have time to add a small parser test for the behavior.
http://www.mediawiki.org/wiki/Special:Code/MediaWiki/79433 Of all my fixmes this one is the most bug-like... that being said, it's an if() that anyone could add; I just haven't had time to do it.
http://www.mediawiki.org/wiki/Special:Code/MediaWiki/79383 The commit is perfectly functional; SkinTemplateNavigation and SkinTemplateTabs existed before and after the commit, I just replaced SkinTemplateTabs with SkinTemplateNavigation. The fixme is for the fact that legacy skins are still using a hack that uses SkinTemplateTabs which also needs to be updated... which, to be honest, isn't a good reason to revert a commit; it's pretty much orthogonal to the functionality of the commit.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
Daniel Friesen wrote:
http://www.mediawiki.org/wiki/Special:Code/MediaWiki/80248 Comment gives a Tesla link saying something broke. However the Tesla link does not identify that commit as the guaranteed commit that actually broke code. The commit was followed up with several fixmes already and it's unknown if the breakage is still present. The commit is potentially perfectly functional, hit by Tesla catching a completely unrelated commit, or marking a bug that's already fixed.
Come on. It is easy enough to check if your revision is the culprit.
svn up -r 80247
cd tests/phpunit/
make noparser
There was 1 failure:
1) NewParserTest::testParserTests Bad images - basic functionality (failed: Bad images - basic functionality
There were 9 incomplete tests:
1) ApiUploadTest::testUpload RandomImageGenerator: dictionary file not found or not specified properly
2) ApiUploadTest::testUploadSameFileName RandomImageGenerator: dictionary file not found or not specified properly
3) ApiUploadTest::testUploadSameContent RandomImageGenerator: dictionary file not found or not specified properly
4) ApiUploadTest::testUploadStash RandomImageGenerator: dictionary file not found or not specified properly
5) ApiTest::testApiGotCookie The server can't do external HTTP requests, and the internal one won't give cookies
6) ApiWatchTest::testWatchEdit Broken
7) ApiWatchTest::testWatchProtect Broken
8) ApiWatchTest::testWatchRollback Only one author to 'UTPage', cannot test rollback
9) ApiWatchTest::testWatchDelete Broken
There were 2 skipped tests:
1) ApiTest::testApiListPages This test depends on "ApiTest::testApiGotCookie" to pass.
2) ApiWatchTest::testWatchClear This test depends on "ApiWatchTest::testWatchEdit" to pass.
cd ../..
svn up -r 80248
cd tests/phpunit/
make noparser
There were 2 errors:
1) ApiBlockTest::testMakeNormalBlock htmlspecialchars() expects parameter 1 to be string, object given
2) NewParserTest::testFuzzTests MWException: Out of memory:
--
There were 3 failures:
1) TitlePermissionTest::testQuickPermissions
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
 Array (
     [0] => Array (
         [0] => badaccess-groups
-        [1] => *, [[Local:Users|Users]]
+        [1] => *, Users
         [2] => 2
     )
 )

2) TitlePermissionTest::testPageRestrictions
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
 Array (
     [0] => Array (
         [0] => badaccess-groups
-        [1] => *, [[Local:Users|Users]]
+        [1] => *, Users
         [2] => 2
     )
     [1] => Array (
         [0] => protectedpagetext
         [1] => bogus
     )
     [2] => Array (
         [0] => protectedpagetext
         [1] => protect
     )
     [3] => Array (
         [0] => protectedpagetext
         [1] => protect
     )
 )
3) NewParserTest::testParserTests Bad images - basic functionality (failed: Bad images - basic functionality Failed asserting that <text> is equal to string:. Bad images - basic functionality) Failed asserting that boolean:false is true.
So r80248 did break three tests.
cd ../..
svn up
cd tests/phpunit

php phpunit.php includes/api/ApiBlockTest.php
OK (1 test, 4 assertions)

php phpunit.php includes/TitlePermissionTest.php
There was 1 failure:
1) TitlePermissionTest::testUserBlock Failed asserting that two arrays are equal.
This is a different test, which expects 'infinite' but now gets a Message object.
The problem was fixed in trunk.
On Sun, Mar 27, 2011 at 3:33 AM, Platonides Platonides@gmail.com wrote:
Come on. It is easy enough to check if your revision is the culprit.
svn up -r 80247
cd tests/phpunit/
make noparser
Which takes approximately one hour to run. We should fix this, because otherwise nobody is going to run the unit tests before committing something.
Bryan
Bryan Tong Minh wrote:
Which takes approximately one hour to run. We should fix this, because otherwise nobody is going to run the unit tests before committing something.
Bryan
$ time make noparser
Tests: 823, Assertions: 9512, Failures: 8, Incomplete: 42, Skipped: 3.
make: *** [noparser] Error 1

real    0m45.697s
user    0m10.389s
sys     0m1.523s
I have MySQL's tmpdir set to a tmpfs filesystem (MySQL doesn't support in-memory tables with BLOBs). On different hardware, with a cold cache and the temporary tables created on disk, it may take a few minutes, but not an hour.
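(For anyone who wants to try the same setup, a rough sketch with made-up paths and size; add a matching /etc/fstab entry and restart mysqld to make it permanent:)

    # mount a RAM-backed filesystem and point MySQL's tmpdir at it
    mkdir -p /mnt/mysql-tmp
    mount -t tmpfs -o size=512M tmpfs /mnt/mysql-tmp
    chown mysql:mysql /mnt/mysql-tmp
    # then in my.cnf, under [mysqld]:  tmpdir = /mnt/mysql-tmp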
On the other hand, running the phpunit parser tests can take that long, whereas the good old parserTests.php also takes only ~44s. All the other time is db overhead: dropping and duplicating tables, inserting articles and waiting for the db to answer. I tested making a new mysql connection instead of dropping each table separately, but it was slower. A change that could improve performance would be to insert everything into a main temporary table, and clone that with its content for each parser test. Or we could try to remove the db dependency altogether for parser tests.
2011/3/26 Mark A. Hershberger mhershberger@wikimedia.org:
If code is to survive past a week in the repository, it has to be reviewed.
This is basically what I suggested in the other thread, except I added a few other conditions that have to be satisfied before we can start using such a paradigm (relating to reviewer allocation, discipline and assignment).
Roan Kattouw (Catrope)
Roan Kattouw wrote:
2011/3/26 Mark A. Hershberger mhershberger@wikimedia.org:
If code is to survive past a week in the repository, it has to be reviewed.
This is basically what I suggested in the other thread, except I added a few other conditions that have to be satisfied before we can start using such a paradigm (relating to reviewer allocation, discipline and assignment).
Roan Kattouw (Catrope)
You mentioned reverting broken code.
Mark proposes reverting *unreviewed* code.
We are generally polite, marking other people's code fixme and avoiding reverts as much as possible. I agree with the proposal of reverting after a few days with an "important fixme". But reverting new revisions just because no one reviewed them seems to be going too far (at least at this moment).
It would make much more sense to draft some process where you have to review the previous revision of the files you are changing. However, that would forbid fast fixes (e.g. fixing the whitespace of the previous commit) without fully reviewing it, which is also undesirable (the revision stays unreviewed, and with the wrong whitespace).
Roan Kattouw wrote:
2011/3/26 Mark A. Hershberger mhershberger@wikimedia.org:
If code is to survive past a week in the repository, it has to be reviewed.
This is basically what I suggested in the other thread, except I added a few other conditions that have to be satisfied before we can start using such a paradigm (relating to reviewer allocation, discipline and assignment).
A number of people, for quite some time, have been urging MediaWiki code development to get back to the Brion/Tim style of "revert if broken." I'm certainly among them, so I'm thrilled to see this discussion finally happening. Next step is action. :-)
In addition to the other benefits, more regular reverts will (hopefully) reduce the stigma of being reverted. The wiki model has always encouraged boldness, but it has also equally encouraged the ability to pull back changes as necessary. The tendency to not revert nearly as much made a reversion a much bigger deal, from what I've seen. Even more so (or perhaps exclusively so) when it has involved "paid work" (i.e., work done by Wikimedia Foundation employees/contractors). A move toward more reverts, as long as it doesn't discourage new or old contributors, is definitely the way forward, I think.
MZMcBride
On Fri, Mar 25, 2011 at 8:56 PM, Mark A. Hershberger <mhershberger@wikimedia.org> wrote:
Platonides Platonides@gmail.com writes:
And no, nobody wants our review paradigm to be "let's spend several months on the backlog every time we want to release". It was just the best we managed to afford.
We've been doing a little better for the past month, but Robla's chart[1] is still looking ugly.
So, other than switching to the mythical GIT, where all is rainbows and roses, what can we do to improve code review now?
tl;dr summary: The biggest single improvement that can be made is to *ship code faster*.
When new code comes in, there's basically a few things it can do:
* break in an obvious and visible way
* break in some circumstances, which might not be obvious on first look
* mostly work, but be inefficient or have other negative side effects that need fixing
* mostly work, but cause HORRIBLE DATA CORRUPTION that's not noticed for some time
* work pretty well
Because we're afraid of letting hard-to-find bugs go through, we're holding *everything* back for too long in the hopes that we'll somehow develop the ability to find hard-to-find bugs easily. As a result, the code paths don't get exercised until a giant last-minute review-and-push comes through months later, and finding the actual source of the bugs becomes even *more* difficult because you have 6 months of changes to search all at once instead of a few days.
Ship sooner -> fail faster -> fix quickly. Update the live site no less frequently than weekly; update test sites more frequently than that. Make sure those test sites include things that developers are actually dogfooding. Encourage other testers to run on trunk and report issues.
A smaller, but still relevant issue is to see if we can change how we think about review priorities: something that changes the core of the parser or how pages get saved might well cause HORRIBLE DATA CORRUPTION, but changes in UI code probably won't. Changes in UI code might cause an XSS vulnerability, however... so when thinking about how much attention code needs, we should be considering the module rather than laying down blanket policies.
Some more explicit reviewer module 'ownership' could indeed be helpful -- especially if we have a more explicit review process for 'big changes', but even for less formal review.
As far as I can see, the main reason that people think code review
works better under GIT is because the committer is responsible for getting xyr[2] code reviewed *before* it is merged. The committer is motivated to get xyr code reviewed because if xe doesn't, the code won't be used, and others will not experience its beauty.
That isn't specific to git; the same methodology works in SVN or CVS or whatever where you're reviewing patches submitted through email, bug tracker systems, etc. The advantage git has here is that your intermediate work is easier to keep and share within the revision control system, as opposed to having to keep your work *outside* the version control system until it's been approved by someone else.
IMO that's a big advantage, but you can still do review-first with SVN, and we always have for patches submitted through bugzilla or the mailing list by non-committers.
If review and application of submitted patches can be made consistent and reasonably speedy, that would again be a big improvement without requiring a toolset change: getting more good stuff through, with no danger of it breaking things _before_ approval & merging.
So, while Subversion doesn't support review-first, it can incorporate
revert-later. We can even use our current CodeReview tool. We just need to be more aggressive reverting unreviewed code.
And just to be clear, there would be a not-too-distant “later”. I propose a week.
If code is to survive past a week in the repository, it has to be reviewed.
If you want to make a commit that depends on un-reviewed code, you have to find someone to review it. Otherwise, your commit will break trunk when that code is reverted.
This is actually a lot harder than it might sound; even in only a week, trimming out dependency on dependency on dependency can be extremely difficult, especially if some change involved lots of giant whitespace cleanup or variable renames or other things that play hell with patch resolution.
Reverting generically questionable code should probably happen a lot faster than after a week.
-- brion
Brion Vibber brion@pobox.com writes:
On Fri, Mar 25, 2011 at 8:56 PM, Mark A. Hershberger <mhershberger@wikimedia.org> wrote:
If you want to make a commit that depends on un-reviewed code, you have to find someone to review it. Otherwise, your commit will break trunk when that code is reverted.
This is actually a lot harder than it might sound; even in only a week, trimming out dependency on dependency on dependency can be extremely difficult, especially if some change involved lots of giant whitespace cleanup or variable renames or other things that play hell with patch resolution.
Reverting generically questionable code should probably happen a lot faster than after a week.
I did suggest that we revert it within 24 hours of it being marked FIXME. I'd even be fine with immediate reversion.
You suggest putting up test servers and deploying code quicker. Which I'm all in favor of. TranslateWiki does this somewhat for us, but I think setting up a prototype where this would happen more regularly and with a configuration more like WMF wikis would be a good idea.
Mark.
On 26/03/11 14:56, Mark A. Hershberger wrote:
So, other than switching to the mythical GIT, where all is rainbows and roses, what can we do to improve code review now?
It's no mystery. After the 1.17 deployment, the team that was doing code review was disassembled. If you want code review to happen faster, then getting people to work on it would be a good start.
If code is to survive past a week in the repository, it has to be reviewed.
If you want to make a commit that depends on un-reviewed code, you have to find someone to review it. Otherwise, your commit will break trunk when that code is reverted.
Find someone to review it? If the experienced developers on the WMF payroll aren't assigned to code review, then under your proposal, the only option for avoiding a revert will be to get someone with no clue about anything to rubber-stamp the code.
However, volunteer developers aren't always the most capable people at navigating bureaucracy. In practice, a lot of people would commit code, have it reverted, and leave.
If the code review manpower is there, we can be friendly and encouraging to our developers, not threaten them with a revert unless they can make at least one developer be their friend within seven days.
The WMF really is central in this, because we have a policy of hiring as many experienced developers as possible from the volunteer community. So that is where the expertise is concentrated.
FIXMEs would disappear. FIXMEs would be up for reversion almost immediately. Give the committer a day to fix the code, but if it survives 24 hours as a FIXME, it gets reverted.
By definition, our volunteer developers have lives outside of MediaWiki. We have to fit in with their schedules. I don't think we should give them a kick in the teeth just because they committed something on Sunday and have to go to school on Monday.
If a commit is insecure, or changes interfaces in a way that will be disruptive to other developers, or breaks key functionality, then sure, we should revert it right away. There's no need to wait 24 hours. But I don't think we need to be issuing death sentences for typos in comments.
A "fixme" status just means that there is something wrong with the commit, however minor, it doesn't mean that any urgent action is required.
Your proposal seems to be based on the idea that review under Git is many times better than review with CodeReview and Subversion. I don't think that's true, I think it's very slightly better. Whether you use Git or Subversion, you still need people with brains reading code.
-- Tim Starling
2011/3/28 Tim Starling tstarling@wikimedia.org:
By definition, our volunteer developers have lives outside of MediaWiki. We have to fit in with their schedules. I don't think we should give them a kick in the teeth just because they committed something on Sunday and have to go to school on Monday.
If a commit is insecure, or changes interfaces in a way that will be disruptive to other developers, or breaks key functionality, then sure, we should revert it right away. There's no need to wait 24 hours. But I don't think we need to be issuing death sentences for typos in comments.
+1
Reverting is a blunt instrument and should only be used when appropriate. I think it's perhaps a bit underused currently, but that doesn't mean we should swing to the other end of the spectrum. Reverting a revision is appropriate if it breaks things or if its presence in the repository causes other problems, like Tim said. Also, if a revision is problematic and can't be fixed quickly, it should be reverted, not left in a fixme state for two weeks. OTOH reverting things for minor issues is needlessly disruptive (not to mention demotivating), and reverting a *volunteer* developer's revision simply because *paid* reviewers (most of them are paid anyway) didn't get around to reviewing it is the kind of dickish behavior that will scare off volunteers very effectively.
Roan Kattouw (Catrope)
Tim Starling wrote:
If the code review manpower is there, we can be friendly and encouraging to our developers, not threaten them with a revert unless they can make at least one developer be their friend within seven days.
The WMF really is central in this, because we have a policy of hiring as many experienced developers as possible from the volunteer community. So that is where the expertise is concentrated.
You're one of the most senior developers and you're a Wikimedia employee, yet some of your writing comes off as though you're on the outside. Yes, Wikimedia needs to devote more manpower to code review. This has been pretty apparent for quite some time. The central question at this point seems to be: what's the hold-up?
Long ago I lost track of who's in charge of what, but I'm told there is some sort of hierarchy in place in the tech department. Who's empowered to start assigning people to review code in a reasonable timeframe? Like Aryeh, I find this entire thread a bit baffling.
MZMcBride
On 29/03/11 11:26, MZMcBride wrote:
Long ago I lost track of who's in charge of what, but I'm told there is some sort of hierarchy in place in the tech department. Who's empowered to start assigning people to review code in a reasonable timeframe? Like Aryeh, I find this entire thread a bit baffling.
The hierarchy is CTO -> EPMs -> regular plebs like me. The EPMs are Rob Lanphier, CT Woo, Mark Bergsma and Alolita Sharma. General MediaWiki work is mostly Rob Lanphier's responsibility, which is why he's been so active in this thread.
Rob doesn't know as much about MediaWiki as me, but he knows the people who work on it and how to manage them. I think his response with subject "The priority of code review" was entirely reasonable.
I'm not saying that I think MediaWiki code review should be the highest priority task for the Foundation, or that it's important to review all commits within a few days, as Aryeh contends:
Aryeh wrote:
If commits are not, as a general rule, consistently reviewed within two or three days, the system is broken. I don't know why this isn't clear to everyone yet.
I'm saying that the Git/Subversion debate is peripheral, and that human factors like assignment of labour and level of motivation are almost entirely responsible for the rate of code review.
In the last week, I've been reviewing extensions that were written years ago, and were never properly looked at. I don't think it's appropriate to measure success in code review solely by the number of "new" revisions after the last branch point.
Code review of self-contained projects becomes easier the longer you leave it. This is because you can avoid reading code which was superseded, and because it becomes possible to read whole files instead of diffs. So maintaining some amount of review backlog means that you can make more efficient use of reviewer time.
Our current system links version control with review. After a developer has done a substantial amount of work, they commit it. That doesn't necessarily mean they want their code looked at at that point, they may just want to make a backup.
It's useful to look at such intermediate code for the purposes of mentoring, but that's not the same sort of task as a review prior to a tarball release or deployment, and it shouldn't have the same priority.
-- Tim Starling
Tim Starling wrote:
In the last week, I've been reviewing extensions that were written years ago, and were never properly looked at. I don't think it's appropriate to measure success in code review solely by the number of "new" revisions after the last branch point.
Code review of self-contained projects becomes easier the longer you leave it. This is because you can avoid reading code which was superseded, and because it becomes possible to read whole files instead of diffs. So maintaining some amount of review backlog means that you can make more efficient use of reviewer time.
I agree. But that only works for extensions since:
* They are self-contained
* They are relatively small
* They are not deployment blockers
Even so, they are harder to fix months later when the author has moved on (think of the poolcounterd bug).
I don't think that would work as well for core MediaWiki, although it may be feasible for not-so-big features with a kill switch. Large commits changing many files would need a branch to be reviewable as a set. However, our problem with branches is that they remove almost all peer review and testing, and merges are likely to introduce subtle bugs. The drawbacks of late review are also still there.
Our current system links version control with review. After a developer has done a substantial amount of work, they commit it. That doesn't necessarily mean they want their code looked at at that point, they may just want to make a backup.
How do you propose to fix it? The committer deferring review of their own revision? It may be worth keeping a list of review requests on mediawiki.org.
Tim Starling tstarling@wikimedia.org writes:
On 26/03/11 14:56, Mark A. Hershberger wrote:
If code is to survive past a week in the repository, it has to be reviewed.
If you want to make a commit that depends on un-reviewed code, you have to find someone to review it. Otherwise, your commit will break trunk when that code is reverted.
Find someone to review it? If the experienced developers on the WMF payroll aren't assigned to code review, then under your proposal, the only option for avoiding a revert will be to get someone with no clue about anything to rubber-stamp the code.
Thanks for pointing out the things I hadn't considered in my suggestions. I was focused on making junior developers motivated to find reviewers, but neglected to thoroughly consider the results of my suggestion.
Your proposal seems to be based on the idea that review under Git is many times better than review with CodeReview and Subversion. I don't think that's true, I think it's very slightly better. Whether you use Git or Subversion, you still need people with brains reading code.
To be clear, I don't think Git is vastly superior to any other VCS for getting code review done. I do think that since Gerrit, for example, appears to be more widely used and supported than MediaWiki's CodeReview extension, many people come to this conclusion.
I could be wrong. I probably am. But I'd like to fix the Code Review problem so that it can no longer be used as an excuse to change our VCS.
Using Subversion has its pluses and minuses. Code review should not be one of them.
Mark.
On Fri, Mar 25, 2011 at 11:56 PM, Mark A. Hershberger mhershberger@wikimedia.org wrote:
As far as I can see, the main reason that people think code review works better under GIT is because the committer is responsible for getting xyr[2] code reviewed *before* it is merged. The committer is motivated to get xyr code reviewed because if xe doesn't, the code won't be used, and others will not experience its beauty.
I don't think that's the right way to put it. In a properly-functioning review-then-commit system, it should be easy to get code reviewed. The advantage of reviewing the code first is that psychologically, it's much easier to say "Fix these minor things and then I'll approve it" than to say "Fix these minor things or else I'll revert it". The first gives positive incentives, while the second gives negative incentives, and people appreciate positive incentives a lot more. In a review-first system, you're going to routinely have reviewers asking that the patch author write better comments or conform to style guidelines or simplify the logic a bit before they give approval -- or even restructure the changes entirely. In a commit-first system, reviewers are going to be reluctant to revert code that works, even if it has some minor deficiencies, so committers have little incentive to fix minor code issues. Code quality suffers as a result.
And just to be clear, there would be a not-too-distant “later”. I propose a week.
If code is to survive past a week in the repository, it has to be reviewed.
If you want to make a commit that depends on un-reviewed code, you have to find someone to review it. Otherwise, your commit will break trunk when that code is reverted.
This is a terrible idea. Review needs to be something that everyone is guaranteed to get without effort on their part. You cannot design a code review system on the theory that code authors are supposed to somehow get their code reviewed when no one is formally required or expected to review it. I'm all for giving people incentives to do the right thing, but incentives are pointless if the person being incentivized has no way to do what you're trying to get them to do. Incentives have to be placed on the code *reviewers*, because they're the only ones who can decide to review a given patch. Conveniently, almost all the code reviewers happen to be employed by Wikimedia, so the incentive can be the good old conventional "your boss told you to".
If, as Tim says, Wikimedia developers were un-assigned from code review after the 1.17 deployment, *that* is the problem that needs to be fixed. We need a managerial decision that all relatively experienced developers employed by Wikimedia need to set aside their other work to do as much code review as necessary to keep current. If commits are not, as a general rule, consistently reviewed within two or three days, the system is broken. I don't know why this isn't clear to everyone yet.
Neil Kandalgaonkar wrote:
At the risk of being impolite -- our code review tool is not that nice. (I don't expect that anyone who worked on it would even disagree with me here.)
It's only impolite if you criticize the code review tool without being constructive. What specifically do you not like about the current code review tool? And have you filed bugs about getting these issues addressed?
MZMcBride
On 24/03/11 06:47, MZMcBride wrote:
It's only impolite if you criticize the code review tool without being constructive. What specifically do you not like about the current code review tool? And have you filed bugs about getting these issues addressed?
I have answered this message in a new one, to create a new thread.
On Tue, Mar 22, 2011 at 9:11 AM, Siebrand Mazeland s.mazeland@xs4all.nl wrote:
IMO that's a bridge too far. My question is "Why should we make this happen?", and more specifically, what do our various stakeholders (which groups?) gain or lose if MediaWiki development were to shift from Subversion to Git? Only if the gain in that analysis is greater than the loss does it make sense to me to look further into a move to Git.
The short answer is that, in Subversion, merging is difficult and highly error prone. As a result, we're forced into a workflow where things go straight into trunk, and code review must happen on individual revisions rather than on branches with bundles of related revisions. This is inefficient for reviewers, and reviewer time is a very large bottleneck in our current system. Because we can't practically ask reviewers to use branches, we put far too much straight into trunk unreviewed. Neil covers this in more detail in a recent thread.
The most convincing general Subversion->DVCS argument I've read is here: http://hginit.com/00.html
This argument refers to Mercurial, but the same benefits apply to Git.
Rob
On 23/03/11 04:24, Rob Lanphier wrote:
The most convincing general Subversion->DVCS argument I've read is here: http://hginit.com/00.html
This argument refers to Mercurial, but the same benefits apply to Git.
The article seems quite biased.
"Here’s the part where you’re just going to have to take my word for it.
"Mercurial is better than Subversion."
The tone is quite different to one of the first things I read about Mercurial:
"Oops! Mercurial cut off your arm!
"Don't randomly try stuff to see if it'll magically fix it. Remember what you stand to lose, and set down the chainsaw while you still have one good arm."
https://developer.mozilla.org/en/Mercurial_basics
The main argument is that merging is easy so you can branch without the slightest worry. I think this is an exaggeration. Interfaces change, and when they change, developers change all the references to those interfaces in the code which they can see in their working copy. The greater the time difference in the branch points, the more likely it is that your new code will stop working. As the branch point gap grows, merging becomes more a task of understanding the interface changes and rewriting the code, than just repeating the edits and copying in the new code.
I'm not talking about the interfaces between core and extensions, which are reasonably stable. I'm talking mainly about the interfaces which operate within and between core modules. These change all the time. The problem of changing interfaces is most severe when developers are working on different features within the same region of core code.
Doing regular reintegration merges from trunk to development branches doesn't help, it just means that you get the interface changes one at a time, instead of in batches.
Having a short path to trunk means that the maximum amount of code is visible to the developers who are doing the interface changes, so it avoids the duplication of effort that occurs when branch maintainers have to understand and account for every interface change that comes through.
If we split up the extensions directory, each extension having its own repository, then this will discourage developers from updating the extensions in bulk. This affects both interface changes and general code maintenance. I'm sure translatewiki.net can set up a script to do the necessary 400 commits per day, but I'm not sure if every developer who wants to fix unused variables or change a core/extension interface will want to do the same.
I don't know enough about Git to know if these things are really an argument against it. This is just a comment on the ideas in this thread.
-- Tim Starling
On 11-03-22 07:46 PM, Tim Starling wrote:
[...] If we split up the extensions directory, each extension having its own repository, then this will discourage developers from updating the extensions in bulk. This affects both interface changes and general code maintenance. I'm sure translatewiki.net can set up a script to do the necessary 400 commits per day, but I'm not sure if every developer who wants to fix unused variables or change a core/extension interface will want to do the same.
[...] -- Tim Starling
This is why, rather than building a translatewiki-specific tool or using git submodules, I believe we should create our own generic set of scripts for working with the extension repos all at once. We can either put these scripts inside the git repo we would otherwise have put submodules in, or make them part of MediaWiki itself, so that anyone with a copy of MediaWiki and git installed on their server can also use them to download and update extensions (well, at least until we create dedicated scripts for managing extensions).
Doing that should give every normal developer the ability to check out every extension at once, or just the extensions they want, update all the ones they have checked out, and make mass commits for code maintenance. Once we do that, there's no reason working with 400 git repos has to be any more awkward for anyone than working with 400 svn directories. We can also try to make it play nicely with a mix of master and release-branch code and make it easy to switch (from my experience, svn with different base directories was a bit messy).
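[For illustration, a minimal sketch of the kind of generic helper Daniel describes; the directory layout and the plain "git pull" are assumptions, not an agreed design:

$ cat update-extensions.sh
#!/bin/sh
# Hypothetical helper: run "git pull" in every extension checkout under extensions/.
# Assumes each extension lives in its own git repository, one directory per extension.
for dir in extensions/*/.git; do
    repo=$(dirname "$dir")
    echo "=== $repo"
    (cd "$repo" && git pull)
done

Swapping the pull for "git commit -a" or any other git command gives the mass-maintenance case Tim is worried about.]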
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
On Tue, Mar 22, 2011 at 7:46 PM, Tim Starling tstarling@wikimedia.org wrote:
On 23/03/11 04:24, Rob Lanphier wrote:
The most convincing general Subversion->DVCS argument I've read is here: http://hginit.com/00.html
This argument refers to Mercurial, but the same benefits apply to Git.
The article seems quite biased.
That doesn't mean it's wrong, and no one implied it was objective.
The tone is quite different to one of the first things I read about Mercurial:
"Oops! Mercurial cut off your arm!
"Don't randomly try stuff to see if it'll magically fix it. Remember what you stand to lose, and set down the chainsaw while you still have one good arm."
Those quotes apply to any version control system, or for that matter, any system (randomly trying stuff and praying for magic rarely seems like good advice). The main guidance that relates to SVN vs Mercurial was the fact that Mercurial doesn't leave conflict markers. Git leaves conflict markers just like Subversion. There are also a couple of bits of guidance about Mercurial Queues, which is a very popular add-on to Mercurial that behaves in ways that are pretty specific to Mercurial (and is a chainsaw and a handgun all in one). The Git equivalent is Quilt, but most Git users don't use Quilt, because Git has some core functionality (rebase, ability to delete branches) that makes such an add-on less interesting.
Git rebase is a beast of its own, and there are many arguments pro and con about its use:
Pro: http://darwinweb.net/articles/the-case-for-git-rebase
Con: http://changelog.complete.org/archives/586-rebase-considered-harmful
Fervent Mercurial advocates are also a fine source of more "con" material for the git rebase option.
We will probably need to adopt some guidelines about the use of rebase assuming we move to Git.
The main argument is that merging is easy so you can branch without the slightest worry. I think this is an exaggeration. Interfaces change, and when they change, developers change all the references to those interfaces in the code which they can see in their working copy. The greater the time difference in the branch points, the more likely it is that your new code will stop working. As the branch point gap grows, merging becomes more a task of understanding the interface changes and rewriting the code, than just repeating the edits and copying in the new code.
Yes, merging is hard. Subversion is particularly bad at it. Git and Mercurial are both much, much better. That doesn't mean they are flawless, but they tend to work pretty well.
I'm not talking about the interfaces between core and extensions, which are reasonably stable. I'm talking mainly about the interfaces which operate within and between core modules. These change all the time. The problem of changing interfaces is most severe when developers are working on different features within the same region of core code.
I get that there are many unsupported, volatile interfaces between and within components. However, if we're changing things around so much that it really negates the benefit that Git or some other DVCS brings us, then that's another conversation.
Doing regular reintegration merges from trunk to development branches doesn't help, it just means that you get the interface changes one at a time, instead of in batches.
Having a short path to trunk means that the maximum amount of code is visible to the developers who are doing the interface changes, so it avoids the duplication of effort that occurs when branch maintainers have to understand and account for every interface change that comes through.
We can still have the rough equivalent of the big happy trunk where everything goes in prior to review.
If we split up the extensions directory, each extension having its own repository, then this will discourage developers from updating the extensions in bulk. This affects both interface changes and general code maintenance. I'm sure translatewiki.net can set up a script to do the necessary 400 commits per day, but I'm not sure if every developer who wants to fix unused variables or change a core/extension interface will want to do the same.
On the flip side, I'm not sure if every developer who wants to make a clone of a single extension on Github or Gitorious wants to use up their quota getting the source of every single extension. Being able to have a public clone for pushing/pulling on a hosting service is a large benefit of DVCS for our workflow; it means that no one even has to ask us for permission before effectively operating as a normal contributor. It would be unfortunate if we set up our repository in such a way as to deter people from doing this.
We don't need to switch to the one extension per repo model right away, though. We could throw all of the extensions into a single repository at first, and then split it later if we run into this or other similar problems.
I don't know enough about Git to know if these things are really an argument against it. This is just a comment on the ideas in this thread.
I'm not so hellbent on deploying Git that I would push a hasty deployment over reasonable, unmitigated objections. It's fair to wait until some time after we're done with 1.17, and wait until we've figured out how this will work with Translatewiki.
Rob
On 11-03-23 12:41 PM, Rob Lanphier wrote:
On Tue, Mar 22, 2011 at 7:46 PM, Tim Starlingtstarling@wikimedia.org wrote:
I'm not talking about the interfaces between core and extensions, which are reasonably stable. I'm talking mainly about the interfaces which operate within and between core modules. These change all the time. The problem of changing interfaces is most severe when developers are working on different features within the same region of core code.
I get that there are many unsupported, volatile interfaces between and within components. However, if we're changing things around so much that it really negates the benefit that Git or some other DVCS brings us, then that's another conversation.
Branching also isn't the only advantage of a DVCS. If branching causes too many problems, we can just not use it. Personally I don't use branches all that much and work with a dirty working copy; git's staging area and `git gui` repeatedly save me a LOT of trouble.
I develop on multiple servers, sometimes the prototype for a live site, sometimes a dev instance of trunk. But I always commit from my home computer; I don't trust any server with the private key that can commit to Wikimedia's svn. This means I need to transfer my changes from the server to my personal computer before committing. ((And before anyone says anything, there's no way I'm putting private keys on servers operated by a 3rd party, or doing development of things on my local machine -- reconfiguring apache and trying to get packaged apache and mysql to NOT start up on my laptop except when wanted is a pain))

In svn this is a workflow like so:
server$ # Edit some code
server$ svn up # make sure things are up to date
desktop$ svn up # make sure things are up to date
desktop$ svn diff | pager # look at the changes that will be pulled
desktop$ ssh server "cd /path/to/code; svn diff" | patch -p0 # abuse ssh, svn diff, and patch to pipe changes to the local copy
desktop$ scp server:/path/to/code/newfile . # if I added a new file, I need to randomly copy it
desktop$ svn diff | pager # take a look at the changes to double check everything is in place
desktop$ svn revert [...] # I often work with a dirty working copy with multiple projects mixed together, so I have to revert some unrelated changes or manually list files in svn commit
desktop$ # sometimes those changes have two different projects in one file and I have to commit only part of it
desktop$ svn diff somefile.php > tmp.patch # I found the easiest fix is putting the changes into a patch
desktop$ svn revert somefile.php # erasing all local changes in the file
desktop$ nano tmp.patch # editing out the unrelated code changes from the patchfile
desktop$ patch -p0 < tmp.patch # and re-applying the changes
desktop$ rm tmp.patch
desktop$ svn commit [...] # And finally I can commit
server$ svn up # update the server, things get /fun/ if you try to pull patches a second time
;) guess what, this is an explanation for half the reason why I commit small changes to svn without testing them.
Now, here's my git workflow (I developed monaco-port in git so I have a matching workflow):
server$ # Edit some code
server$ git gui # ssh -X, so git's gui works remotely too, it's a lifesaver when partially committing changes
desktop$ git pull theserver master # pull the new commit to my local machine
desktop$ git push origin master # ;) and now push the change to the public repo
Oh, and for a bonus, I have a script that pulls changes from all the places I have monaco-port, pushes the changes to the public repo, then has all those monaco-ports pull from the public repo to sync changes everywhere.
If we split up the extensions directory, each extension having its own repository, then this will discourage developers from updating the extensions in bulk. This affects both interface changes and general code maintenance. I'm sure translatewiki.net can set up a script to do the necessary 400 commits per day, but I'm not sure if every developer who wants to fix unused variables or change a core/extension interface will want to do the same.
On the flip side, I'm not sure if every developer who wants to make a clone of a single extension on Github or Gitorious wants to use up their quota getting the source of every single extension. Being able to have a public clone for pushing/pulling on a hosting service is a large benefit of DVCS for our workflow; it means that no one even has to ask us for permission before effectively operating as a normal contributor. It would be unfortunate if we set up our repository in such a way as to deter people from doing this.
Here's another good case for extension-as-repository.
We branch extensions on release currently. I don't expect this to change much after a switch to git, we'll probably type in a few quick commands to create a new rel1_XX branch in each extension repo and push (however I do contend that the higher visibility of branches in git might convince a few more extension authors to check extension compatibility with old versions and backport versions that still work more often).

Extensions are as much used as-is in both trunk and stable. In other words, we use and develop the same version of an extension whether the phase3 around it is trunk or something like REL1_16. However we do not always develop every extension at once. If every extension is in the same repo we are forced to checkout every extension (I'll skip how some of us like to avoid this in the first place because of potential security issues) and all of those extensions MUST be of the same branch. This means that if you have a REL1_15 phase3, and you want to use some extensions that try to be backwards compatible and others which have a trunk that breaks you are forced to use branched extensions for EVERY extension, and you forgo the features of the extensions that DO try to be compatible (I DO use both branched and trunk extensions on stable non-development).

This is even worse when developing. If you are trying to develop extension compatibility for an old release everything MUST be trunk since you can't really develop in the release branch. As a result any other extension you are using that doesn't have 1.15 compatibility will end up causing unrelated fatal errors hampering your compatibility testing.
I'll also contend that backporting commits to release branches for specific extensions, where some of those commits include en-masse code style changes and twn commits, will be easier in a model where each extension has its own repo. In a shared repo I have a feeling that the shared nature of multiple extensions being modified in a single commit could cause some conflicts when you try to apply a series of commits to a branch but ONLY want the portions of those commits specific to one extension (in this case one file path) to be committed.
We don't need to switch to the one extension per repo model right away, though. We could throw all of the extensions into a single repository at first, and then split it later if we run into this or other similar problems.
Perhaps this is possible, though I think we might want to double check that splitting IS possible. Someone else might have better luck, but I don't remember splitting a git repo into multiple git repos (and, as a result, changing the path of all the files) being an easy thing.
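[For what it's worth, git does have machinery for this kind of split; a rough sketch, assuming a hypothetical combined extensions repository and an extension living in a SomeExtension/ subdirectory:

$ git clone extensions.git SomeExtension-split   # work on a throwaway clone
$ cd SomeExtension-split
$ git filter-branch --prune-empty --subdirectory-filter SomeExtension master
$ # history is rewritten so SomeExtension/ becomes the repository root;
$ # push the result to a brand-new repository for that extension

Note that filter-branch rewrites history, so the extracted commits get new ids and the result really is a new repository rather than a split of the old one.]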
Though I contend that, rather than either of those options, the idea of starting out with just phase3 is best. After that, when we want to do extensions, we can set up the infrastructure that would handle extensions in git and try it out on a few brand new extensions rather than throwing 500+ extensions into the fray. We can also try moving just a few extensions for more experimentation. Experimenting with a few actively developed extensions would be better than throwing in hundreds of extensions without many commits right away.
I don't know enough about Git to know if these things are really an argument against it. This is just a comment on the ideas in this thread.
I'm not so hellbent on deploying Git that I would push a hasty deployment over reasonable, unmitigated objections. It's fair to wait until some time after we're done with 1.17, and wait until we've figured out how this will work with Translatewiki.
Me neither, though it would help stop my own issues with my own workflow.
However I do believe it would be good to discuss how it would be implemented, do some proof-of-concept implementation, and get things working as a test before scrapping the actual data in it and doing a real deployment once we've tried out the workflow. There are some things to do, like figuring out how to do the items on the [[Git conversion]] page, and deciding how to set up the server. Do we keep the model where we need to send request e-mails for commit access? Or do we try using prior art in git farming, i.e. setting up a copy of Gitorious for ourselves? Some of these things get in the way of others: it's hard to build a tool for TWN when we don't even have extension repos to build it on. We can build a functional prototype before we even decide to deploy.
Rob
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
Daniel Friesen wrote:
((And before anyone says anything, there's no way I'm putting private keys on servers operated by a 3rd party, or doing development of things on my local machine -- reconfiguring apache and trying to get packaged apache and mysql to NOT start up on my laptop except when wanted is a pain))
On which OS?
In svn this is a workflow like so:
server$ # Edit some code
server$ svn up # make sure things are up to date
desktop$ svn up # make sure things are up to date
desktop$ svn diff | pager # look at the changes that will be pulled
desktop$ ssh server "cd /path/to/code; svn diff" | patch -p0 # abuse ssh, svn diff, and patch to pipe changes to the local copy
desktop$ scp server:/path/to/code/newfile . # if I added a new file, I need to randomly copy it
desktop$ svn diff | pager # take a look at the changes to double check everything is in place
desktop$ svn revert [...] # I often work with a dirty working copy with multiple projects mixed together, so I have to revert some unrelated changes or manually list files in svn commit
desktop$ # sometimes those changes have two different projects in one file and I have to commit only part of it
desktop$ svn diff somefile.php > tmp.patch # I found the easiest fix is putting the changes into a patch
desktop$ svn revert somefile.php # erasing all local changes in the file
desktop$ nano tmp.patch # editing out the unrelated code changes from the patchfile
desktop$ patch -p0 < tmp.patch # and re-applying the changes
desktop$ rm tmp.patch
desktop$ svn commit [...] # And finally I can commit
server$ svn up # update the server, things get /fun/ if you try to pull patches a second time
;) guess what, this is an explanation for half the reason why I commit small changes to svn without testing them.
Seems easier to keep home at the same version as the server, rsync from server to home, and then update before committing.
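[Something along these lines, with hypothetical paths, assuming both working copies sit at the same svn revision so the .svn metadata stays consistent:

desktop$ rsync -av --exclude=.svn server:/path/to/code/ ~/mediawiki/   # copy the modified files home
desktop$ cd ~/mediawiki && svn status   # the server's edits now show up as local changes
desktop$ svn up && svn commit]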
Here's another good case for extension-as-repository.
(...)
Extensions are as much used as-is in both trunk and stable. In other words, we use and develop the same version of an extension whether the phase3 around it is trunk or something like REL1_16. However we do not always develop every extension at once. If every extension is in the same repo we are forced to checkout every extension (...) and all of those extensions MUST be of the same branch. This means that if you have a REL1_15 phase3, and you want to use some extensions that try to be backwards compatible and others which have a trunk that breaks you are forced to use branched extensions for EVERY extension, and you forgo the features of the extensions that DO try to be compatible (I DO use both branched and trunk extensions on stable non-development). This is even worse when developing. If you are trying to develop extension compatibility for an old release everything MUST be trunk since you can't really develop in the release branch. As a result any other extension you are using that doesn't have 1.15 compatibility will end up causing unrelated fatal errors hampering your compatibility testing.
Good point.
On 11-03-23 05:37 PM, Platonides wrote:
Daniel Friesen wrote:
((And before anyone says anything, there's no way I'm putting private keys on servers operated by a 3rd party, or doing development of things on my local machine -- reconfiguring apache and trying to get packaged apache and mysql to NOT start up on my laptop except when wanted is a pain))
On which OS?
Ubuntu. It likes to restore rc.d entries when you upgrade, the /etc/default files don't have on/off toggles, and iirc mysql is now an upstart job. I used to do development locally but I got tired of playing with /etc/hosts and editing apache config files when I switched projects. In any case, I like using remote servers: less local config, and I have an actual working link when I need to show something to someone else. .gvfs makes working with remote files as seamless as working with local files. It also lets me work nicely on prototypes of live sites in the proper environment.
In svn this is a workflow like so:
server$ # Edit some code
server$ svn up # make sure things are up to date
desktop$ svn up # make sure things are up to date
desktop$ svn diff | pager # look at the changes that will be pulled
desktop$ ssh server "cd /path/to/code; svn diff" | patch -p0 # abuse ssh, svn diff, and patch to pipe changes to the local copy
desktop$ scp server:/path/to/code/newfile . # if I added a new file, I need to randomly copy it
desktop$ svn diff | pager # take a look at the changes to double check everything is in place
desktop$ svn revert [...] # I often work with a dirty working copy with multiple projects mixed together, so I have to revert some unrelated changes or manually list files in svn commit
desktop$ # sometimes those changes have two different projects in one file and I have to commit only part of it
desktop$ svn diff somefile.php > tmp.patch # I found the easiest fix is putting the changes into a patch
desktop$ svn revert somefile.php # erasing all local changes in the file
desktop$ nano tmp.patch # editing out the unrelated code changes from the patchfile
desktop$ patch -p0 < tmp.patch # and re-applying the changes
desktop$ rm tmp.patch
desktop$ svn commit [...] # And finally I can commit
server$ svn up # update the server, things get /fun/ if you try to pull patches a second time
;) guess what, this is an explanation for half the reason why I commit small changes to svn without testing them.
Seems easier to keep home at the same version as the server, rsync from server to home, and then update before committing.
Perhaps. But I expect something like that would also dirty up my clean working copy with config files and other junk that I would need to make a long explicit list of. Being able to commit server side is still a git advantage, and so is being able to make easy partial commits without playing with things like patch. I can also turn some of my dirty code into commits on dangling branches that get re-integrated when I pick the project back up.
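[A sketch of that dangling-branch trick, with made-up names:

$ git checkout -b wip/skin-experiment   # park the half-done work on its own branch
$ git add -p                            # stage only the hunks that belong to it
$ git commit -m "WIP: parked for now"
$ git checkout master                   # fine as long as the leftover dirty changes don't touch the files just committed
$ # ...weeks later...
$ git checkout wip/skin-experiment      # pick the project back up, or merge/rebase it when ready]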
Here's another good case for extension-as-repository.
(...)
Extensions are as much used as-is in both trunk and stable. In other words, we use and develop the same version of an extension whether the phase3 around it is trunk or something like REL1_16. However we do not always develop every extension at once. If every extension is in the same repo we are forced to checkout every extension (...) and all of those extensions MUST be of the same branch. This means that if you have a REL1_15 phase3, and you want to use some extensions that try to be backwards compatible and others which have a trunk that breaks you are forced to use branched extensions for EVERY extension, and you forgo the features of the extensions that DO try to be compatible (I DO use both branched and trunk extensions on stable non-development). This is even worse when developing. If you are trying to develop extension compatibility for an old release everything MUST be trunk since you can't really develop in the release branch. As a result any other extension you are using that doesn't have 1.15 compatibility will end up causing unrelated fatal errors hampering your compatibility testing.
Good point.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
On Tue, Mar 22, 2011 at 10:46 PM, Tim Starling tstarling@wikimedia.org wrote:
The tone is quite different to one of the first things I read about Mercurial:
"Oops! Mercurial cut off your arm!
"Don't randomly try stuff to see if it'll magically fix it. Remember what you stand to lose, and set down the chainsaw while you still have one good arm."
My experience with Mercurial is that if you type the wrong commands, it likes to destroy data. For instance, when I did an hg up with conflicts once, it opened up some kind of three-way diff in vim that I had no idea how to use, so I exited. This resulted in my working copy (or parts of it) being lost, since apparently it defaulted to assuming that I was okay with whatever default merging it had done, so it threw out the rest. I also once lost commits under similar circumstances when doing hg rebase. I'm pretty sure you can configure it to be safer, but it's one of the major reasons I dislike Mercurial. (I was able to recover my lost data from filesystem backups.)
git, on the other hand, never destroys committed data. Barring bugs (which I don't recall ever running into), the only command that destroys data is git gc, and that normally only destroys things that have been disconnected for a number of days. If you do a rebase, for instance, the old commits are no longer accessible from normal commands like "git log", but they'll stick around for some period of time, so you can recover them if needed (although the process is a bit arcane if you don't know the commit id's). There are also no git commands I've run into that will do anything nasty to your working copy without asking you, except obvious ones like git reset --hard. In the event of update conflicts, for instance, git adds conflict markers just like Subversion.
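[For the record, the slightly arcane recovery mentioned above is roughly this (a sketch, not a full guide):

$ git reflog                       # lists every commit HEAD has pointed at, including the pre-rebase ones
$ git branch rescued <commit-id>   # give the "lost" commit a name again before git gc can prune it

And since git gc only prunes disconnected commits after the grace period mentioned above, there is usually plenty of time to do this.]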
The main argument is that merging is easy so you can branch without the slightest worry. I think this is an exaggeration. Interfaces change, and when they change, developers change all the references to those interfaces in the code which they can see in their working copy. The greater the time difference in the branch points, the more likely it is that your new code will stop working. As the branch point gap grows, merging becomes more a task of understanding the interface changes and rewriting the code, than just repeating the edits and copying in the new code.
I'm not talking about the interfaces between core and extensions, which are reasonably stable. I'm talking mainly about the interfaces which operate within and between core modules. These change all the time. The problem of changing interfaces is most severe when developers are working on different features within the same region of core code.
Doing regular reintegration merges from trunk to development branches doesn't help, it just means that you get the interface changes one at a time, instead of in batches.
Having a short path to trunk means that the maximum amount of code is visible to the developers who are doing the interface changes, so it avoids the duplication of effort that occurs when branch maintainers have to understand and account for every interface change that comes through.
In practice, this is generally not true. Realistically, most patches change a relatively small amount of code and don't cause merge conflicts even if you keep them out of trunk for quite a long time. For instance, I maintain dozens of patches to the proprietary forum software vBulletin for the website I run. I store them all in git, and to upgrade I do a git rebase. Even on a major version upgrade, I only have to update a few of the patches, and the updates are small and can be done mindlessly. It's really very little effort. Even a commit that touches a huge amount of code (like my conversion of named entity references to numeric) will only conflict with a small percentage of patches.
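[For the curious, the upgrade step described here looks roughly like this; the branch and tag names are made up:

$ git checkout my-patches                             # branch holding the local patches on top of the vendor code
$ git rebase --onto vbulletin-4.1.3 vbulletin-4.1.2   # replay them onto the new vendor release
$ # only the handful of patches that actually conflict stop the rebase and need hand-editing]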
Of course, you have to be more careful with changing interfaces around when people use branches a lot. But in practice, you spend very little of your time resolving merge conflicts, relative to doing actual development work. It's not a significant disadvantage in practice. Experienced Subversion users just expect it to be, since merging in Subversion is horrible and they assume that's how it has to be. (Disclaimer: merges in Subversion are evidently so horrible that I never actually learned how to do them, so I can't give a good breakdown of why exactly DVCS merging is so much better. I can just say that I've never found it to be a problem at all while using a DVCS, but everyone complains about it with Subversion.)
I mean, the DVCS model was popularized by the Linux kernel. It's hard to think of individual codebases that large, or with that much developer activity. In recent years it's over 9,000 commits per release changing several hundred thousand lines of code, which works out to several thousand LOC changed a day. But merging is not a big problem for them -- they spend their time doing development, not wrestling with version control.
If we split up the extensions directory, each extension having its own repository, then this will discourage developers from updating the extensions in bulk. This affects both interface changes and general code maintenance. I'm sure translatewiki.net can set up a script to do the necessary 400 commits per day, but I'm not sure if every developer who wants to fix unused variables or change a core/extension interface will want to do the same.
I've thought about this a bit. We want bulk code changes to extensions to be easy, but it would also be nice if it were easier to host extensions "officially" to get translations, distribution, and help from established developers. We also don't want anyone to have to check out all extensions just to get at trunk. Localization, on the other hand, is entirely separate from development, and has very different needs -- it doesn't need code review, and someone looking at the revision history for the whole repository doesn't want to see localization updates. (Especially in extensions, where often you have to scroll through pages of l10n updates to get to the code changes.)
Unfortunately, git's submodule feature is pretty crippled. It basically works like SVN externals, as I understand it: the larger repository just has markers saying where the submodules are, but their actual history is entirely separate. We could probably write a script to commit changes to all extensions at once, but it's certainly a less ideal solution.
If we moved to git, I'd tentatively say something like
* Separate out the version control of localization entirely. Translations are already coordinated centrally on translatewiki.net, where the wiki itself maintains all the actual history and permissions, so the SVN checkin right now is really a needless formality that keeps translations less up-to-date and spams revision logs. Keep the English messages with the code in git, and have the other messages available for checkout in a different format via our own script. This checkout should always grab the latest translatewiki.net messages, without the need for periodic commits. (I assume translatewiki.net already does automatic syntax checks and so on.) Of course, the tarballs would package all languages.
* Keep the core code in one repository, each extension in a separate repository, and have an additional repository with all of them as submodules. Or maybe have extensions all be submodules of core (you can check out only a subset of submodules if you want; see the sketch after this list).
* Developers who want to make mass changes to extensions are probably already doing them by script (at least I always do), so something like "for EXTENSION in extensions/*; do (cd $EXTENSION && git commit -a -m 'Boilerplate message'); done" shouldn't be an exceptional burden. If it comes up often enough, we can write a script to help out.
* We should take the opportunity to liberalize our policies for extension hosting. Anyone should be able to add an extension, and get commit access only to that extension. MediaWiki developers would get commit access to all hosted extensions, and hooking into our localization system should be as simple as making sure you have a properly-formatted ExtensionName.i18n.php file. If any human involvement is needed, it should only be basic sanity checks.
* Code review should migrate to an off-the-shelf tool like Gerrit. I don't think it's a good idea at all for us to reinvent the code-review wheel. To date we've done it poorly.
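[A quick sketch of the subset checkout mentioned in the second bullet, with a hypothetical repository URL and extension names:

$ git clone git://git.example.org/mediawiki/core.git
$ cd core
$ git submodule init extensions/Cite extensions/ParserFunctions   # register only the extensions you want
$ git submodule update                                            # clones just the registered ones; the rest stay empty]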
This is all assuming that we retain our current basic development model, namely commit-then-review with a centrally-controlled group of people with commit access. One step at a time.
On Tue, Mar 22, 2011 at 11:16 PM, Tim Starling tstarling@wikimedia.org wrote:
I think our focus at the moment should be on deployment of extensions and core features from the 1.17 branch to Wikimedia. We have heard on several occasions that it is the delay between code commit and deployment, and the difficulty in getting things deployed, which is disheartening for developers who come to us from the Wikimedia community. I'm not so concerned about the backlog of trunk reviews. We cleared it before, so we can clear it again.
I don't think moving to git will make code review very much easier in the short term. It would probably disrupt code review considerably, in fact, because people would have to get used to the new system. So I definitely think code review needs to be worked out before we overhaul anything. And that doesn't mean clearing out backlogs, it means not letting them accumulate in the first place -- like scaps once a month at the very minimum, and preferably at least once a week.
On Wed, Mar 23, 2011 at 2:51 PM, Diederik van Liere dvanliere@gmail.com wrote:
The Python Community recently switched to a DVCS and they have documented their choice. It compares Git, Mercurial and Bzr and shows the pluses and minuses of each. In the end, they went for Mercurial.
Choosing a distributed VCS for the Python project: http://www.python.org/dev/peps/pep-0374/
They gave three reasons:
1) git's Windows support isn't as good as Mercurial's. I don't know how much merit that has these days, so it bears investigation. I have the impression that the majority of MediaWiki developers use non-Windows platforms for development, so as long as it works well enough, I don't know if this should be a big deal.
2) Python developers preferred Mercurial when surveyed. Informally, I'm pretty certain that most MediaWiki developers with a preference prefer git.
3) Mercurial is written in Python, and Python developers want to use stuff written in Python. Not really relevant to us, even those of us who like Python a lot. :) (FWIW, despite being a big Python fan, I'm a bit perturbed that Mercurial often prints out a Python stack trace when it dies instead of a proper error message . . .)
GNOME also surveyed available options, and they decided to go with git: http://blogs.gnome.org/newren/2009/01/03/gnome-dvcs-survey-results/ Although of course, (1) would be a bit of a nonissue for them.
On Wed, Mar 23, 2011 at 3:41 PM, Rob Lanphier robla@wikimedia.org wrote:
We will probably need to adopt some guidelines about the use of rebase assuming we move to Git.
I don't see why. Rebase can never be used on publicly-visible repositories -- anyone who tries to pull from the repo both before and after the rebase will get errors, since the current upstream HEAD is not a descendant of the old upstream HEAD. So rebasing is only relevant to what developers do in their own private branches, before they push them to the central public repository.
What we'd need is policies on *merging*. Do we encourage people to submit clean merges with an empty merge commit so the development history is preserved, or encourage rebasing so that the development history is linear and easier to analyze (e.g., bisect)? Whatever policies we adopt, people can always rebase in their private repos as much as they want, if they like. I guess we could discourage it, but I don't see why, as long as it doesn't cause bugs.
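[Concretely, the two integration styles contrasted here look something like this; the feature branch name is made up:

$ # style 1: keep the branch history, record an explicit merge commit
$ git checkout master
$ git merge --no-ff my-feature

$ # style 2: linearize first, then fast-forward, so history stays bisect-friendly
$ git checkout my-feature
$ git rebase master
$ git checkout master
$ git merge my-feature   # fast-forwards; no merge commit]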
We don't need to switch to the one extension per repo model right away, though. We could throw all of the extensions into a single repository at first, and then split it later if we run into this or other similar problems.
No, we can't. Any clone of the repository will have all history. If you want to split out extensions at a later date, you're not going to save much space, since they'll still be cloned with all the rest of the history. To really get rid of them, you'd have to create a whole new repository, forcing everyone to do a fresh clone and seriously hampering git's ability to merge any outstanding work from before you broke up the repo. If we want to split off some things into their own repos, the time to do that is when we switch to git, not afterward.
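For what it's worth, a later split would look roughly like the sketch below (the extension path is just an example); note that filter-branch rewrites every commit ID, which is precisely the "whole new repository" problem:

    # Extract extensions/FooBar from the big repo into its own repository
    git clone /path/to/mediawiki foobar-split
    cd foobar-split
    # Keep only the history touching that subdirectory; every commit gets a new ID
    git filter-branch --prune-empty --subdirectory-filter extensions/FooBar master
    # Publish the rewritten history as a brand-new repo; existing clones cannot
    # fast-forward to it, and outstanding branches won't merge cleanly
    git remote set-url origin git@example.org:extensions/FooBar.git
    git push origin master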
On 11-03-24 06:12 PM, Aryeh Gregor wrote:
On Tue, Mar 22, 2011 at 10:46 PM, Tim Starling tstarling@wikimedia.org wrote:
If we split up the extensions directory, each extension having its own repository, then this will discourage developers from updating the extensions in bulk. This affects both interface changes and general code maintenance. I'm sure translatewiki.net can set up a script to do the necessary 400 commits per day, but I'm not sure if every developer who wants to fix unused variables or change a core/extension interface will want to do the same.
I've thought about this a bit. We want bulk code changes to extensions to be easy, but it would also be nice if it were easier to host extensions "officially" to get translations, distribution, and help from established developers. We also don't want anyone to have to check out all extensions just to get at trunk. Localization, on the other hand, is entirely separate from development, and has very different needs -- it doesn't need code review, and someone looking at the revision history for the whole repository doesn't want to see localization updates. (Especially in extensions, where often you have to scroll through pages of l10n updates to get to the code changes.)
Unfortunately, git's submodule feature is pretty crippled. It basically works like SVN externals, as I understand it: the larger repository just has markers saying where the submodules are, but their actual history is entirely separate. We could probably write a script to commit changes to all extensions at once, but it's certainly a less ideal solution.
git's submodule feature is something like svn:externals but has one big fundamental difference. svn:externals tracks only a repo, so when you update you get the latest version of that repo. git submodules track a repo and a commit id, always. So when you update you always get the same commit id, and changing that commit id requires making a commit to the git repo that contains the submodule. You can also check out an old commit, and submodule update will check out the commit id of the submodule that was recorded at that point in time. But yes, for both of them it's merely references; they do not store the actual history. They're essentially glorified helper scripts: they don't alleviate the task of downloading each repo separately, they just make the vcs do it for you instead of you running a script in some other language to do it for you.
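A quick illustration of the "repo plus commit id" point (URLs and paths invented for the example):

    # Record an extension as a submodule of the core repo
    git submodule add git://git.example.org/extensions/FooBar.git extensions/FooBar
    git commit -m "Track FooBar at its current commit"

    # A fresh clone checks out exactly the recorded commit, not the latest one
    git clone git://git.example.org/mediawiki.git && cd mediawiki
    git submodule update --init

    # Moving to a newer FooBar is itself a commit in the outer repo
    (cd extensions/FooBar && git pull origin master)
    git commit -am "Bump FooBar submodule pointer"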
In my honest opinion, submodules were not designed for what we are trying to shove into them. And given that one of their key features (tracking a specific commit id to ensure the same version is always checked out) is actually the opposite of what we want, I believe the actual functionality of git submodules in this situation is no better than what we could build ourselves with a few simple custom scripts. In fact I believe we could build something better for our purposes without too much effort, and we could check it into a git repo in place of the repo that submodules would be put in. If you dig through the git discussions I believe I listed a number of features we could add that would make it even more useful. Instead of a second repo, we could just put the tool itself inside MW's repo so that by checking out phase3 you get the tools needed to work with extensions.
If we moved to git, I'd tentatively say something like
- Separate out the version control of localization entirely.
Translations are already coordinated centrally on translatewiki.net, where the wiki itself maintains all the actual history and permissions, so the SVN checkin right now is really a needless formality that keeps translations less up-to-date and spams revision logs. Keep the English messages with the code in git, and have the other messages available for checkout in a different format via our own script. This checkout should always grab the latest translatewiki.net messages, without the need for periodic commits. (I assume translatewiki.net already does automatic syntax checks and so on.) Of course, the tarballs would package all languages.
+1
- Keep the core code in one repository, each extension in a separate
repository, and have an additional repository with all of them as submodules. Or maybe have extensions all be submodules of core (you can check out only a subset of submodules if you want).
- Developers who want to make mass changes to extensions are probably
already doing them by script (at least I always do), so something like "for EXTENSION in extensions/*; do (cd "$EXTENSION" && git commit -a -m 'Boilerplate message'); done" shouldn't be an exceptional burden. If it comes up often enough, we can write a script to help out.
- We should take the opportunity to liberalize our policies for
extension hosting. Anyone should be able to add an extension, and get commit access only to that extension. MediaWiki developers would get commit access to all hosted extensions, and hooking into our localization system should be as simple as making sure you have a properly-formatted ExtensionName.i18n.php file. If any human involvement is needed, it should only be basic sanity checks.
I LOVE this idea too, it's been on my mind for a while.
Brion mentioned that there is some prior art in git farming. Gitorious' codebase is open source; Wikimedia could host a copy of it for the purpose of hosting git repos for MediaWiki and extensions. It has built-in management of pubkeys, projects and project repos (say, MediaWiki and extensions as projects, with some groups of extensions like SMW put into one project), teams (put core devs in a team and give them access to the MediaWiki core repo, i.e. trunk; we could also add teams like smw-devs that let us open up groups of extensions to groups of people collaborating on them), team clones (make wmf a team and make the wmf branch a clone of the MediaWiki repo for access control), personal clones (so users without access to core can still make a clone, keep it in a place tied to potential code review, and participate by sending merge requests back to core so devs can pick them up and pull them in; is this a form of pre-commit review?), and of course the code for letting someone sign up, not have commit access to everything, but create their own project repo for an extension and start committing to it.
Oh, and as a little bonus: theoretically we may be able to make some moderate tweaks to Gitorious and build in a simple API that'll list all extensions, as tagged. You can already get something close by using .xml on the project view (since it's a Rails app). Using that data we could easily build a tool that would clone all extensions, and from there let you batch commit/push/checkout/branch/update remotes/etc. And we could easily make it take account of labeling, meaning you could potentially check out all extensions in TWN, or all extensions tagged as SMW, or all extensions tagged as 'UsedOnWMF'. Naturally it would be trivial to make it check out the repo for an extension by name.
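As a strawman, even without touching a Gitorious API, a dumb list-driven script would already cover the batch-checkout case (the list file and URL layout here are invented for the example):

    #!/bin/bash
    # Clone or update every extension named in extensions.list (one name per line)
    BASE=git://gitorious.example.org/mediawiki-extensions    # hypothetical hosting layout
    while read -r ext; do
        if [ -d "$ext/.git" ]; then
            (cd "$ext" && git pull --ff-only)
        else
            git clone "$BASE/$ext.git" "$ext"
        fi
    done < extensions.list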
I'd love git being first class and Wikimedia hosted. I'd probably take monaco-port (which is on GitHub right now) and make the repo on Wikimedia the primary repo.
- Code review should migrate to an off-the-shelf tool like Gerrit. I
don't think it's a good idea at all for us to reinvent the code-review wheel. To date we've done it poorly.
This is all assuming that we retain our current basic development model, namely commit-then-review with a centrally-controlled group of people with commit access. One step at a time.
A mixed format might be possible too, where the bulk of developers can commit to one repo but we have a second repo for post-review code which is considered to be a more stable trunk. And naturally, whatever we do, we can make it easier for non-devs to submit code by publishing their own public clones.
On Wed, Mar 23, 2011 at 2:51 PM, Diederik van Liere dvanliere@gmail.com wrote:
The Python Community recently switched to a DVCS and they have documented their choice. It compares Git, Mercurial and Bzr and shows the pluses and minuses of each. In the end, they went for Mercurial.
Choosing a distributed VCS for the Python project: http://www.python.org/dev/peps/pep-0374/
They gave three reasons:
- git's Windows support isn't as good as Mercurial's. I don't know
how much merit that has these days, so it bears investigation. I have the impression that the majority of MediaWiki developers use non-Windows platforms for development, so as long as it works well enough, I don't know if this should be a big deal.
For CLI there's msysgit. For GUI there are TortoiseGit and Git Extensions. I hear comments that TortoiseGit lacks some of git's features, namely interaction with the index. However it's supposed to feel fairly similar to TortoiseSVN (which, if we have svn Windows users using a GUI, is probably what they're using, so that might be helpful). Git Extensions looks fairly interesting, though; I'm not a Windows user anymore so I haven't looked at it in depth: http://sourceforge.net/projects/gitextensions/
That PEP was written a year ago, so git's Windows support can only have gotten better.
- Python developers preferred Mercurial when surveyed. Informally,
I'm pretty certain that most MediaWiki developers with a preference prefer git.
Thanks in part to GitHub, git is definitely, as someone else mentioned, the 'flavor of the week', though to be fair, in a sense svn was similar in that respect. I do believe that we are likely to find a lot more MW devs who are comfortable with git than with any other dvcs.
- Mercurial is written in Python, and Python developers want to use
stuff written in Python. Not really relevant to us, even those of us who like Python a lot. :) (FWIW, despite being a big Python fan, I'm a bit perturbed that Mercurial often prints out a Python stack trace when it dies instead of a proper error message . . .)
GNOME also surveyed available options, and they decided to go with git: http://blogs.gnome.org/newren/2009/01/03/gnome-dvcs-survey-results/ Although of course, (1) would be a bit of a nonissue for them.
On Fri, Mar 25, 2011 at 2:12 AM, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
I don't think moving to git will make code review very much easier in the short term. It would probably disrupt code review considerably, in fact, because people would have to get used to the new system. So I definitely think code review needs to be worked out before we overhaul anything. And that doesn't mean clearing out backlogs, it means not letting them accumulate in the first place -- like scaps once a month at the very minimum, and preferably at least once a week.
I find this, together with Roan's mail in the other thread on how to actually make CodeReview easier, a very sensible idea. We should first fix our "management" problems (i.e. getting code review done) before we fix our technical problems (making branching/merging suck less).
Bryan
Aryeh Gregor wrote:
My experience with Mercurial is that if you type the wrong commands, it likes to destroy data. For instance, when doing an hg up with conflicts once, it opened up some kind of three-way diff in vim that I had no idea how to use, and so I exited. This resulted in my working copy (or parts of it) being lost, since apparently it defaulted to assuming that I was okay with whatever default merging it had done, so it threw out the rest. I also once lost commits under similar circumstances when doing hg rebase. I'm pretty sure you can configure it to be safer, but it's one of the major reasons I dislike Mercurial. (I was able to recover my lost data from filesystem backups.)
git, on the other hand, never destroys committed data...
Mercurial doesn't destroy committed data, either. You just ended up in a situation that you didn't know how to exit -- very much like being trapped in vi and, not knowing the keys to exit, killing it (yes, I have done that in the past). An hg merge is not undone by the revert command; you need to do hg update --clean
Another, more generic, explanation of how you could have reset a borked working copy is that you could have cloned the previous copy, as that would always set you back to the latest committed revision. Cloning repos just to do alternate work seems like a failure, but DVCSs seem fond of suggesting that you should have separate clones for separate work (yes, they use hardlinks, but it still seems like too much). Something like git stash seems the way to go.
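That is, roughly (both are standard commands, shown here just for illustration):

    # Mercurial: throw away a botched merge / dirty working copy
    hg update --clean

    # git: set uncommitted work aside instead of making a whole extra clone
    git stash           # saves local changes and resets the working tree
    git stash pop       # brings them back later
    git checkout -- .   # or simply discard local edits to tracked files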
On Tue, Mar 22, 2011 at 16:33, Max Semenik maxsem.wiki@gmail.com wrote:
On 22.03.2011, 18:08 Trevor wrote:
Your objections seem to be based on the assumption that you would need to have push access to all repositories, but I think that's the point of a DVCS: you can just fork them, and then people can pull your changes in themselves (or using a tool). Pull requests could even be generated when things are out of sync.
I think it's quite possible this could make i18n/L10n work easier, not more difficult.
You seem to miss Siebrand's point: currently, all localisation updates take one commit per day. Splitting stuff into separate repos will result in up to 400 commits per day that will also need to be pushed and reintegrated - an epic waste of time and common sense. Or localisation updates will simply languish in forks and people will miss them when checking out from the "official" source.
I think you're missing the point that there's no reason why 400 commits should be harder than 1 in this case.
When he makes a commit now he ends up stat()-ing 400 files, but he doesn't notice because it's all abstracted away.
Similarly he could make 400 commits by issuing one command, just like he does today.
And what does "pushed and reintegrated" mean? He'd presumably push to the canonical upstream, just like he does now.
(Or he could push somewhere else if people would like that, pulling from that should also be trivial).
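Concretely, the "one command" could be a trivial wrapper along these lines (untested sketch; the repo layout and branch name are assumptions):

    #!/bin/bash
    # Commit and push the day's localisation export in every extension repo
    MSG="Localisation updates from http://translatewiki.net"
    for repo in extensions/*/; do
        (
            cd "$repo" || exit 1
            git add -A .
            # only create a commit (and push) where something actually changed
            git diff --cached --quiet || { git commit -m "$MSG" && git push origin master; }
        )
    done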
On Wed, Mar 23, 2011 at 2:00 AM, Ævar Arnfjörð Bjarmason avarab@gmail.com wrote:
I think you're missing the point that there's no reason why 400 commits should be harder than 1 in this case.
Code review comes to mind there. -Peachey
On 22 March 2011 18:00, Ævar Arnfjörð Bjarmason avarab@gmail.com wrote:
I think you're missing the point that there's no reason why 400 commits should be harder than 1 in this case.
Thanks to Ævar I realised that we're all missing the point and assuming things which are never spoken out loud. These silent assumptions must be spelled out to get everyone on the same page.
For twn it's not the commits which is hard, it's all the other things:
1) picking up new projects when they are committed (Mediawiki-svn list)
 - add few lines to our config file
 - initial import of messages
2) following changes (Mediawiki-svn list)
 - import new/changed
 - run fuzzy if needed
 - bully developers if needed (Code review, IRC)
Now, when we move to git how do we keep the workflow as simple as it is now? Where will the repos be? Will they all share user accounts? Will everyone be able to commit everywhere? How is the standard repo location & file layout enforced? What will we offer to people who check out all extensions from svn and want to update them all in one command? What about only a subset of extensions (extensions used in twn)? What about only the checkout of i18n files (extensions translated in twn)? How do we know when repos are created or deleted? How do we know which repos are the official upstreams and not just clones of extensions developed elsewhere?
What will replace Mediawiki-commits list and code review? It's really hard to say what could be the real issues when there is no proposal how it would actually work. For example I don't think what avar said to me on IRC (below) has been stated here before, while I find it essential to judge the whole idea.
avar> We're not going to move from a "central" SVN server to Git repositories scattered all over the place, just Git repositories hosted on some WM server.
avar> So all the ssh keys etc. would already be set up, and getting a list of repos would be no harder than getting a list of extension dirs today.
What other unspoken assumptions are there?
-Niklas
On 11-03-23 01:28 AM, Niklas Laxström wrote:
On 22 March 2011 18:00, Ævar Arnfjörð Bjarmason avarab@gmail.com wrote:
I think you're missing the point that there's no reason why 400 commits should be harder than 1 in this case.
Thanks to Ævar I realised that we're all missing the point and assuming things which are never spoken out loud. These silent assumptions must be spelled out to get everyone on the same page.
For twn it's not the commits which is hard, it's all the other things:
- picking up new projects when they are committed (Mediawiki-svn list)
- add few lines to our config file
- initial import of messages
- following changes (Mediawiki-svn list)
- import new/changed
- run fuzzy if needed
- bully developers if needed (Code review, IRC)
Now, when we move to git how do we keep the workflow as simple as it is now? Where will the repos be? Will they all share user accounts? Will everyone be able to commit everywhere? How is the standard repo location & file layout enforced? What will we offer to people who check out all extensions from svn and want to update them all in one command? What about only a subset of extensions (extensions used in twn)? What about only the checkout of i18n files (extensions translated in twn)? How do we know when repos are created or deleted? How do we know which repos are the official upstreams and not just clones of extensions developed elsewhere?
What will replace Mediawiki-commits list and code review? It's really hard to say what could be the real issues when there is no proposal how it would actually work. For example I don't think what avar said to me on IRC (below) has been stated here before, while I find it essential to judge the whole idea.
avar> We're not going to move from a "central" SVN server to Git repositories scattered all over the place, just Git repositories hosted on some WM server.
avar> So all the ssh keys etc. would already be set up, and getting a list of repos would be no harder than getting a list of extension dirs today.
What other unspoken assumptions are there?
-Niklas
- Brion mentioned there is prior art in hosting large numbers of git repos. Gitorious' codebase is open-source and can be re-used. Wikimedia could potentially host its own Gitorious for MediaWiki git repos.
-- This should probably take care of how to handle giving push access to various people.
-- Theoretically we would also be able to update our own pubkeys if we re-used Gitorious.
-- Theoretically this should also make it easier to give anyone who asks a user account with commit access only to their own extension. ie: It should become much easier to get extensions of authors without commit access, who are currently building extensions using tarballs, external code hosting, or pages on MW.org, to commit to a Wikimedia-hosted repo which we can also commit to and review.
- I'd like to use a standard set of scripts for dealing with repos en-masse. This would allow us to do mass commits for code maintenance as well, rather than only being able to checkout en-masse. We could also make this take extensions themselves into account in ways that'll let us have it only download extension repos of a specific type (extensions in TWN, SMW extensions, extensions listed in a text file list of extensions you want to work with), which would help with the TWN issues.
- Sadly, checking out i18n-only won't be possible if you're planning to commit. Though, is that really so bad? We might need to do some space comparisons; don't forget that .git and .svn dirs take up different amounts of space, so it's not clear what the difference in space use will be.
- Git has update, post-receive, and post-update hooks run on a remote repo that gets pushed to, so it should be trivial to send out updates of new commits to mailing lists (rough sketch below). Gitorious might also have something to help.
-- And if we're using Gitorious and, for some unforeseen reason that probably won't happen, have to use some ugly hack, at the very least it should be possible to create a dummy user, give it the mailing list's e-mail address, and abuse Gitorious' favorites system to have e-mails sent through that.
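If I remember right, git even ships a contrib post-receive-email hook; a hand-rolled minimal version would be something like this (the list address is made up, and this ignores the branch-creation case where the old sha is all zeros):

    #!/bin/sh
    # hooks/post-receive on the central repo: mail a summary of each pushed ref
    LIST=mediawiki-commits@example.org
    while read oldrev newrev refname; do
        git log --stat "$oldrev..$newrev" |
            mail -s "[git] ${refname#refs/heads/} updated" "$LIST"
    done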
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
Daniel Friesen wrote:
- Brion mentioned there is prior art in hosting large numbers of git
repos. Gitorious' codebase is open-source and can be re-used. Wikimedia could potentially host its own Gitorious for MediaWiki git repos.
I realised today that we are trying to adapt our layout to the DVCS. It should be the tool that adapts to our way of working. It is a good time to split into repos if we want to, but we shouldn't split folders just because a version control system does not support them. We should file it as a bug upstream and wait for any of the several alternatives to fix it.
-- Theoretically this should also make it easier to give anyone who asks a user account with commit access only to their own extension. ie: It should become much easier to get extensions of authors without commit who are currently building extensions using tarballs, external code hosting, or pages on MW.org to commit to a Wikimedia hosted repo which we can also commit to and review.
Have you seen Ryan Lane's work? Users could add their ssh key and automatically get commit access to their own extension with that.
- I'd like to use a standard set of scripts for dealing with repos
en-masse. This would allow us to do mass commits as well for code maintenance, rather than only being able to checkout en-masse. We could also make this take extensions themselves into account in ways that'll let us have it only download extension repos of a specific type (extensions in TWN, SMW extensions, extensions listed in a text file list of extensions you want to work with) which would help with the TWN issues.
Google has some scripts like that for dealing with multiple repositories (and multiple vcs).
Hoi, When you look at the situation with the Toolserver where everybody has their own toy source area you have a situation where internationalisation and the upgrading of functionality to a production level is not happening. If GIT is so great, then solve an existing pain which is the inability to collaborate on toolserver tools.
GIT is cool, it is the flavour of the month. It is an improvement when it proves itself in what is in my opinion a manifestly dysfunctional source management environment. When the Toolserver sources are all in a GIT repository and its localisation becomes manageable, you have the proof of the pudding demonstrating problem solving ability. When internationalisation and localisation are part of the solution you are convincing that we can move to GIT. Thanks, GerardM
On 22 March 2011 16:08, Trevor Parscal tparscal@wikimedia.org wrote:
Your objections seem to be based on the assumption that you would need to have push access to all repositories, but I think that's the point of a DVCS: you can just fork them, and then people can pull your changes in themselves (or using a tool). Pull requests could even be generated when things are out of sync.
I think it's quite possible this could make i18n/L10n work easier, not more difficult.
- Trevor
On Mar 22, 2011, at 7:25 AM, Siebrand Mazeland wrote:
From what I understand, common thought is that phase3 and all individual extensions, as well as directories in trunk/ aside from extensions and phase3 will be their own repos. Possibly there will be meta collections that allow cloning things in one go, but that does not allow committing to multiple repos in one go without requiring scripting. This is a use case that is used *a lot* by L10n committers and others. I think this is bad.
I am again raising my objections, from an i18n perspective and also from a community/product stability perspective, against GIT as a replacement VCS for MediaWiki's svn.wikimedia.org and against the way people are talking about implementing it.
I raised this in the thread "Migrating to GIT (extensions)"[1,2] in mid February. My concerns have not been taken away: i18n/L10n maintenance will be a lot harder and more distributed. In my opinion the MediaWiki development community is not harmed by the continued use of Subversion. In fact, the global maintenance - I define this as fixing backward incompatibilities introduced in core in the 400+ extensions in Subversion, as well as updating extensions to current coding standards - that many active developers are involved in now will likely decrease IMO, because having to commit to multiple repos will make it more cumbersome to perform these activities. Things that require extra work from a developer without any obvious benefit are just discontinued, in my experience. As a consequence, the number of unmaintained and crappy extensions will increase, which is bad for the product image and in the end for the community - not caring about a single extension repo is too easy, and many [devs] not caring about hundreds [of extensions] is even worse.
Please convince me that things will not be as hard as I describe above, or will most definitely not turn out as I fear. I am open to improvements, but moving to GIT without addressing these concerns for the sake of having this great DVCS is not justified IMO.
Siebrand
M: +31 6 50 69 1239 Skype: siebrand
[1] http://lists.wikimedia.org/pipermail/wikitech-l/2011-February/thread.html#51812
[2] http://lists.wikimedia.org/pipermail/wikitech-l/2011-February/051817.html
On 22-03-11 10:15 Ævar Arnfjörð Bjarmason avarab@gmail.com wrote:
On Tue, Mar 22, 2011 at 08:27, Yuvi Panda yuvipanda@gmail.com wrote:
On Sun, Mar 20, 2011 at 9:25 PM, Ævar Arnfjörð Bjarmason avarab@gmail.com wrote:
But actually the reason I did this mirror was as a proof of concept for a (still incomplete) conversion to Git.
Is there still interest in that? I don't have a lot of time for it, but I could help with that if people want to go that way.
If lack of people dedicated to this is why a migration isn't being considered (I guess not), I volunteer myself.
Lack of time and people is indeed a factor. The import we have now isn't a proper Git conversion.
I still have some vague notes here detailing approximately what we need, some of these are out of date. The "Split up and convert" section is somewhat accurate though:
http://www.mediawiki.org/wiki/Git_conversion
No SVN to Git tool does exactly what we need due to our messy history. I came to the conclusion that it was probably easiest to filter the SVN dump (to e.g. fix up branch paths) before feeding the history to one of these tools.
Of course even if we come up with a perfect conversion it's pretty much useless if Wikimedia doesn't want to use it for its main repositories. So getting a yes/no on whether this is wanted by WM before you proceed with something would prevent you/others from wasting their time on this.
When you look at the situation with the Toolserver where everybody has their own toy source area you have a situation where internationalisation and the upgrading of functionality to a production level is not happening. If GIT is so great, then solve an existing pain which is the inability to collaborate on toolserver tools.
GIT is cool, it is the flavour of the month. It is an improvement when it proves itself in what is in my opinion a manifestly dysfunctional source management environment. When the Toolserver sources are all in a GIT repository and its localisation becomes manageable, you have the proof of the pudding demonstrating problem solving ability. When internationalisation and localisation are part of the solution you are convincing that we can move to GIT.
Toolserver has a social problem, not a technological one. They have the ability to use SVN, or a source control system of their choosing, yet they don't. This thread is discussing a perceived problem with a tool we are already successfully using. Let's focus on one issue at a time.
- Ryan Lane
Hoi, We are indeed using SVN successfully.
As to Toolserver, this environment and its functionality are deeply flawed. As the tools are open source, there is no reason why relevant tools cannot be brought into GIT and upgraded to a level where they are of production quality. Either GIT is able to cope or its distributed character adds no real value.
The notion that it has to be MediaWiki core and/or its extensions first is absurd when you consider that it is what we use to run one of the biggest websites in the world. We rely on the continued support for our production process. The daily process provided by LocalisationUpdate is such a production process. When the continuity of production processes is not a prime priority, something is fundamentally wrong. Thanks, GerardM
On 22 March 2011 17:30, Ryan Lane rlane32@gmail.com wrote:
When you look at the situation with the Toolserver where everybody has their own toy source area you have a situation where internationalisation and the upgrading of functionality to a production level is not happening. If GIT is so great, then solve an existing pain which is the inability to collaborate on toolserver tools.
GIT is cool, it is the flavour of the month. It is an improvement when it proves itself in what is in my opinion a manifestly dysfunctional source management environment. When the Toolserver sources are all in a GIT repository and its localisation becomes manageable, you have the proof of the pudding demonstrating problem solving ability. When internationalisation and localisation are part of the solution you are convincing that we can move to GIT.
Toolserver has a social problem, not a technological one. They have the ability to use SVN, or a source control system of their choosing, yet they don't. This thread is discussing a perceived problem with a tool we are already successfully using. Let's focus on one issue at a time.
- Ryan Lane
2011/3/22 Gerard Meijssen gerard.meijssen@gmail.com:
The notion that it has to be MediaWiki core and/or its extensions first is absurd when you consider that it is what we use to run one of the biggest websites in the world. We rely on the continued support for our production process. The daily process provided by LocalisationUpdate is such a production process. When the continuity of production processes is not a prime priority, something is fundamentally wrong.
There seems to be this misconception with the TranslateWiki people that using Git automatically means that integrating localization updates will somehow be more difficult. This is not the case. In fact, it won't even be appreciably different from the way we currently handle localization updates in SVN.
While Git *supports* the workflow of pushing freshly written code to some branch and merging it into trunk after it has been reviewed, it does not *require* it. It's trivial to set things up so that TWN can commit and push their localization updates directly into trunk, without a need for review or merging or whatever. Having LocalisationUpdate continue to pull those updates from trunk and apply them to the live site is equally trivial.
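For illustration, the TWN side could be as simple as this (paths, remote and branch names are assumptions, and the export step is hand-waved):

    # Daily localisation export straight into "trunk"
    cd core
    git checkout master && git pull --ff-only origin master
    cp /srv/twn-export/Messages*.php languages/messages/   # whatever the export produces
    git commit -am "Localisation updates from http://translatewiki.net"
    git push origin master                                 # no review gate for l10n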
The only thing that may be different, depending on what our workflow ends up being, is that messages that have been added in some branch that hasn't been merged to trunk yet will not automatically be picked up by TWN for translation. This is technically already the case, but branches aren't very common at this time and will likely become more common with Git. If we end up with a first-review-then-merge-to-trunk workflow, messages wouldn't be available for translation on TWN until after the commits that introduced them have been reviewed and merged to trunk, so TWN will be behind the curve a little bit. But I'm not convinced that's necessarily bad: it'll hopefully prevent poorly organized or poorly translatable messages from making their way into trunk, thereby making sure translators never even see those.
Roan Kattouw (Catrope)
I've started collecting some notes on issues that need to be considered for a potential git migration:
http://www.mediawiki.org/wiki/Git_migration_issues
I'm paying particular attention to the localization workflow thing. Note that TranslateWiki's been working on StatusNet's git repository for some time; git itself isn't a particular problem. But changing the layout of repositories could indeed change how some things need to be done, and we need to make sure we know how to solve whichever problems come up.
I think it's pretty likely that we can work those problems out -- nothing's set in stone yet, and there's plenty of opportunity to experiment with some sample layouts first!
-- brion
On 22/03/11 20:26, Brion Vibber wrote:
I've started collecting some notes on issues that need to be considered for a potential git migration:
http://www.mediawiki.org/wiki/Git_migration_issues
I'm paying particular attention to the localization workflow thing. Note that TranslateWiki's been working on StatusNet's git repository for some time; git itself isn't a particular problem. But changing the layout of repositories could indeed change how some things need to be done, and we need to make sure we know how to solve whichever problems come up.
I think it's pretty likely that we can work those problems out -- nothing's set in stone yet, and there's plenty of opportunity to experiment with some sample layouts first!
Tools are almost never an issue. We could as well use Bugzilla as a patch/review queue and have ONE release manager apply them to a CVS tree. I have seen developers using MS Word to track code; they even managed to merge their code this way.
The issue we have is a proper workflow and, I believe, the lack of a roadmap.
Do we really need all the language messages in core? They are probably "useless" for day-to-day code hacking. Most developers probably use the English messages only anyway. The only things we have to do are fill in the Messages.inc metadata file and the English message. Nowadays, I do not even bother to translate my own messages into my native language (which is French).
So, to me, messages serve no need for developers. They are only useful for the live site and MediaWiki releases.
For the live site, we could just pull messages from whatever system is used (gettext .po, rosetta, git, ftp site). This can be done every week, day or hour.
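e.g. a cron entry on the live site along these lines would do (paths invented for the example):

    # m h dom mon dow  command
    0 * * * *  cd /srv/mediawiki/l10n && git pull --ff-only >>/var/log/l10n-pull.log 2>&1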
Then comes the need to produce a MediaWiki release. This means you have to sync both projects (code + l10n), and that is not possible while you keep having new messages added to MediaWiki or message parameters being changed.
That is what I meant by a workflow. We have actors, tasks and responsibilities and eventually end up with a product.
I have said it already: we need a rough roadmap to release a new version. One of the first steps would be:
- obviously 'no new features' step. Followed by...
- 'no new UI messages'
- 'no parameter changes'
- 'no messages changes'
This will give time to translators to finish up the messages translations for the release.
The lack of a roadmap for 1.18 is probably the same cause that has our code review queue filling up again. It looks a bit like a wiki sandbox; some code (some of mine included) should never have been sent to trunk.
To conclude, the good point is that we have 1.17 feature-frozen, deployed on the live site, and it might even get released before this summer. Given the situation back in September 2010, this is a huge accomplishment :-)
Hoi,
From the point of view of the internationalisation and localisation there are two states:
- the English message is stable and fits the requirements of i18n; it is a meaningful translatable message with constructs like gender and plural as needed
- the English message is stable and does not fit the requirements of i18n.
When the message does not fit the requirements, it is from an internationalisation point of view obviously a bug. This is typically fixed by the developers at translatewiki.net and has an effect on all existing localisations; they are FUZZYd. From the point of view of the development of code such bug fixes are transparent.
The LocalisationUpdate process, functionality that was created to bring localisations to an installed environment in a timely manner, is based on the English message being exactly the same. In translatewiki.net we know about messages that exist only in previous releases, and they are still available for localisation. As a result, releases are very much external to the localisation effort. Localisation work is motivated by making a difference on a live environment.
In order to prevent issues with localisation during the shake down period of development, it is best to release code early and often. This will make new or changed messages visible to the developers at translatewiki.net and this enables them to adjust messages for i18n purposes when needed.
Bug 28191 was added today; it seeks to decrease the time from localisation to implementation. At this time the implementation of newly localised messages is done with a cron job; I understand from your description that it might technically be possible to push localisations out whenever. Practically, this will happen only once the quality assurance processes at translatewiki.net have been completed.
I hope this helps. Thanks, GerardM
Roan Kattouw wrote:
The only thing that may be different, depending on what our workflow ends up being, is that messages that have been added in some branch that hasn't been merged to trunk yet will not automatically be picked up by TWN for translation. This is technically already the case, but branches aren't very common at this time and will likely become more common with Git. If we end up with a first-review-then-merge-to-trunk workflow...
What workflow would we be using? It's fine that git supports many ways of being used, but the important question is how MW would use it, and each of us is probably thinking of a slightly different way.
I'm not a proficient git user. I see benefits from a DVCS. Automatically merging/unmerging a revision to 1.17/wmf with a checkbox in CR would be cool, for instance. Stacking several commits before one push also looks nice. How would that be presented to the reviewer? As a single diff, or as several? Note that this could already be done with svn (if they are easy diffs).
Even with some kind of review-before-merge approach we would still need a trunk to work from, and the stable reviewed branch would advance behind it. (Do git patch queues work like that?) Backing out changesets would be easier, and they could be continued in a forked branch, but I'm not sure that looks nice.
Robla says that it will mean fewer reviewing problems. It may decrease them, but there will still be "errors noticed just after pushing" (i.e. separate changes that should have been just one logical change). The benefits of working with branches would just be those derived from better merging. We can already review a branch (although nobody likes to), or perform a review by files. And still, having a DVCS won't avoid silly merging errors.
A DVCS also complicates some things. There is an entry barrier to pass. A revision number such as r12345 is really easy (still, I'd like to have a summary attached to revision names), and we use them everywhere*. Talking about revision 0fda45694 is ugly. Mercurial also has revision numbers, but they follow the revisions in the working copy, not those in the master repo.
* Which is good. I am proud, for example, of how we reference the relevant commits in Bugzilla, so the path from bug to fix is trivial. On some projects that is hard, and you end up comparing the bug closing date with nearby commits.
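For what it's worth, git describe gives somewhat friendlier names than bare hashes (the commit and output below are only illustrative):

    $ git describe 0fda45694
    1.17rc1-142-g0fda4569    # nearest tag, number of commits since it, abbreviated hash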
On Tue, Mar 22, 2011 at 15:25, Siebrand Mazeland s.mazeland@xs4all.nl wrote:
Please convince me that things will not be as hard as I describe above, or will most definitely not turn out as I fear. I am open to improvements, but moving to GIT without addressing these concerns for the sake of having this great DVCS is not justified IMO.
I think the last time this came up I asked you why the difficulty of what you have to do is a function of the number of repositories you have to push to.
That shouldn't be the case, that's trivially scriptable. You'd still be reviewing the same *content*. You'd just push to more locations.
So it's easy to get this right for you in Git, and you're the only person AFAIC with this use case.