On 11-03-24 06:12 PM, Aryeh Gregor wrote:
On Tue, Mar 22, 2011 at 10:46 PM, Tim Starling <tstarling@wikimedia.org> wrote:
If we split up the extensions directory, each extension having its own repository, then this will discourage developers from updating the extensions in bulk. This affects both interface changes and general code maintenance. I'm sure translatewiki.net can set up a script to do the necessary 400 commits per day, but I'm not sure if every developer who wants to fix unused variables or change a core/extension interface will want to do the same.
I've thought about this a bit. We want bulk code changes to extensions to be easy, but it would also be nice if it were easier to host extensions "officially" to get translations, distribution, and help from established developers. We also don't want anyone to have to check out all extensions just to get at trunk. Localization, on the other hand, is entirely separate from development, and has very different needs -- it doesn't need code review, and someone looking at the revision history for the whole repository doesn't want to see localization updates. (Especially in extensions, where often you have to scroll through pages of l10n updates to get to the code changes.)
Unfortunately, git's submodule feature is pretty crippled. It basically works like SVN externals, as I understand it: the larger repository just has markers saying where the submodules are, but their actual history is entirely separate. We could probably write a script to commit changes to all extensions at once, but it's certainly a less ideal solution.
git's submodule feature is something like svn externals, but with one big fundamental difference. svn externals track only a repo, so when you update you get the latest version of that repo. git submodules always track a repo and a commit id, so when you update you always get the same commit id. Changing that commit id requires making a commit to the parent git repo. You can also check out an old commit, and submodule update will check out the submodule commit id that was recorded at that point in time. But yes, both of them are merely references; they do not store the actual history. They're essentially glorified helper scripts, and they don't alleviate the task of downloading each repo separately. They just make the vcs do it for you, instead of you running a script in some other language to do it.
In my honest opinion, submodules were not designed for what we are trying to shove into them. And given that one of their key features (tracking a specific commit id to ensure the same version is always checked out) is actually the opposite of what we want, I believe the actual functionality of git submodules in this situation is no better than what we could build ourselves with a few simple custom scripts. In fact I believe we could build something better for our purposes without too much effort. And we could check it into a git repo in place of the repo that the submodules would be put in. If you dig through the git discussions, I believe I listed a number of features we could add that would make it even more useful. Instead of a second repo, we could just put the tool itself inside mw's repo, so that by checking out phase3 you get the tools needed to work with extensions.
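To make the "few simple custom scripts" idea concrete, here is a rough Python sketch of a helper that always moves every extension to its latest upstream commit, which is the behaviour submodules deliberately avoid. Everything in it is an assumption for illustration: the base URL, the one-repo-per-extension layout, and the function names are hypothetical, not an existing tool.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List

# Hypothetical hosting location; no real URL has been decided in this thread.
BASE_URL = "https://git.example.org/extensions"


def plan_update(names: Iterable[str], checkout_root: Path,
                base_url: str = BASE_URL) -> List[List[str]]:
    """Build the git commands that bring each extension to its latest commit.

    Unlike a submodule, nothing here pins a commit id: an extension that is
    already cloned is fast-forwarded, a missing one is cloned fresh.
    """
    commands = []
    for name in sorted(names):
        target = checkout_root / name
        if (target / ".git").exists():
            # Already cloned: move to whatever upstream has now.
            commands.append(["git", "-C", str(target), "pull", "--ff-only"])
        else:
            # Not present yet: clone it (assumes one repo per extension).
            commands.append(["git", "clone", f"{base_url}/{name}.git", str(target)])
    return commands


def run_plan(commands: List[List[str]]) -> None:
    """Execute the planned commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Splitting planning from running keeps the first half easy to inspect: something like `plan_update(["Cite", "ParserFunctions"], Path("extensions"))` only returns the command lists, and nothing touches the network until `run_plan` is called.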
If we moved to git, I'd tentatively say something like
- Separate out the version control of localization entirely.
Translations are already coordinated centrally on translatewiki.net, where the wiki itself maintains all the actual history and permissions, so the SVN checkin right now is really a needless formality that keeps translations less up-to-date and spams revision logs. Keep the English messages with the code in git, and have the other messages available for checkout in a different format via our own script. This checkout should always grab the latest translatewiki.net messages, without the need for periodic commits. (I assume translatewiki.net already does automatic syntax checks and so on.) Of course, the tarballs would package all languages.
+1
- Keep the core code in one repository, each extension in a separate
repository, and have an additional repository with all of them as submodules. Or maybe have extensions all be submodules of core (you can check out only a subset of submodules if you want).
- Developers who want to make mass changes to extensions are probably
already doing them by script (at least I always do), so something like "for EXTENSION in extensions/*; do (cd $EXTENSION && git commit -a -m 'Boilerplate message'); done" shouldn't be an exceptional burden. If it comes up often enough, we can write a script to help out.
- We should take the opportunity to liberalize our policies for
extension hosting. Anyone should be able to add an extension, and get commit access only to that extension. MediaWiki developers would get commit access to all hosted extensions, and hooking into our localization system should be as simple as making sure you have a properly-formatted ExtensionName.i18n.php file. If any human involvement is needed, it should only be basic sanity checks.
I LOVE this idea too, it's been on my mind for a while.
Brion mentioned that there is some prior art in git farming. Gitorious' codebase is open source. Wikimedia could host a copy of it for the purposes of hosting git repos for MediaWiki and extensions. It has built-in management of pubkeys, projects and project repos (say, MediaWiki and extensions as projects, and some groups of extensions like SMW could be put in one project), teams (put core devs in a team and give them access to the trunk-like MediaWiki core repo; we can also add teams like smw-devs that let us open up groups of extensions to groups of people collaborating on them), team clones (make wmf a team and make the wmf branch a clone of the MediaWiki repo for access control), personal clones (so users without access to core can still make a clone, keep it in a place tied in with potential code review, and participate by sending merge requests back to core so devs can pick them up and merge them; is this a form of pre-commit review?), and of course the code for letting someone sign up, not have commit access to everything, but create their own project repo for an extension and start committing to it.
Oh, as a little bonus: theoretically we may be able to make some moderate tweaks to Gitorious and build in a simple API that lists all extensions, as tagged. You can already get something close by appending .xml to the project view (since it's a Rails app). Using that data we could easily build a tool that would clone all extensions, and from there let you batch commit/push/checkout/branch/update remotes/etc. And we could easily build it to take labeling into account, meaning you could potentially check out all extensions in TWN, or all extensions tagged as SMW, or all extensions tagged as 'UsedOnWMF'. Naturally, it would also be trivial to make it check out the repo for an extension by name.
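As a rough illustration of the tag-aware batch tool, here is a small Python sketch. The catalogue is a hard-coded stand-in for whatever a tweaked Gitorious would actually return, and the tag assignments, function names, and directory layout are all assumptions made up for the example.

```python
import subprocess
from pathlib import Path
from typing import Dict, Iterable, List, Set

# Hypothetical catalogue (extension name -> tags). In practice this would be
# fetched from the hosting service; the tags below are illustrative only.
CATALOGUE: Dict[str, Set[str]] = {
    "SemanticMediaWiki": {"SMW", "TWN"},
    "SemanticForms": {"SMW"},
    "Cite": {"TWN", "UsedOnWMF"},
}


def select_by_tag(catalogue: Dict[str, Set[str]], tag: str) -> List[str]:
    """Return the names of all extensions carrying the given tag, sorted."""
    return sorted(name for name, tags in catalogue.items() if tag in tags)


def batch_commit(names: Iterable[str], root: Path, message: str,
                 dry_run: bool = True) -> List[List[str]]:
    """Commit pending changes in each extension checkout under root.

    `git -C <dir>` runs git inside each checkout, so the working directory
    of the calling process never changes between extensions.
    """
    commands = [
        ["git", "-C", str(root / name), "commit", "-a", "-m", message]
        for name in names
    ]
    if not dry_run:
        for cmd in commands:
            # git commit exits non-zero when there is nothing to commit;
            # that should not abort the rest of the batch.
            subprocess.run(cmd, check=False)
    return commands
```

With this, "commit to everything tagged SMW" would be `batch_commit(select_by_tag(CATALOGUE, "SMW"), Path("extensions"), "Boilerplate message")`, which by default only reports the commands it would run.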
I'd love git being first class and Wikimedia hosted. I'd probably take monaco-port (which is on GitHub right now) and make the repo on Wikimedia the primary repo.
- Code review should migrate to an off-the-shelf tool like Gerrit. I
don't think it's a good idea at all for us to reinvent the code-review wheel. To date we've done it poorly.
This is all assuming that we retain our current basic development model, namely commit-then-review with a centrally-controlled group of people with commit access. One step at a time.
A mixed format might be possible too, where the bulk of developers can commit to one repo, but we have a second repo for post-review code which is considered to be a more stable trunk. And naturally, whatever we do, we can make it easier for non-devs to submit code by publishing their own public clone.
On Wed, Mar 23, 2011 at 2:51 PM, Diederik van Liere <dvanliere@gmail.com> wrote:
The Python Community recently switched to a DVCS and they have documented their choice. It compares Git, Mercurial and Bzr and shows the pluses and minuses of each. In the end, they went for Mercurial.
Choosing a distributed VCS for the Python project: http://www.python.org/dev/peps/pep-0374/
They gave three reasons:
- git's Windows support isn't as good as Mercurial's. I don't know
how much merit that has these days, so it bears investigation. I have the impression that the majority of MediaWiki developers use non-Windows platforms for development, so as long as it works well enough, I don't know if this should be a big deal.
For cli there's msysgit. For gui there are TortoiseGit and gitextensions. I hear comments that TortoiseGit lacks some of git's features, namely interaction with the index. However, it's supposed to feel fairly similar to TortoiseSVN (which I expect our svn Windows users are probably using, if they use a GUI, so that might be helpful). gitextensions also looks fairly interesting, though I'm not a Windows user anymore so I haven't looked at it in depth: http://sourceforge.net/projects/gitextensions/
That PEP was from a year ago, so git's Windows support can only have gotten better.
- Python developers preferred Mercurial when surveyed. Informally,
I'm pretty certain that most MediaWiki developers with a preference prefer git.
Thanks in part to GitHub, git is definitely, as someone else mentioned, the 'flavor of the week', though to be fair, I believe svn was similar in that respect. I do believe that we are likely to find a lot more MW devs who are comfortable with git than with other dvcs.
- Mercurial is written in Python, and Python developers want to use
stuff written in Python. Not really relevant to us, even those of us who like Python a lot. :) (FWIW, despite being a big Python fan, I'm a bit perturbed that Mercurial often prints out a Python stack trace when it dies instead of a proper error message . . .)
GNOME also surveyed available options, and they decided to go with git: http://blogs.gnome.org/newren/2009/01/03/gnome-dvcs-survey-results/ Although of course, (1) would be a bit of a nonissue for them.