On 11-03-24 06:12 PM, Aryeh Gregor wrote:
On Tue, Mar 22, 2011 at 10:46 PM, Tim Starling <tstarling(a)wikimedia.org> wrote:
If we split up the extensions directory, each extension having its own
repository, then this will discourage developers from updating the
extensions in bulk. This affects both interface changes and general
code maintenance. I'm sure translatewiki.net can set up a script to do
the necessary 400 commits per day, but I'm not sure if every developer
who wants to fix unused variables or change a core/extension interface
will want to do the same.
I've thought about this a bit. We want bulk code changes to
extensions to be easy, but it would also be nice if it were easier to
host extensions "officially" to get translations, distribution, and
help from established developers. We also don't want anyone to have
to check out all extensions just to get at trunk. Localization, on
the other hand, is entirely separate from development, and has very
different needs -- it doesn't need code review, and someone looking at
the revision history for the whole repository doesn't want to see
localization updates. (Especially in extensions, where often you have
to scroll through pages of l10n updates to get to the code changes.)
Unfortunately, git's submodule feature is pretty crippled. It
basically works like SVN externals, as I understand it: the larger
repository just has markers saying where the submodules are, but their
actual history is entirely separate. We could probably write a script
to commit changes to all extensions at once, but it's certainly a less
ideal solution.
git's submodule feature is something like svn externals, but with one
big fundamental difference.
svn externals, as typically configured, track only a repo, so when you
update you get the latest version of that repo.
git submodules track a repo and a commit id, always. So when you update
you always get the same commit id. Changing that commit id requires
making a commit to the git repo to update it. You can also check out an
old commit, and submodule update will check out the commit id of the
submodule that was committed at that point in time.
But yes, for both of them it's merely references, they do not store the
actual history. They're glorified helper scripts essentially, they don't
alleviate the task of downloading each repo separately. They just make
the vcs do it for you, instead of you running a script in some other
language to do it for you.
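
The pinning behaviour described above can be seen with a throwaway pair
of repositories. This is only a rough sketch, assuming git is installed;
the repository names (ext, super, clone), the path extensions/Foo and the
function name demo_submodule_pin are made up for illustration. The
protocol.file.allow setting is there only because recent git versions
refuse local-path submodule clones by default.

```shell
# Sketch: a submodule stays on its recorded commit even after the
# extension's own repository moves on. All names here are hypothetical.
demo_submodule_pin() (
    set -e
    work=$(mktemp -d)
    cd "$work"
    # Throwaway identity so commits work on a machine without git config.
    export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
    export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

    # 1. A stand-in "extension" repository with one commit.
    git init -q ext
    (cd ext && echo one > Foo.php && git add Foo.php && git commit -q -m 'first')

    # 2. A "super" repository that records ext as a submodule.
    git init -q super
    (cd super &&
        git -c protocol.file.allow=always submodule --quiet add "$work/ext" extensions/Foo &&
        git commit -q -m 'add Foo as submodule')
    # The super repo stores only a reference: the submodule's commit id.
    pinned=$(cd super && git rev-parse HEAD:extensions/Foo)

    # 3. The extension moves on...
    (cd ext && echo two > Foo.php && git commit -q -a -m 'second')

    # 4. ...but a fresh clone plus "submodule update" still gets the
    #    commit id that was recorded in step 2.
    git clone -q super clone
    (cd clone && git -c protocol.file.allow=always submodule --quiet update --init)
    got=$(cd clone/extensions/Foo && git rev-parse HEAD)
    latest=$(cd ext && git rev-parse HEAD)

    echo "pinned=$pinned"
    echo "checked_out=$got"
    echo "upstream_latest=$latest"
    rm -rf "$work"
)
```

Running demo_submodule_pin should print the same id for pinned and
checked_out, and a different one for upstream_latest. Moving the pin
forward would take another commit in the super repo.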
In my honest opinion, submodules were not designed for what we are trying
to shove into them. And given that one of their key features (tracking a
specific commit id to ensure the same version is always checked out) is
actually the opposite of what we want, I believe the actual
functionality of git submodules in this situation is no better than what
we could build ourselves with a few simple custom scripts. In fact I
believe we could build something better for our purposes without too
much effort. And we could check it into a git repo in place of the repo
that submodules would be put in. If you dig through the git discussions
I believe I listed a number of features we could add that would make it
even more useful. Instead of a second repo, we could just put the tool
itself inside mw's repo so that by checking out phase3 you get the tools
needed to work with extensions.
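
To give a rough idea of how small such a tool could start out, here is a
sketch of a helper that always follows each extension's latest commit
instead of a recorded one. The manifest format, the function name
plan_extension_sync and the example file name are assumptions for
illustration, not an existing tool. It only prints the git commands it
would run, so the plan can be read before anything is executed.

```shell
# Sketch of a "track latest" helper, as opposed to submodule pinning.
# Manifest format (hypothetical): one "<name> <clone-url>" pair per line.
# Names and paths are assumed to contain no whitespace.
plan_extension_sync() {
    manifest=$1
    dest=${2:-extensions}
    while read -r name url; do
        # Skip blank lines and comment lines.
        case $name in ''|'#'*) continue ;; esac
        if [ -d "$dest/$name/.git" ]; then
            # Already cloned: fast-forward to whatever is latest upstream.
            echo "git -C $dest/$name pull --ff-only"
        else
            # Not cloned yet: fetch it for the first time.
            echo "git clone $url $dest/$name"
        fi
    done < "$manifest"
}
```

Something like "plan_extension_sync extensions.list extensions | sh"
would then carry out the plan. Keeping the planning step separate from
the execution step is just one possible design; it makes a dry run the
default.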
If we moved to git, I'd tentatively say something like
* Separate out the version control of localization entirely.
Translations are already coordinated centrally on translatewiki.net,
where the wiki itself maintains all the actual history and
permissions, so the SVN checkin right now is really a needless
formality that keeps translations less up-to-date and spams revision
logs. Keep the English messages with the code in git, and have the
other messages available for checkout in a different format via our
own script. This checkout should always grab the latest
translatewiki.net messages, without the need for periodic commits. (I
assume translatewiki.net already does automatic syntax checks and so
on.) Of course, the tarballs would package all languages.
+1
* Keep the core code in one repository, each extension in a separate
repository, and have an additional repository with all of them as
submodules. Or maybe have extensions all be submodules of core (you
can check out only a subset of submodules if you want).
* Developers who want to make mass changes to extensions are probably
already doing them by script (at least I always do), so something like
"for EXTENSION in extensions/*; do (cd $EXTENSION && git commit -a -m
'Boilerplate message'); done" shouldn't be an exceptional
burden. If it comes up often enough, we can write a script to help
out.
* We should take the opportunity to liberalize our policies for
extension hosting. Anyone should be able to add an extension, and get
commit access only to that extension. MediaWiki developers would get
commit access to all hosted extensions, and hooking into our
localization system should be as simple as making sure you have a
properly-formatted ExtensionName.i18n.php file. If any human
involvement is needed, it should only be basic sanity checks.
I LOVE this idea too; it's been on my mind for a while.
Brion mentioned that there is some prior art in git farming. Gitorious'
codebase is open source. Wikimedia could host a copy of it for the
purposes of hosting git repos for MediaWiki and extensions.
It has built-in management of pubkeys, projects and project repos (say,
MediaWiki, extensions as projects, and some groups of extensions like
SMW could be put in one project), teams (put core devs in a team and
give them access to the trunk-like MediaWiki core repo; we can also add
teams like smw-devs that let us open up groups of extensions to groups
of people collaborating on them), team clones (make wmf a team and make
the wmf branch a clone of the MediaWiki repo for access control),
personal clones (so users without access to core can still make a clone,
keep it in a place tied with potential code review, and participate by
sending merge requests back to core so devs can pick them up and put
them in; is this a form of pre-commit review?), and of course the code
for letting someone sign up, not have commit to everything, but create
their own project repo for an extension and start committing to it.
Oh, as a little bonus. Theoretically we may be able to make some
moderate tweaks to Gitorious and build in a simple api that'll list all
extensions, as tagged. You can already get something close by using .xml
on the project view (since it's a rails app). Using that data we could
easily build a tool that would clone all extensions, and from there let
you batch commit/push/checkout/branch/updateremote/etc. And we could
easily build it to take account of labeling, meaning you could
potentially checkout all extensions in TWN, or all extensions tagged as
SMW, or all extensions tagged as 'UsedOnWMF'. Naturally of course it
would be trivial to make it checkout the repo for an extension by name.
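
As a sketch of what the tag-aware part might look like: the listing
format below is a plain-text stand-in I made up, not what Gitorious
actually returns, and the function names extensions_with_tag and
batch_git are hypothetical too. The first function picks extensions by
tag; the second runs one git subcommand in each selected checkout.

```shell
# Hypothetical listing format: "<name> <tag>[,<tag>...]" per line,
# standing in for whatever a Gitorious-backed API would really return.
extensions_with_tag() {
    listing=$1
    tag=$2
    while read -r name tags; do
        # Wrap in commas so "SMW" does not match a tag like "SMWExtra".
        case ",$tags," in
            *",$tag,"*) echo "$name" ;;
        esac
    done < "$listing"
}

# Run one git subcommand in every named checkout under the directory $1.
# Extension names are read from stdin, one per line.
batch_git() {
    root=$1
    shift
    while read -r name; do
        # Give git an empty stdin so it cannot swallow the list of names.
        ( cd "$root/$name" && git "$@" </dev/null ) || echo "failed: $name" >&2
    done
}
```

For example, "extensions_with_tag extensions.tags UsedOnWMF | batch_git
extensions pull --ff-only" would update every checkout carrying that tag,
reporting failures per extension without stopping the whole batch.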
I'd love git being first class and Wikimedia hosted. I'd probably take
monaco-port (which is on GitHub right now) and make the repo on
Wikimedia the primary repo.
* Code review should migrate to an off-the-shelf tool like Gerrit. I
don't think it's a good idea at all for us to reinvent the code-review
wheel. To date we've done it poorly.
This is all assuming that we retain our current basic development
model, namely commit-then-review with a centrally-controlled group of
people with commit access. One step at a time.
A mixed format might be possible too, where the bulk of developers can
commit to one repo, but we have a second repo for post-review code which
is considered to be a more stable trunk. And naturally, whatever we do, we
can make it easier for non-devs to submit code by publishing their own
public clone.
On Wed, Mar 23, 2011 at 2:51 PM, Diederik van Liere <dvanliere(a)gmail.com> wrote:
The Python Community recently switched to a DVCS and they have
documented their choice.
It compares Git, Mercurial and Bzr and shows the pluses and minuses of
each. In the end, they went for Mercurial.
Choosing a distributed VCS for the Python project:
http://www.python.org/dev/peps/pep-0374/
They gave three reasons:
1) git's Windows support isn't as good as Mercurial's. I don't know
how much merit that has these days, so it bears investigation. I have
the impression that the majority of MediaWiki developers use
non-Windows platforms for development, so as long as it works well
enough, I don't know if this should be a big deal.
For cli there's msysgit.
For gui there is TortoiseGit and gitextensions.
I hear comments that TortoiseGit lacks some of git's features, namely
interaction with the index. However it's supposed to feel fairly similar
to TortoiseSVN (which, if we have svn Windows users using a GUI, I expect
they're probably using, so that might be helpful). gitextensions also
looks fairly interesting, though I'm not a Windows user anymore so I
haven't looked at it in depth:
http://sourceforge.net/projects/gitextensions/
That pep was from a year ago, so git's Windows support can only have
gotten better.
2) Python developers preferred Mercurial when surveyed. Informally,
I'm pretty certain that most MediaWiki developers with a preference
prefer git.
Thanks in part to GitHub, git is definitely, as someone else mentioned,
the 'flavor of the week', though to be fair, in a sense I believe svn
was similar in that respect. I do believe that we are likely to find a
lot more MW devs that are comfortable with git than with other dvcs.
3) Mercurial is written in Python, and Python developers want to use
stuff written in Python. Not really relevant to us, even those of us
who like Python a lot. :) (FWIW, despite being a big Python fan, I'm
a bit perturbed that Mercurial often prints out a Python stack trace
when it dies instead of a proper error message . . .)
GNOME also surveyed available options, and they decided to go with
git: <http://blogs.gnome.org/newren/2009/01/03/gnome-dvcs-survey-results/…
Although of course, (1) would be a bit of a nonissue for them.
--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]