I whipped up a few prelim notes on a possible git repository layout for MediaWiki core and extensions, splitting out from SVN:
http://www.mediawiki.org/wiki/Git_conversion/Splitting_tests
I've run this past Siebrand to make sure that it should work for the localization batch commits, and he seems to think it sounds sane.
You can check out extensions as separate repositories directly into subfolders within core's 'extensions' dir for a ready-to-run system. But you *do* need to iterate over them, either manually or with a script, to pull updates or commit across repos. Git's submodules might be a useful way to help automate checkouts, but they introduce their own complications for maintenance.
There's a shell script on that page to make a sample checkout not including any history -- just exporting the latest code to make sure the layout makes sense. A real conversion will probably want to include version history and release branches/tags for both core and extensions.
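For concreteness, the scripted iteration could look roughly like this; the clone URL, base path and extension names below are placeholders, not a decided layout:

    #!/bin/bash
    # Rough sketch: clone or update a few extension repos inside core's
    # 'extensions' dir. GIT_BASE and the extension names are placeholders.
    GIT_BASE="git://git.example.org/mediawiki/extensions"
    cd /path/to/core/extensions || exit 1
    for ext in Cite ParserFunctions Gadgets; do
        if [ -d "$ext/.git" ]; then
            (cd "$ext" && git pull --ff-only)
        else
            git clone "$GIT_BASE/$ext.git" "$ext"
        fi
    done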
Note that there's lots of other stuff in the MediaWiki subversion repo that will want to be split out as well -- but for now our lives are simplified by only worrying about MediaWiki and maintained extensions. ;)
Please do feel free to comment or ask questions -- things are malleable and feedback will be *very* helpful in finalizing plans and building documentation.
-- brion
Brion Vibber wrote:
You can check out extensions as separate repositories directly into subfolders within core's 'extensions' dir for a ready-to-run system. But you *do* need to iterate over them, either manually or with a script, to pull updates or commit across repos. Git's submodules might be a useful way to help automate checkouts, but they introduce their own complications for maintenance.
There's a shell script on that page to make a sample checkout not including any history -- just exporting the latest code to make sure the layout makes sense. A real conversion will probably want to include version history and release branches/tags for both core and extensions.
There are 615 extensions in trunk/extensions. With the new setup, would someone who has a checkout of everything need to open 616 connections (network latency, ssh authentication, etc.) whenever they want to pull? svn:externals already noticeably slows checkouts, and there are only a handful of them.
On Wed, Oct 5, 2011 at 7:18 AM, Platonides Platonides@gmail.com wrote:
There are 615 extensions in trunk/extensions. With the new setup, would someone who has a checkout of everything need to open 616 connections (network latency, ssh authentication, etc.) whenever they want to pull? svn:externals already noticeably slows checkouts, and there are only a handful of them.
I'll do some tests; it may be possible to avoid having to do serial connection setup/teardown by using SSH's ControlMaster[1] setting to piggyback the 'new' connections on one that's already open.
[1] < http://www.anchor.com.au/blog/2010/02/ssh-controlmaster-the-good-the-bad-the...
(Part of what makes svn:externals slow is that you're jumping over to another site entirely -- that's an extra DNS lookup, possibly separate auth, and who knows how well the other server performs.)
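For anyone who wants to try the same thing, the relevant ~/.ssh/config stanza looks something like this (the host name is just an example; ControlPersist needs OpenSSH 5.6 or later):

    # Append a connection-sharing stanza for the git host.
    cat >> ~/.ssh/config <<'EOF'
    Host gitorious.org
        ControlMaster auto
        ControlPath ~/.ssh/cm-%r@%h-%p
        ControlPersist 10m
    EOF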
-- brion
On 11-10-05 11:59 AM, Brion Vibber wrote:
On Wed, Oct 5, 2011 at 7:18 AM, Platonides Platonides@gmail.com wrote:
There are 615 extensions in trunk/extensions. With the new setup, would someone who has a checkout of everything need to open 616 connections (network latency, ssh authentication, etc.) whenever they want to pull? svn:externals already noticeably slows checkouts, and there are only a handful of them.
I'll do some tests; it may be possible to avoid having to do serial connection setup/teardown by using SSH's ControlMaster[1] setting to piggyback the 'new' connections on one that's already open.
[1] < http://www.anchor.com.au/blog/2010/02/ssh-controlmaster-the-good-the-bad-the... (Part of what makes svn:externals slow is that you're jumping over to another site entirely -- that's an extra DNS lookup, possibly separate auth, and who knows how well the other server performs.)
-- brion
;) And when all else fails, rsync a bunch of --bare repos and do your pull on the local hd.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
Daniel Friesen wrote:
;) And when all else fails, rsync a bunch of --bare repos and do your pull on the local hd.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
Sorry? Note that only a small subset of those with commit access have full ssh access to run commands. And a good system shouldn't need such hacks (which is why we are discussing this in advance).
Sorry? Note that only a small subset of those with commit access have full ssh access to run commands. And a good system shouldn't need such hacks (which is why we are discussing this in advance).
I don't see why we can't open up access to basically everyone. I'd like to make it so that everyone who makes a Labs account also gets access via SSH. I'd like Labs account access to be fairly open.
We can configure Gerrit to only allow read access to people with an account. I'd really rather that everyone with an account has the ability to push a merge request and make branches as well, though.
- Ryan
Brion Vibber wrote:
On Wed, Oct 5, 2011 at 7:18 AM, Platonides Platonides@gmail.com wrote:
There are 615 extensions in trunk/extensions. With the new setup, would someone who has a checkout of everything need to open 616 connections (network latency, ssh authentication, etc.) whenever they want to pull? svn:externals already noticeably slows checkouts, and there are only a handful of them.
I'll do some tests; it may be possible to avoid having to do serial connection setup/teardown by using SSH's ControlMaster[1] setting to piggyback the 'new' connections on one that's already open.
I know about ControlMaster (from which only those of us with ssh+git can benefit), but just launching a new process and waiting to see if there's something new will slow things down. OTOH git skips the "recurse everything locking all subfolders" step, so it may be equivalent.
Maybe there's some way of fetching updates from aggregated repositories all at once and I'm just worrying about a problem that's already solved, though.
On Oct 5, 2011 1:03 PM, "Platonides" Platonides@gmail.com wrote:
I know about ControlMaster (from which only those of us with ssh+git can benefit), but just launching a new process and waiting to see if there's something new will slow things down. OTOH git skips the "recurse everything locking all subfolders" step, so it may be equivalent.
Maybe there's some way of fetching updates from aggregated repositories all at once and I'm just worrying about a problem that's already solved, though.
Submodules may actually work well for this, as long as something propagates the extension updates to the composite repo. The checked-out commit id of each submodule is stored in the containing repo's tree, so if that tree shows no change for a submodule, nothing should have to be pulled from the submodule's repo.
(Not yet tested)
-- brion
On Wed, Oct 5, 2011 at 1:59 PM, Brion Vibber brion@pobox.com wrote:
On Oct 5, 2011 1:03 PM, "Platonides" Platonides@gmail.com wrote:
I know about ControlMaster (from which only those of us with ssh+git can benefit), but just launching a new process and waiting to see if there's something new will slow things down. OTOH git skips the "recurse everything locking all subfolders" step, so it may be equivalent.
Maybe there's some way of fetching updates from aggregated repositories all at once and I'm just worrying about a problem that's already solved, though.
Submodules may actually work well for this, as long as something propagates the extension updates to the composite repo. The checked-out commit id of each submodule is stored in the containing repo's tree, so if that tree shows no change for a submodule, nothing should have to be pulled from the submodule's repo.
(Not yet tested)
Ok, did some quick tests fetching updates for 16 repos sitting on my Gitorious account.
Ping round-trip from my office desktop to Gitorious's server is 173ms, putting the theoretical *absolute best* possible time, with one round-trip per repo, at 2-3 seconds.
Running a simple loop of 'git fetch' over each repo (auth'ing with my ssh key, passphrase already provided) takes 53 seconds (about 3 seconds per repo). This does a separate ssh setup & poke into git for each repo.
Clearly unacceptable for 600+ extensions. :)
Turning on ControlMaster and starting a long-running git clone in the background, then running the same 'git fetch' loop took the time down to about 10 seconds (<1s per repo). ControlMaster lets those looped 'git fetch's piggyback on the existing SSH connection, but still has to start up git and run several round-trips.
Better, but still doesn't scale to hundreds of extensions: several minutes for a null update is too frustrating!
Checking them out as submodules via 'git submodule add' and then issuing a single 'git submodule update' command takes... 0.15 seconds. Nice!
Looks like it does indeed see that there are no changes, so nothing has to be pulled from the upstream repos. Good!
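For reference, the commands involved are roughly these (the repo URL is an example, not the actual test setup):

    # Register an extension as a submodule of the containing repo.
    git submodule add git://gitorious.org/example/SomeExtension.git extensions/SomeExtension
    git commit -m "Track SomeExtension as a submodule"

    # Later, in any clone of the containing repo:
    git submodule update --init    # checks the recorded commit ids; a no-op if nothing changed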
The downside is that maintaining submodules means constantly pushing commits to the containing repo so it knows there are updates. :(
Probably the most user-friendly way to handle this is with a wrapper script that can do a single query to fetch the current branch head positions of a bunch of repos, then does fetch/pull only on the ones that have changed.
This could still end up pulling from 600+ repos -- if there are actually changes in them all! -- but should make typical cases a *lot* faster.
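As a rough illustration of the per-repo check such a wrapper could do -- git ls-remote is much lighter than a full fetch, though the "single query for all repos" part would need something smarter on the server side:

    #!/bin/bash
    # For each checked-out extension, compare the remote master head with the
    # local tracking ref and only fetch when they differ. Illustration only.
    cd /path/to/core/extensions || exit 1
    for ext in */; do
        ext=${ext%/}
        remote=$( (cd "$ext" && git ls-remote origin refs/heads/master) | cut -f1 )
        have=$( (cd "$ext" && git rev-parse --quiet --verify origin/master) )
        if [ -n "$remote" ] && [ "$remote" != "$have" ]; then
            echo "Updating $ext"
            (cd "$ext" && git fetch origin)
        fi
    done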
We should check in a little more detail how Android & other big projects using multiple git repos are doing their helper tools to see if we can just use something that already does this or if we have to build it ourselves. :)
-- brion
On 5 October 2011 22:30, Brion Vibber brion@pobox.com wrote:
On Wed, Oct 5, 2011 at 1:59 PM, Brion Vibber brion@pobox.com wrote:
This could still end up pulling from 600+ repos -- if there are actually changes in them all! -- but should make typical cases a *lot* faster.
Pushed localisation updates from translatewiki would produce precisely this effect (minor but nonzero changes to hundreds of repos) on a daily or at least weekly basis. :-(
--HM
On 5 Oct 2011, at 15:07, Happy Melon happy.melon.wiki@gmail.com wrote:
On 5 October 2011 22:30, Brion Vibber brion@pobox.com wrote:
On Wed, Oct 5, 2011 at 1:59 PM, Brion Vibber brion@pobox.com wrote:
This could still end up pulling from 600+ repos -- if there are actually changes in them all! -- but should make typical cases a *lot* faster.
Pushed localisation updates from translatewiki would produce precisely this effect (minor but nonzero changes to hundreds of repos) on a daily or at least weekly basis. :-(
Actually, we plan to start pushing updates automagically every day, so that will be a reality.
Is that ssh session share thingy also available for Windows users?
Siebrand
On Wed, Oct 5, 2011 at 5:12 PM, Siebrand Mazeland s.mazeland@xs4all.nl wrote:
Is that ssh session share thingy also available for Windows users?
If you use one of the Windows clients that support it [1], I would assume so.
1 - http://en.wikipedia.org/wiki/Comparison_of_SSH_clients#Technical
On 6 October 2011 00:12, Siebrand Mazeland s.mazeland@xs4all.nl wrote:
Is that ssh session share thingy also available for Windows users?
As Windows users generally use PuTTY (I'm not 100% sure about git-mingw in this respect, but it works at least with TortoiseSVN, and probably also with TortoiseGit): Pageant is what you're looking for. It's available from the PuTTY website [1].
Best, Merlijn
[1] http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
On Wed, Oct 5, 2011 at 3:07 PM, Happy Melon happy.melon.wiki@gmail.comwrote:
On 5 October 2011 22:30, Brion Vibber brion@pobox.com wrote:
On Wed, Oct 5, 2011 at 1:59 PM, Brion Vibber brion@pobox.com wrote:
This could still end up pulling from 600+ repos -- if there are actually changes in them all! -- but should make typical cases a *lot* faster.
What's the use-case for checking out all of the extensions? Presumably that would only be needed if you wanted to do some global clean-up, so this won't affect the majority of people too badly.
Conrad
On 06.10.2011, 2:12 Conrad Irwin wrote:
What's the use-case for checking out all of the extensions? Presumably that would only be needed if you wanted to do some global clean-up, so this won't affect the majority of people too badly.
Actually, it's pretty common; most if not all active core devs do that, because every change to core APIs must be accompanied by a grep through the extensions to check whether it's OK or whether something will need to be fixed.
Max Semenik wrote:
On 06.10.2011, 2:12 Conrad Irwin wrote:
What's the use-case for checking out all of the extensions? Presumably that would only be needed if you wanted to do some global clean-up, so this won't affect the majority of people too badly.
Actually, it's pretty common; most if not all active core devs do that, because every change to core APIs must be accompanied by a grep through the extensions to check whether it's OK or whether something will need to be fixed.
s/grep/ack-grep/ ;-)
MZMcBride
On 11-10-05 03:07 PM, Happy Melon wrote:
On 5 October 2011 22:30, Brion Vibber brion@pobox.com wrote:
On Wed, Oct 5, 2011 at 1:59 PM, Brion Vibber brion@pobox.com wrote:
This could still end up pulling from 600+ repos -- if there are actually changes in them all! -- but should make typical cases a *lot* faster.
Pushed localisation updates from translatewiki would produce precisely this effect (minor but nonzero changes to hundreds of repos) on a daily or at least weekly basis. :-(
--HM
I wish we could stop pushing TWN updates as code commits. I'm getting tired of trying to look through an extension's log for changes and having to sort through it when 3/4 of the commits are TWN updates.
Someone mentioned an idea at one point of having only English in core (as the source for translations) and then pulling localization updates in with some script; part of building the tarballs would be pulling in the localizations and bundling them.
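As a rough sketch of that tarball idea (all repo URLs and paths below are made up for illustration):

    #!/bin/bash
    # Sketch only: ship English-only core from version control and pull the
    # other localisations in at tarball build time.
    set -e
    git clone --depth 1 git://git.example.org/mediawiki/core.git mediawiki-1.19
    git clone --depth 1 git://git.example.org/mediawiki/i18n.git i18n
    cp i18n/core/Messages*.php mediawiki-1.19/languages/messages/
    tar czf mediawiki-1.19.tar.gz mediawiki-1.19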
On 06/10/11 09:55, Daniel Friesen wrote:
I wish we could stop pushing TWN updates as code commits. I'm getting tired of trying to look through an extension's log for changes and having to sort through it when 3/4 of the commits are TWN updates.
Someone mentioned an idea at one point of having only English in core (as the source for translations) and then pulling localization updates in with some script; part of building the tarballs would be pulling in the localizations and bundling them.
That might have been me; at least I have the same concern about logs, especially when bisecting changes.
Much like we moved extensions out of core, we might want to move translations out too. We do not really need 0-day translations for day-to-day MW hacking.
Having all translations (core + extensions) in one repository would make life easier for the translation team, and developers maintaining their extensions outside our repository would be able to take advantage of the translation system.
To update the live site, we would just fetch the changes from that repository and merge them into the wmf branches. To release a new tarball, we could fetch the latest revision from the translation repository.
Ashar Voultoiz wrote:
That might have been me; at least I have the same concern about logs, especially when bisecting changes.
Much like we moved extensions out of core, we might want to move translations out too. We do not really need 0-day translations for day-to-day MW hacking.
What about commits which add a message? Most i18n commits are irrelevant for normal coding. But there are code commits which carry along i18n changes.
And I don't like solutions like keeping English in a different repo, or having to split a commit across two independent repositories. Maybe subrepositories can help here?
On 11-10-09 11:20 AM, Platonides wrote:
Ashar Voultoiz wrote:
That might have been me; at least I have the same concern about logs, especially when bisecting changes.
Much like we moved extensions out of core, we might want to move translations out too. We do not really need 0-day translations for day-to-day MW hacking.
What about commits which add a message? Most i18n commits are irrelevant for normal coding. But there are code commits which carry along i18n changes.
Commits adding messages aren't generally done by TWN, so those are of course really code changes and would still be committed as such. The few times TWN actually changes an en message, I don't mind seeing it; it won't be like the mass of non-en stuff that's in the way when browsing logs.
And I don't like solutions like keeping English in a different repo, or having to split a commit across two independent repositories. Maybe subrepositories can help here?
A submodule can only handle entire dirs. I'm not sure about putting the canonical English in another repo: then it's not simply TWN occasionally crossing repos, it's every single developer making any change that happens to add or modify a message having to cross two repos to change the canonical form. Piles of non-automated commits would need to double-commit and include a submodule commit id change in the parent repo's commit.
What I've been thinking isn't so much putting translations in another repo, or even having TWN do any commits anywhere at all. Frankly, the whole idea of TWN reading and writing .php files has felt completely messed up to me anyways. Sure, our canonical message forms can be in .php, but having the semi-automated system we use to translate into every other language we support output .php files feels like a relic of a time before that system existed, and a band-aid hack just to make TWN translations possible back then.
I'd like to make TWN the proper source for all the translations. Rather than TWN spitting out php for non-en, we'd have a proper generated output format for translations, and MediaWiki would use that instead of .php. Instead of TWN having to make this a commit somewhere, I think we should pull those translations right from TWN when we need them.
If we make an out-of-repo pull our preferred way of getting i18n, instead of having it pushed to core as commits, we can also make getting absolutely up-to-date translations, even on non-trunk versions, a non-hack.
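To make the shape of that concrete, a pull from TWN could look something like this; the export URL and the JSON format here are purely hypothetical, just for illustration:

    #!/bin/bash
    # Hypothetical: fetch up-to-date translations for the languages a wiki
    # needs straight from a translatewiki.net export endpoint, as data files
    # rather than .php. URL and format are assumptions, not a real API.
    cache_dir=/var/cache/mediawiki/l10n
    mkdir -p "$cache_dir"
    for lang in de fr pt; do
        curl -fsS "https://translatewiki.net/export/mediawiki-core/${lang}.json" \
            -o "$cache_dir/core-${lang}.json"
    done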
On 9 October 2011 21:52, Daniel Friesen lists@nadir-seen-fire.com wrote:
On 11-10-09 11:20 AM, Platonides wrote:
What I've been thinking isn't so much putting translations in another repo, or even having TWN do any commits anywhere at all. Frankly, the whole idea of TWN reading and writing .php files has felt completely messed up to me anyways. Sure, our canonical message forms can be in .php, but having the semi-automated system we use to translate into every other language we support output .php files feels like a relic of a time before that system existed, and a band-aid hack just to make TWN translations possible back then.
Huge +1. I would sincerely welcome a move away from PHP-based i18n files. Having data in executable format is just stupid imho.
I'd like to make TWN the proper source for all the translations. Rather than TWN spitting out php for non-en, we'd have a proper generated output format for translations, and MediaWiki would use that instead of .php. Instead of TWN having to make this a commit somewhere, I think we should pull those translations right from TWN when we need them.
I'm not sure I want to add that burden to TWN right now. It's just a single vserver with no uptime guarantees. I'm not opposed to the idea though - having efficient l10n update in the core, enabled by default, providing always up-to-date translations and perhaps also loading new languages on demand[1] would be so awesome. But like I said, that would need some serious effort to code and to make a stable and secure content distribution channel. Any volunteers? :)
[1] This would also satisfy those who think that including all l10n makes the tarball too big
-Niklas
On 11-10-09 02:33 PM, Niklas Laxström wrote:
On 9 October 2011 21:52, Daniel Friesen lists@nadir-seen-fire.com wrote:
On 11-10-09 11:20 AM, Platonides wrote:
What I've been thinking isn't so much putting translations in another repo, or even having TWN do any commits anywhere at all. Frankly, the whole idea of TWN reading and writing .php files has felt completely messed up to me anyways. Sure, our canonical message forms can be in .php, but having the semi-automated system we use to translate into every other language we support output .php files feels like a relic of a time before that system existed, and a band-aid hack just to make TWN translations possible back then.
Huge +1. I would sincerely welcome a move away from PHP-based i18n files. Having data in executable format is just stupid imho.
I'd like to make TWN the proper source for all the translations. Rather than TWN spitting out php for non-en, we'd have a proper generated output format for translations, and MediaWiki would use that instead of .php. Instead of TWN having to make this a commit somewhere, I think we should pull those translations right from TWN when we need them.
I'm not sure I want to add that burden to TWN right now. It's just a single vserver with no uptime guarantees. I'm not opposed to the idea though - having efficient l10n update in the core, enabled by default, providing always up-to-date translations and perhaps also loading new languages on demand[1] would be so awesome. But like I said, that would need some serious effort to code and to make a stable and secure content distribution channel. Any volunteers? :)
I thought of that too. svn.wikimedia.org already bears that burden anyways. I don't see why WMF couldn't offer up some spot that TWN can just push the files to. Or maybe just reverse proxy it.
Or maybe labs. When I talked to Ryan about Gerrit vs. Gitorious, he mentioned that, using the labs setup that was being put together, I could even put together Gitorious and have it pushed to production from labs. If that setup can handle it, then perhaps we can use it to set up a spot TWN can push updates to.
[1] This would also satisfy those who think that including all l10n makes the tarball too big
-Niklas
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
On 9 October 2011 22:33, Niklas Laxström niklas.laxstrom@gmail.com wrote:
On 9 October 2011 21:52, Daniel Friesen lists@nadir-seen-fire.com wrote:
On 11-10-09 11:20 AM, Platonides wrote:
What I've been thinking isn't so much putting translations in another repo, or even having TWN do any commits anywhere at all. Frankly, the whole idea of TWN reading and writing .php files has felt completely messed up to me anyways. Sure, our canonical message forms can be in .php, but having the semi-automated system we use to translate into every other language we support output .php files feels like a relic of a time before that system existed, and a band-aid hack just to make TWN translations possible back then.
Huge +1. I would sincerely welcome a move away from PHP-based i18n files. Having data in executable format is just stupid imho.
I don't really see how changing the format is going to have any impact by itself. Whether PHP, XML, a hand-rolled data format or anything else, it still doesn't play nicely with version control. Fundamentally we want to make changes to content in a version-controlled project, and we want everyone to have the latest versions of those changes; but we don't want the version history *of the i18n content* mixed in with the version history of the rest of the project. The solution to that issue is obvious and has nothing to do with the file format: if you don't want your changes showing up in the version history of your repository, make your changes outside the repository!
I'd like to make TWN the proper source for all the translations. Rather than TWN spitting out php for non-en, we'd have a proper generated output format for translations, and MediaWiki would use that instead of .php. Instead of TWN having to make this a commit somewhere, I think we should pull those translations right from TWN when we need them.
I'm not sure I want to add that burden to TWN right now. It's just a single vserver with no uptime guarantees. I'm not opposed to the idea though - having efficient l10n update in the core, enabled by default, providing always up-to-date translations and perhaps also loading new languages on demand[1] would be so awesome. But like I said, that would need some serious effort to code and to make a stable and secure content distribution channel. Any volunteers? :)
This would seem something of a slap in the face to anyone running an 'unplugged' wiki (on an intranet or low-connectivity area); *especially* one using a language other than English. I doubt this would play very happily with the Foundation's vision of bringing offline content to, for instance, rural Africa. Not all MediaWiki installations run on a webserver permanently hooked into the internet; some of them run from a USB stick plugged into an OLOC laptop in the middle of the middle of nowhere.
--HM
On 11-10-09 02:48 PM, Happy Melon wrote:
On 9 October 2011 22:33, Niklas Laxström niklas.laxstrom@gmail.com wrote:
On 9 October 2011 21:52, Daniel Friesen lists@nadir-seen-fire.com wrote:
On 11-10-09 11:20 AM, Platonides wrote:
What I've been thinking isn't so much putting translations in another repo, or even having TWN do any commits anywhere at all. Frankly, the whole idea of TWN reading and writing .php files has felt completely messed up to me anyways. Sure, our canonical message forms can be in .php, but having the semi-automated system we use to translate into every other language we support output .php files feels like a relic of a time before that system existed, and a band-aid hack just to make TWN translations possible back then.
Huge +1. I would sincerely welcome a move away from PHP-based i18n files. Having data in executable format is just stupid imho.
I don't really see how changing the format is going to have any impact by itself. Whether PHP, XML, a hand-rolled data format or anything else, it still doesn't play nicely with version control. Fundamentally we want to make changes to content in a version-controlled project, and we want everyone to have the latest versions of those changes; but we don't want the version history *of the i18n content* mixed in with the version history of the rest of the project. The solution to that issue is obvious and has nothing to do with the file format: if you don't want your changes showing up in the version history of your repository, make your changes outside the repository!
That IS what we're discussing. We're also discussing how ridiculous it is to be using executable files as our output format. Using .php as our output format means that whatever out-of-band download we use to separate i18n from the repository could be a vector for injecting php. If we drop php and go to some other format, that part of the issue goes away. Nikerabbit also points out that leaving the repo behind means we don't have to care about what the output format looks like anymore (likely part of the reason why it's still php).
I'd like to make TWN the proper source for all the translations. Rather than TWN spitting out php for non-en, we'd have a proper generated output format for translations, and MediaWiki would use that instead of .php. Instead of TWN having to make this a commit somewhere, I think we should pull those translations right from TWN when we need them.
I'm not sure I want to add that burden to TWN right now. It's just a single vserver with no uptime guarantees. I'm not opposed to the idea though - having efficient l10n update in the core, enabled by default, providing always up-to-date translations and perhaps also loading new languages on demand[1] would be so awesome. But like I said, that would need some serious effort to code and to make a stable and secure content distribution channel. Any volunteers? :)
This would seem something of a slap in the face to anyone running an 'unplugged' wiki (on an intranet or low-connectivity area); *especially* one using a language other than English. I doubt this would play very happily with the Foundation's vision of bringing offline content to, for instance, rural Africa. Not all MediaWiki installations run on a webserver permanently hooked into the internet; some of them run from a USB stick plugged into an OLOC laptop in the middle of the middle of nowhere.
The foundation's vision has little to do with our development process. Whether i18n is inside our version control or some other place, it'll still be inside the release tarballs. And really, OLOC (OLPC?) running full MediaWiki? I thought that plan was zim based or something.
--HM
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
On 10.10.2011, 2:04 Daniel wrote:
I don't really see how changing the format is going to have any impact by itself. Whether PHP, XML, a hand-rolled data format or anything else, it still doesn't play nicely with version control. Fundamentally we want to make changes to content in a version-controlled project, and we want everyone to have the latest versions of those changes; but we don't want the version history *of the i18n content* mixed in with the version history of the rest of the project. The solution to that issue is obvious and has nothing to do with the file format: if you don't want your changes showing up in the version history of your repository, make your changes outside the repository!
That IS what we're discussing. We're also discussing how ridiculous it is to be using executable files as our output format. Using .php as our output format means that whatever out-of-band download we use to separate i18n from the repository could be a vector for injecting php. If we drop php and go to some other format, that part of the issue goes away. Nikerabbit also points out that leaving the repo behind means we don't have to care about what the output format looks like anymore (likely part of the reason why it's still php).
If you can't trust your on-the-fly .php downloads, you can't trust localisations in other formats either: we have some messages with full HTML allowed, and default contents for JS messages can be used to screw up the wiki and steal everyone's passwords. Of course, it's less lethal than server-side php execution, but still an unacceptably dangerous attack vector.
On 11-10-11 12:21 PM, Max Semenik wrote:
On 10.10.2011, 2:04 Daniel wrote:
I don't really see how changing the format is going to have any impact by itself. Whether PHP, XML, a hand-rolled data format or anything else, it still doesn't play nicely with version control. Fundamentally we want to make changes to content in a version-controlled project, and we want everyone to have the latest versions of those changes; but we don't want the version history *of the i18n content* mixed in with the version history of the rest of the project. The solution to that issue is obvious and has nothing to do with the file format: if you don't want your changes showing up in the version history of your repository, make your changes outside the repository!
That IS what we're discussing. We're also discussing how ridiculous it is to be using executable files as our output format. Using .php as our output format means that whatever out-of-band download we use to separate i18n from the repository could be a vector for injecting php. If we drop php and go to some other format, that part of the issue goes away. Nikerabbit also points out that leaving the repo behind means we don't have to care about what the output format looks like anymore (likely part of the reason why it's still php).
If you can't trust your on-the-fly .php downloads, you can't trust localisations in other formats either: we have some messages with full HTML allowed, and default contents for JS messages can be used to screw up the wiki and steal everyone's passwords. Of course, it's less lethal than server-side php execution, but still an unacceptably dangerous attack vector.
We can kill html messages over time (and it would definitely be a good idea anyways). Messages for .js can be rejected from the localized downloads. And frankly killing the inclusion of .js and .css from the i18n space would be good too.
Principle of least permission: if we have to download i18n, then at the very least we shouldn't allow that format to inject php and turn everyone's VPS servers into a big botnet.
Daniel Friesen wrote:
We can kill html messages over time (and it would definitely be a good idea anyways). Messages for .js can be rejected from the localized downloads. And frankly killing the inclusion of .js and .css from the i18n space would be good too.
Principle of least permission: if we have to download i18n, then at the very least we shouldn't allow that format to inject php and turn everyone's VPS servers into a big botnet.
Man, that kills all the fun :)
Also, that means that we have an excuse for holing those pesky firewalls that don't allow us to "enable" their site [1].
On 09/10/11 20:20, Platonides wrote:
What about commits which add a message?
Since English is our reference language, we would keep it in core. If you wanted to provide other translations (Portuguese for you, or French for me), it would need to be done in the i18n repository.
Most i18n commits are irrelevant for normal coding. But there are code commits which carry along i18n changes.
Those i18n commits would have to be done on the new repository.
And I don't like solutions like keeping English in a different repo, or having to split a commit across two independent repositories. Maybe subrepositories can help here?
Per above, we should keep English in the core repository.
On Wed, Oct 5, 2011 at 5:33 AM, Brion Vibber brion@pobox.com wrote:
You can check out extensions as separate repositories directly into subfolders within core's 'extensions' dir for a ready-to-run system. But you *do* need to iterate over them, either manually or with a script, to pull updates or commit across repos. Git's submodules might be a useful way to help automate checkouts, but they introduce their own complications for maintenance.
That does not sound like The Bright Git Future.
--vvv
On 11-10-06 10:43 PM, Victor Vasiliev wrote:
On Wed, Oct 5, 2011 at 5:33 AM, Brion Vibber brion@pobox.com wrote:
You can check out extensions as separate repositories directly into subfolders within core's 'extensions' dir for a ready-to-run system. But you *do* need to iterate over them, either manually or with a script, to pull updates or commit across repos. Git's submodules might be a useful way to help automate checkouts, but they introduce their own complications for maintenance.
That does not sound like The Bright Git Future.
--vvv
;) No, "The Bright Git Future" is when I can commit from my server, pull the changes to my local working copy, and push them to the central repo from there. Since I develop on my servers, but don't trust them with my private keys.
I currently do this with absolute hacks involving ssh up on both working copies, piping svn diff through ssh into patch, commit, then another svn up. The fact I have unfinished code lying around in my working copies just makes things even more fun (I always make use of git's lovely index which lets me pick piece by piece what parts of the diff to actually commit). Not to mention that the svn diff trick has issues if I have a new file.
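The hack is essentially this shape (host and paths are examples):

    # On the trusted local machine: pull the uncommitted changes down from the
    # dev server's working copy, apply them locally, commit, then re-sync both.
    ssh devserver 'cd /srv/wiki/trunk && svn diff' | (cd ~/src/trunk && patch -p0)
    (cd ~/src/trunk && svn commit -m "whatever I was hacking on")
    (cd ~/src/trunk && svn up)
    ssh devserver 'cd /srv/wiki/trunk && svn up'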
This of course leads to lovely commits like these:
https://www.mediawiki.org/wiki/Special:Code/MediaWiki/96668
https://www.mediawiki.org/wiki/Special:Code/MediaWiki/96273
humorous ones like this: https://www.mediawiki.org/wiki/Special:Code/MediaWiki/97180 ^_^
and this kind of lovely commit: https://www.mediawiki.org/wiki/Special:Code/MediaWiki/85242
By the way, when we switch to git I'll finally be able to get rid of half the reason I make some commits without bothering to test them ;), since it will no longer be a huge hassle to make the change in a place where I can actually test it.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
On Fri, Oct 7, 2011 at 3:48 PM, Daniel Friesen lists@nadir-seen-fire.com wrote:
I currently do this with absolute hacks involving svn up on both working copies, piping svn diff through ssh into patch, committing, then another svn up. The fact I have unfinished code lying around in my working copies just makes things even more fun (I always make use of git's lovely index which lets me pick piece by piece what parts of the diff to actually commit). Not to mention that the svn diff trick has issues if I have a new file.
I also use two working copies (one read-only for development and one for committing). This is mostly due to the lack of svn stash and local branches (I have at least two large patches on my local copy that I'm forced to manually unmerge by editing svn diffs).
--vvv
On 07/10/11 14:27, Victor Vasiliev wrote:
I also use two working copies (one read-only for development and one for committing). This is mostly due to the lack of svn stash and local branches (I have at least two large patches on my local copy that I'm forced to manually unmerge by editing svn diffs).
You could run git init in your existing svn working copy to get a git stash on top of it. Whenever you want to svn update: stash your changes with git, svn update, git commit, unstash and merge :)
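Concretely, the workflow would be something like this (paths are examples):

    # One-time setup: a git repo on top of the svn working copy, keeping the
    # two metadata trees out of each other's way.
    cd ~/src/trunk
    git init
    echo ".svn/" >> .git/info/exclude
    git add -A && git commit -m "baseline of current svn working copy"

    # Whenever you want to svn update:
    git stash                                   # park local changes
    svn update
    git add -A && git commit -m "svn update"    # record the new upstream state
    git stash pop                               # reapply local changes, merging if needed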