Lately I've seen a lot of diffs to .i18n files that look like this:
- (50 lines with one number of spaces in them)
+ (51 lines with a different number of spaces in them, and one of the lines is new)
I think it would greatly simplify change tracking if we used more consistent spacing in the localization files, eg changing this:
+ 'Search'    => array( 'Sichen' ),
+ 'Resetpass' => array( 'Passwuert zrécksetzen' ),
to this:
+ 'Search' => array( 'Sichen' ),
+ 'Resetpass' => array( 'Passwuert zrécksetzen' ),
While it's cute, and sometimes helpful, to align columns in files that are manually maintained, this:
1) Has no benefit to localization done via BetaWiki
2) Obscures the actual changes made in a commit when the number of columns gets bumped, making overall code maintenance more difficult.
-- brion
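To make the diff-noise point concrete, here is a minimal sketch in Python (not the actual export code). The helper names `quote`, `render` and `changed_lines` are made up for illustration, and the third entry is a hypothetical addition with a placeholder translation. It counts how many lines a plain line-based diff reports when one longer key is added, first with column alignment and then with single spacing.

```python
import difflib


def quote(text):
    # PHP-style single-quoted literal; escaping is ignored in this sketch.
    return "'" + text + "'"


def render(entries, aligned):
    """Render (key, value) pairs as PHP-like array lines."""
    width = max(len(quote(key)) for key, _ in entries) if aligned else 0
    return [
        f"\t{quote(key).ljust(width)} => array( {quote(value)} ),"
        for key, value in entries
    ]


def changed_lines(before, after, aligned):
    """Count added plus removed lines in a line-based diff of two renderings."""
    diff = difflib.ndiff(render(before, aligned), render(after, aligned))
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))


before = [("Search", "Sichen"), ("Resetpass", "Passwuert zrécksetzen")]
# Hypothetical new entry whose key is longer than any existing key.
after = before + [("Listgrouprights", "Example")]

aligned_noise = changed_lines(before, after, aligned=True)
plain_noise = changed_lines(before, after, aligned=False)
print(aligned_noise, plain_noise)
```

With alignment, both existing lines are re-padded, so all of them show up as removed and re-added next to the one real addition. With single spacing only the new line is reported.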
Can be done. Will discuss with Nikerabbit tomorrow. He just left for bed. This means that 'human readable i18n' will be a thing of the past. To us it was already that, anyway :)
Would you suggest a rebuild of all localisation files in one go to get rid of your observed obscurity, or would you like it to trickle through over time?
I suggest we make the proposed changes only after the 1.13 release (read: release, not branching, so that we can do a backport to 1.13 just before release with the current code base).
Cheers! Siebrand
-----Original message-----
From: wikitech-l-bounces@lists.wikimedia.org [mailto:wikitech-l-bounces@lists.wikimedia.org] On behalf of Brion Vibber
Sent: Tuesday 15 July 2008 22:41
To: Wikimedia developers
Subject: [Wikitech-l] On localization file formatting
I think it would greatly simplify change tracking if we used more consistent spacing in the localization files, eg changing this:
+ 'Search'    => array( 'Sichen' ),
+ 'Resetpass' => array( 'Passwuert zrécksetzen' ),
to this:
+ 'Search' => array( 'Sichen' ),
+ 'Resetpass' => array( 'Passwuert zrécksetzen' ),
While it's cute, and sometimes helpful, to align columns in files that are manually maintained, this:
1) Has no benefit to localization done via BetaWiki
2) Obscures the actual changes made in a commit when the number of columns gets bumped, making overall code maintenance more difficult.
On Tue, Jul 15, 2008 at 4:48 PM, Siebrand Mazeland s.mazeland@xs4all.nl wrote:
This means that 'human readable i18n' will be a thing of the past.
That seems a bit overly dramatic, I must say. It's not like it would make it more than a bit less readable.
Would you suggest a rebuild of all localisation files in one go to get rid of your observed obscurity, or would you like it to trickle through over time?
It may as well be done all at once. Otherwise you're still obscuring at least one extra change for each message block, and the less-updated files will be inconsistent with the more-updated ones.
On Tue, Jul 15, 2008 at 5:43 PM, Siebrand Mazeland s.mazeland@xs4all.nl wrote:
(resent about an hour after first try)
It has worked every time (3) that you have tried it.
On 15/07/2008, Brion Vibber brion@wikimedia.org wrote:
Lately I've seen a lot of diffs to .i18n files that look like this:
- (50 lines with one number of spaces in them)
+ (51 lines with a different number of spaces in them, and one of the lines is new)
I've long been doing review using whitespace-insensitive diffs. The commit emails are just unsuitable for doing i18n review, and they are always truncated anyway.
I think it would greatly simplify change tracking if we used more consistent spacing in the localization files, eg changing this:
'Search'    => array( 'Sichen' ),
'Resetpass' => array( 'Passwuert zrécksetzen' ),
to this:
'Search' => array( 'Sichen' ),
'Resetpass' => array( 'Passwuert zrécksetzen' ),
That should only happen when a long alias in English is added or it had inconsistent spacing to begin with. For message arrays the problem is much bigger.
While it's cute, and sometimes helpful, to align columns in files that are manually maintained, this:
1) Has no benefit to localization done via BetaWiki
2) Obscures the actual changes made in a commit when the number of columns gets bumped, making overall code maintenance more difficult.
-- brion
So, do you want to remove padding from message arrays (in which case the code could be simplified a lot if we removed those comments too), magic words and aliases, or only some subset of them?
Brion Vibber wrote:
Lately I've seen a lot of diffs to .i18n files that look like this:
- (50 lines with one number of spaces in them)
+ (51 lines with a different number of spaces in them, and one of the lines is new)
I think it would greatly simplify change tracking if we used more consistent spacing in the localization files, eg changing this:
+ 'Search'    => array( 'Sichen' ),
+ 'Resetpass' => array( 'Passwuert zrécksetzen' ),
to this:
+ 'Search' => array( 'Sichen' ),
+ 'Resetpass' => array( 'Passwuert zrécksetzen' ),
While it's cute, and sometimes helpful, to align columns in files that are manually maintained, this:
1) Has no benefit to localization done via BetaWiki
2) Obscures the actual changes made in a commit when the number of columns gets bumped, making overall code maintenance more difficult.
-- brion
I suggest not removing the spacing in language files (anywhere). After trying to rebuild a language file without spaces, I can see it is much harder to read and maintain. While some of the localization is done via BetaWiki, I and several other users update their languages using direct SVN changes, which is faster, and also easier for some users. This change will make these updates harder, and will discourage users from updating the localization that way. It will also make it harder to read the language files or to find problems in them.
About reviewing localization updates: as Nikerabbit said, localization updates (especially those from BetaWiki) are usually truncated anyway in mails from mediawiki-cvs, and diff tools that ignore whitespace changes (e.g. "svn diff -x -b", or ViewVC) can be used for reviewing localization changes.
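As a rough illustration of what a whitespace-insensitive review buys, the following Python sketch approximates "ignore changes in the amount of whitespace" by collapsing runs of spaces and tabs before comparing. It is not how svn or ViewVC implement it. The function names are invented for this example, and the third entry is a hypothetical addition.

```python
import difflib
import re


def collapse_whitespace(line):
    # Approximation only: runs of spaces/tabs become one space,
    # and trailing whitespace is dropped.
    return re.sub(r"[ \t]+", " ", line).rstrip()


def meaningful_changes(old_lines, new_lines):
    """Return new-side lines that differ by more than whitespace amount."""
    old_norm = [collapse_whitespace(line) for line in old_lines]
    new_norm = [collapse_whitespace(line) for line in new_lines]
    matcher = difflib.SequenceMatcher(a=old_norm, b=new_norm, autojunk=False)
    changed = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            changed.extend(new_lines[j1:j2])
    return changed


old = [
    "'Search'    => array( 'Sichen' ),",
    "'Resetpass' => array( 'Passwuert zrécksetzen' ),",
]
new = [
    "'Search'          => array( 'Sichen' ),",
    "'Resetpass'       => array( 'Passwuert zrécksetzen' ),",
    # Hypothetical addition that forced the wider column.
    "'Listgrouprights' => array( 'Example' ),",
]

result = meaningful_changes(old, new)
print(result)
```

Only the added entry is reported, even though every line of the file changed on disk. That matches the reviewing workflow described above, but it does nothing for the size of the commit mails themselves.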
However, there is a problem when many messages are aligned in the same way. The problem exists in extensions (e.g. CentralAuth, which contains many messages), whose translations are not divided into groups. I suggest that the extension rebuilding script should support dividing messages into groups, making it possible to avoid huge whitespace changes when a longer message key is added.
Rotem Liss
On 16/07/2008, Rotem Liss rotemliss@gmail.com wrote:
I'd like to toss in some of my own opinions. On-topic discussion at the bottom.
I suggest not removing the spacing in language files (anywhere). After trying to rebuild a language file without spaces, I can see it is much harder to read and maintain.
True.
While some of the localization is done via BetaWiki,
Most of it, we have hundreds of translators.
I and several other users update their languages using direct SVN changes, which is faster, and also easier for some users.
You, Raymond, Shinjiman? Looks like Alefzet has disappeared.
Here are a few reasons why I think using Betawiki is better:
1) Working directly in SVN prevents collaboration. It is not possible to work both with and without Betawiki at the same time, because we currently can't handle conflicts very well. This may be alleviated once we can import external changes, but the issue itself stays. There is no other central place to communicate with other translators.
2) Nice features. Those who work in Betawiki have nice features, like better message format checks, nice statistics [1], all special page aliases in one place, no need to commit, easy overview of untranslated messages and those listed as problematic by the checks.
Then there is the message documentation, and those who contribute to it are helping everybody else using Betawiki.
And what happens when SVN committers get tired of updating the localisations? What will happen now that Alefzet is gone? Will we find new translators, or is the localisation doomed to bit rot?
We do not tell people how they must update translations, but we do, however, suggest and wish that they use Betawiki, for the obvious reasons. I understand those who work with multiple script variants; that is not easy to do in Betawiki.
I understand you too, and why you feel using SVN is easier. It is a trade-off. Certain things are harder to do in Betawiki, even if we are constantly trying to fix those issues. But this comes down to manpower: the people working on Betawiki simply have too many things to do. We are looking for new faces to handle some tasks, to leave us more time to work on the hard issues.
This change will make these updates harder, and will discourage users from updating the localization that way.
True, but my opinions on this are mixed.
It will also make it harder to read the language files or to find problems in them.
True.
About reviewing localization updates: as Nikerabbit said, localization updates (especially those from BetaWiki) are usually truncated anyway in mails from mediawiki-cvs, and diff tools that ignore whitespace changes (e.g. "svn diff -x -b", or ViewVC) can be used for reviewing localization changes.
Of course those mails could get shorter if there were less noise caused by whitespace changes, but I don't think that would help noticeably.
However, there is a problem when many messages are aligned in the same way. The problem exists in extensions (e.g. CentralAuth, which contains many messages), whose translations are not divided into groups. I suggest that the extension rebuilding script should support dividing messages into groups, making it possible to avoid huge whitespace changes when a longer message key is added.
Maybe. This is exactly what we are discussing here. I agree that it doesn't make sense to align those all by the longest one.
I think we now have the following alternatives:
1) remove padding
2) do nothing
3) use a constant pad
4) pad by the longest key in definitions, in which case whitespace only changes when long keys in English are added or removed
4b) same as above, but allow smaller blocks in extension messages
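The alternatives can be sketched as a choice of key-column width. This is only an illustration in Python, not the rebuilding script. The function names, the strategy labels, the default constant of 24 and the example keys are all assumptions. Alternative 2 (do nothing) keeps the current behaviour and is not modelled.

```python
def quoted_len(key):
    # Length of the key once wrapped in single quotes.
    return len(key) + 2


def key_column_width(strategy, definition_blocks, block_index, constant=24):
    """Width to pad a quoted key to; 0 means a single space before '=>'.

    definition_blocks is a list of lists of English message keys
    (a hypothetical input shape chosen for this sketch).
    """
    if strategy == "none":               # alternative 1
        return 0
    if strategy == "constant":           # alternative 3
        return constant
    if strategy == "longest":            # alternative 4
        return max(quoted_len(k) for block in definition_blocks for k in block)
    if strategy == "longest-per-block":  # alternative 4b
        return max(quoted_len(k) for k in definition_blocks[block_index])
    raise ValueError(f"unknown strategy: {strategy}")


def format_line(key, value, width):
    # Keys longer than the width are left unpadded rather than truncated.
    return f"'{key}'".ljust(width) + f" => '{value}',"


# Hypothetical English definitions: two blocks, one with a very long key.
blocks = [["search", "resetpass"], ["example-long-message-key"]]

for strategy in ("none", "constant", "longest", "longest-per-block"):
    width = key_column_width(strategy, blocks, block_index=0)
    print(strategy, repr(format_line("search", "Sichen", width)))
```

Under alternatives 4 and 4b the width depends only on the English definitions, so a translation file is re-padded only when a long English key is added or removed. With 4b the effect of a long key stays inside its own block.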
Rotem Liss
Not specifically targeted to you.
Niklas Laxström wrote:
I and several other users update their languages using direct SVN changes, which is faster, and also easier for some users.
You, Raymond, Shinjiman? Looks like Alefzet has disappeared.
Also Huji, and formerly Mfarag. Nevertheless, some translators leave and others join the effort, both in BetaWiki and in the SVN translation.
However, there is a problem when many messages are aligned in the same way. The problem exists in extensions (e.g. CentralAuth, which contains many messages), whose translations are not divided into groups. I suggest that the extension rebuilding script should support dividing messages into groups, making it possible to avoid huge whitespace changes when a longer message key is added.
Maybe. This is exactly what we are discussing here. I agree that it doesn't make sense to align those all by the longest one.
I think we now have the following alternatives:
1) remove padding
2) do nothing
3) use a constant pad
4) pad by the longest key in definitions, in which case whitespace only changes when long keys in English are added or removed
4b) same as above, but allow smaller blocks in extension messages
Well, messages are aligned according to groups in core, and also in the English messages in extensions. However, there are no groups in translations of extensions, because of the rebuilding script. I think that it is an obvious move to group the translated messages of extensions, if the rebuilding script can do that.
On Wed, Jul 16, 2008 at 12:25 PM, Rotem Liss rotemliss@gmail.com wrote:
Niklas Laxström wrote:
I and several other users update their languages using direct SVN changes, which is faster, and also easier for some users.
You, Raymond, Shinjiman? Looks like Alefzet has disappeared.
Also Huji, and formerly Mfarag. Nevertheless, some translators leave and others join the effort, both in BetaWiki and in the SVN translation.
I think the major issue is that generally it is only possible for one person to contribute translations to a language using SVN, given the coordination issues and the volatility of the message translations. With Betawiki multiple users can translate the same language, which means that if one leaves there are still others contributing to that language. This is not the case for translations done directly in SVN.
MinuteElectron.
The point about using whitespace-insensitive diffs is a fairly obvious one that I overlooked. This is especially reasonable if ViewVC does those by default. Of course, occasionally whitespace *is* important, and usually it's not really distracting, so I'm not sure I'm going to be switching to svn diff -x -b anytime soon for most commits. In particular, whitespace is important when you're trying to get a patch file. Brion used to copy-paste the commit messages into patch -R to revert, if I'm not mistaken (although I hope by now he's upgraded to advanced tech like svn merge -c -rxxxxx ;) ).
On Wed, Jul 16, 2008 at 8:30 AM, Minute Electron minuteelectron@googlemail.com wrote:
I think the major issue is that generally it is only possible for one person to contribute translations to a language using SVN, given the coordination issues and the volatility of the message translations. With Betawiki multiple users can translate the same language, which means that if one leaves there are still others contributing to that language. This is not the case for translations done directly in SVN.
Um, it's not? I'm fairly sure that at some points we've had multiple people doing German translation, at least. You can just see which messages are still untranslated and only do those.