Mediawiki software localisation -- Pootle

List overview All Threads
Download

newer

older

Inserting a user from an outside...

any plans to run on PHP 5?

Mark Williamson

23 Aug 2005 23 Aug '05

11:05 a.m.

Hi all,

I received an e-mail from Dwayne Bailey, coordinator of translate.org.za (free software localisation group for South African languages), in which he showed interest in the idea of assisting with MediaWiki localisation in South African languages.

He also suggested using Pootle, about which you can find more at http://translate.sourceforge.net/ .

There have been a couple of mails on this list I believe about encouraging localisation somewhat independently of projects, so that for example a Ladino interface translation could be completed without the existance of a Ladino Wikipedia or other Wikimedia project. Of course, this is already possible, but it's unlikely to happen. Within that framework, it could become a prelaunch condition for requests for new language versions of Wikimedia projects, and would also allow for more collaborative translation rather than the current system where most system messages are translated by one or two sysops in the MediaWiki namespace, while regular users look on without the ability to make corrections.

While the current preferred method of localisation appears to be to use the MediaWiki namespace, this doesn't work well with the recently-introduced ability to choose ones' own interface language in preferences: if I choose on en.wiki to view the interface in, say, Navajo, or Amharic, or Bengali, nothing shows up because most or all of the translations for these languages were made in the MediaWiki namespace.

While this may not seem like a major concern, I think that there are more than a few editors on the major language versions who speak that language as their second language and might prefer to view the user interface in their native language. This may still not seem like a major concern because, after all, don't all of the "major languages" of the world have full translations of language.php? Unfortunately, this is not the case, and is limited almost completely to the languages of Europe, with few exceptions. Languages such as Bengali, Amharic, Telugu, Fulfulde, Armenian, are all major world languages with millions of speakers (Bengali, for example, is the national language of Bangladesh, and a regional language of India, two of the most populous nations on Earth, for both of which English is a relatively common second language), each of them has a widely-translated interface in the MediaWiki namespace, but none of them has a LanguageXX.php file, or if they do, it has few or no system messages.

Mark

---------- Forwarded message ---------- From: Dwayne Bailey <______@______.___> Date: 23-Aug-2005 03:06 Subject: Re: Mediawiki software localisation To: Mark Williamson node.ue@gmail.com

...

This is a great idea, I will investigate how we can translate this. Do you think you guys would be interested in using Pootle (http://pootle.wordforge.org) so that other languages can easily translate?

...

Show replies by date

Ævar Arnfjörð Bjarmason

25 Aug 25 Aug

1:53 p.m.

This appears to be just for projects that use gettext, openoffice and mozilla formats for their interface messages, we use a custom system, could this be used for that? (I only gave it a superficial look)

Sabine Cretella

2 p.m.

Where can I find the "user interface" as is to be translated?

I know there was a link, but I cannot find it anymore. I'd like to try some filters on it so I can tell you which one can work.

As for Pootle - as for conicidence during the last days I was having a look at this project since I am trying to understand how the .po files are created and if these can be translated easily with OmegaT or better: how the real ressource files of the softwares are to find out if OmegaT (an OpenSource CAT-Tool I would like to be used also for wiki contents translation) can be used for that as is or if we need further parsers.

Ciao, Sabine

Ævar Arnfjörð Bjarmason wrote:

...

This appears to be just for projects that use gettext, openoffice and mozilla formats for their interface messages, we use a custom system, could this be used for that? (I only gave it a superficial look) _______________________________________________ Wikitech-l mailing list Wikitech-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l

___________________________________ Yahoo! Messenger: chiamate gratuite in tutto il mondo http://it.beta.messenger.yahoo.com

Rowan Collins

26 Aug 26 Aug

9:58 a.m.

On 25/08/05, Sabine Cretella sabine_cretella@yahoo.it wrote:

...

Where can I find the "user interface" as is to be translated?

MediaWiki's user interface is stored: * in the MediaWiki: namespace of a running wiki; browsable, and exportable, via Special:Allmessages * in PHP files in the 'languages' directory of the source, which define array[s] containing the content to be translated; these are used to create the MediaWiki: namespace at time of install, and to support alternative interfaces on one wiki.

There are no .po files, or any other standard format, involved or available in the current system.

HTH

-- Rowan Collins BSc [IMSoP]

Ashar Voultoiz

28 Aug 28 Aug

10:14 a.m.

Rowan Collins wrote:

...

There are no .po files, or any other standard format, involved or available in the current system.

There is maintenance/lang2po.php that is a wrapper around xgettext and msgmerge to generate some .po files. The file generated are far from being complete though :o)

-- Ashar Voultoiz - WP++++ http://en.wikipedia.org/wiki/User:Hashar http://www.livejournal.com/community/wikitech/ IM: hashar@jabber.org ICQ: 15325080

Rowan Collins

26 Aug 26 Aug

10:03 a.m.

On 23/08/05, Mark Williamson node.ue@gmail.com wrote:

...

While the current preferred method of localisation appears to be to use the MediaWiki namespace, this doesn't work well with the recently-introduced ability to choose ones' own interface language in preferences: if I choose on en.wiki to view the interface in, say, Navajo, or Amharic, or Bengali, nothing shows up because most or all of the translations for these languages were made in the MediaWiki namespace.

I have wondered before whether a specific, site-neutral, system for collabourating on translations would be useful - not only for new languages, but to keep existing translations up to date when the English "master" messages are changed, or new features are added.

However, languages which *do* have available translations live in a MediaWiki namespace can fairly trivially have them exported to an appropriate PHP file, after which they will also be available at install time, and as a user preference. I was pondering a while ago how to merge exported messages into existing language files which aren't in alphabetical order, and preserve comments, etc - a case of manipulating the diffs produced by 'maintenance/diffLanguage.php'...

-- Rowan Collins BSc [IMSoP]

Sabine Cretella

3:46 p.m.

...

I have wondered before whether a specific, site-neutral, system for collabourating on translations would be useful - not only for new languages, but to keep existing translations up to date when the English "master" messages are changed, or new features are added.

Hi Rowan, I really thought about trying something like this with a wiki and of course with UW (at a certain stage).

I have some ideas on how it could work, but within a normal wiki it means loads of templates ...

The easiest thing is to use a CAT-Tool to translate the UI and then put the translation memory (tmx) at disposal - so anyone who does an update will have 100% matches from the old UI and the rest needs to be translated. This way the translator goes also over the correction of eventual misspellings or changes in terminology are possible in order to have an even better product.

Of course this can also be done with .po files, but I would very much prefer to work directly on the source files (maybe since I am used to it and I don't feel familiar with .po). As for CAT-Tools all is about having the right parser for the CAT-Tool. I suppose your php-files are similar to the ones I already translated and so, for example, the ordinary html parser of OmegaT (http://www.omegat.org) should work.

This means I just need the php file and we will know if it works.

This is the easiest way for us translators to work with that.

Can you pass me the php file of the actual version? + for example EN+DE or EN+IT from an older localised version? This way I can also try to create an alignment file to have a basis to work on.

I would very much like to see OmegaT used for these things as we are planning a reference implementation that connects it to UW and there we have the possibility to store the TM (translation memory) with the appropriate category. This would then mean; OmegaT users can retrieve the needed information for the UI localisation from Ultimate Wiktionary.

Ciao, Sabine

___________________________________ Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB http://mail.yahoo.it

Rowan Collins

4:58 p.m.

On 26/08/05, Sabine Cretella sabine_cretella@yahoo.it wrote:

...

The easiest thing is to use a CAT-Tool to translate the UI and then put the translation memory (tmx) at disposal - so anyone who does an update will have 100% matches from the old UI and the rest needs to be translated. This way the translator goes also over the correction of eventual misspellings or changes in terminology are possible in order to have an even better product.

One of the big problems with the current setup is that the MediaWiki: namespace (the "live" messages in the database of a particular project) are used both for localisation and customisation - so whenever you export the messages from, say, Wikipedia, you have to work out which changes are due to changes in the software, which are cosmetic but appropriate for application to other projects, and which are specific to the particular project. This is probably the biggest challenge which any new l10n system (or even a new approach to i18n) needs to address.

...

Can you pass me the php file of the actual version? + for example EN+DE or EN+IT from an older localised version? This way I can also try to create an alignment file to have a basis to work on.

The PHP files are, naturally, in the source of the software - see http://www.mediawiki.org/wiki/Download The easiest way is to get them out of the web-based CVS interface: * http://cvs.sourceforge.net/viewcvs.py/wikipedia/phase3/languages/ * the language codes are the same ones Wikimedia domains use; the English interface, which is also the default for missing messages in other languages, is in "Language.php" * so, for German: http://cvs.sourceforge.net/viewcvs.py/*checkout*/wikipedia/phase3/languages/...

Also, if you haven't already, have a look at the documentation on meta: * http://meta.wikimedia.org/wiki/Category:Localisation * http://meta.wikimedia.org/wiki/MediaWiki_localisation * http://meta.wikimedia.org/wiki/Help:MediaWiki_namespace etc

-- Rowan Collins BSc [IMSoP]

Nikola Smolenski

28 Aug 28 Aug

2:47 a.m.

On Friday 26 August 2005 21:58, Rowan Collins wrote:

...

On 26/08/05, Sabine Cretella sabine_cretella@yahoo.it wrote:

...
The easiest thing is to use a CAT-Tool to translate the UI and then put the translation memory (tmx) at disposal - so anyone who does an update will have 100% matches from the old UI and the rest needs to be translated. This way the translator goes also over the correction of eventual misspellings or changes in terminology are possible in order to have an even better product.

One of the big problems with the current setup is that the MediaWiki: namespace (the "live" messages in the database of a particular project) are used both for localisation and customisation - so whenever you export the messages from, say, Wikipedia, you have to work out which changes are due to changes in the software, which are cosmetic but appropriate for application to other projects, and which are specific to the particular project. This is probably the biggest challenge which any new l10n system (or even a new approach to i18n) needs to address.

Maybe it would be possible to make a list of strings which are expected to be customised, and which are not. For example, "From Wikipedia, the free encyclopedia" will always be customised while "Article", "Discussion", "Edit", "Recent Changes"... will never be. Then, LanguageXx.php could periodically be refreshed with only "safe" strings filled in. Language files for Navajo and other languages which only have translations in MediaWiki namespace could be created from scratch in this way.

Ashar Voultoiz

10:26 a.m.

Nikola Smolenski wrote:

...

Maybe it would be possible to make a list of strings which are

expected to be

...

customised, and which are not. For example, "From Wikipedia, the free encyclopedia" will always be customised while "Article", "Discussion", "Edit", "Recent Changes"... will never be. Then, LanguageXx.php could periodically be refreshed with only "safe" strings filled in. Language files for Navajo and other languages which only have translations in MediaWiki namespace could be created from scratch in this way.

Splitting the strings can probably easily done as we have two functions for messages: - wfMsg : a message in whatever language user selected. - wfMsgForContent() : a message using site language.

wfMsg is more or less the UI messages and can probably be removed from the MediaWiki namespaces. Then we will just have to manage the Content messages ;)

-- Ashar Voultoiz - WP++++ http://en.wikipedia.org/wiki/User:Hashar http://www.livejournal.com/community/wikitech/ IM: hashar@jabber.org ICQ: 15325080

Sabine Cretella

11:47 a.m.

...

Splitting the strings can probably easily done as we have two functions for messages:

wfMsg : a message in whatever language user selected.

wfMsgForContent() : a message using site language.

wfMsg is more or less the UI messages and can probably be removed from the MediaWiki namespaces. Then we will just have to manage the Content messages ;)

Following this: if these strings are always the same - that is to say they are always separated by <space>:<space> it is easy to write a parser - so you do not need to separate since it can be done within the CAT-Tool. That's also why I am asking for sample files to try them out - to see what happens if I work with them using the actual parsers. Actually the parser for the UI of OmegaT has a = as separator - I don't know how easy it is to adapt this to the format we need for mediawiki software, but as I said I would like to try it out.

Ciao, Sabine

___________________________________ Yahoo! Messenger: chiamate gratuite in tutto il mondo http://it.beta.messenger.yahoo.com

Rowan Collins

1:32 p.m.

On 28/08/05, Nikola Smolenski smolensk@eunet.yu wrote:

...

Maybe it would be possible to make a list of strings which are expected to be customised, and which are not. For example, "From Wikipedia, the free encyclopedia" will always be customised while "Article", "Discussion", "Edit", "Recent Changes"... will never be. Then, LanguageXx.php could periodically be refreshed with only "safe" strings filled in. Language files for Navajo and other languages which only have translations in MediaWiki namespace could be created from scratch in this way.

Nice idea, but I'm not sure such a distinction can really be made - projects may have all sorts of reason for inserting particular jargon, links, or just plain style in various parts of the interface, and it seems a shame to deny them the right to do this. In fact, MediaWiki:edit is a case in point - the English default is "edit", but the English Wikipedia uses "edit this page", because lots of descriptions elsewhere refer to "clicking the edit this page link".

I think a better idea would be to have a collaborative tool for translations (be it wiki-style or anything else) which was able to compare current default messages with custom messages from a particular wiki, and allow users (who can understand the language in question) to manually "merge" the changes which are appropriate for the distributed default. Ideally, there should also be a way of flagging which messages have been changed in the English "master" version and/or a way of comparing with a "parent" language (so, for instance, Catalan might be worth comparing against both Castillian Spanish and English).

I haven't time right now to see if any existing tools could be adapted for this purpose - tempting though a wiki-based system seems, a permissive setup of an existing l10n tool might be more suitable.

-- Rowan Collins BSc [IMSoP]

Sabine Cretella

1:50 p.m.

Rowan Collins wrote:

...

On 28/08/05, Nikola Smolenski smolensk@eunet.yu wrote:

...
Maybe it would be possible to make a list of strings which are expected to be customised, and which are not. For example, "From Wikipedia, the free encyclopedia" will always be customised while "Article", "Discussion", "Edit", "Recent Changes"... will never be. Then, LanguageXx.php could periodically be refreshed with only "safe" strings filled in. Language files for Navajo and other languages which only have translations in MediaWiki namespace could be created from scratch in this way.

Nice idea, but I'm not sure such a distinction can really be made - projects may have all sorts of reason for inserting particular jargon, links, or just plain style in various parts of the interface, and it seems a shame to deny them the right to do this. In fact, MediaWiki:edit is a case in point - the English default is "edit", but the English Wikipedia uses "edit this page", because lots of descriptions elsewhere refer to "clicking the edit this page link".

I think a better idea would be to have a collaborative tool for translations (be it wiki-style or anything else) which was able to compare current default messages with custom messages from a particular wiki, and allow users (who can understand the language in question) to manually "merge" the changes which are appropriate for the distributed default. Ideally, there should also be a way of flagging which messages have been changed in the English "master" version and/or a way of comparing with a "parent" language (so, for instance, Catalan might be worth comparing against both Castillian Spanish and English).

I haven't time right now to see if any existing tools could be adapted for this purpose - tempting though a wiki-based system seems, a permissive setup of an existing l10n tool might be more suitable.

Again: OmegaT (and if possible also other tools) will be integrated in UW (we are already talking about the reference implementation) - we can have a tmx management at a certain stage - please base this stuff on TMX since this means almost all localisation tools on Lista-Standard can be used for localisation (also commercial tools like DéjàVu - http://www.altril.com - but of course I prefer Open Source tools) - so please have a look at this and send me sample files to try out what the actual OmegaT-version does and what needs to be changed (I suppose it is fairly easy to adapt the file parser).

In a tmx it is not said that the source and target must be 100% the same text - it can also be a "basic contents" - "special contents" thingie. Really it is hard to explain this.

let's say in source you have "abc" in English and in target you have "abc" + further text that is wanted by the local project in Italian

whenever there is a new version to translate the sofware will then give you the inserted translation pair source/target as 100% match or if wording was slightly changed as partial match.

is that understandable? If not please tell me - so I will try to explain better. It is just like we sometimes find it in "Allmessages"

For further info on OmegaT: http://www.omegat.org/omegat/omegat.html

For further info on tmx + tbx: http://www.lisa.org

I know that this is not the ordinary wiki-tool and I know that it might seem strange to whoever is not a translator used to CAT-Tools, but it really is a great tool built for localisation/translation work.

Ciao, Sabine

___________________________________ Yahoo! Messenger: chiamate gratuite in tutto il mondo http://it.beta.messenger.yahoo.com

Rowan Collins

2:17 p.m.

On 28/08/05, Sabine Cretella sabine_cretella@yahoo.it wrote:

...

Again: OmegaT (and if possible also other tools) will be integrated in UW (we are already talking about the reference implementation) - we can have a tmx management at a certain stage - please base this stuff on TMX since this means almost all localisation tools on Lista-Standard can be used for localisation (also commercial tools like DéjàVu - http://www.altril.com - but of course I prefer Open Source tools) - so please have a look at this and send me sample files to try out what the actual OmegaT-version does and what needs to be changed (I suppose it is fairly easy to adapt the file parser).

I will have a look at the tools and resources you've mentionned if and when I get time.

As for example files, please see the links in my previous post (http://mail.wikipedia.org/pipermail/wikitech-l/2005-August/031182.html). Specifically, you can get any of the files by using URLs like http://cvs.sourceforge.net/viewcvs.py/*checkout*/wikipedia/phase3/languages/... (replacing the "LanguageDe.php" bit for German with "LanguageFr.php" for French, etc, like the wiki domain prefixes, or just "Language.php" for English).

[...]

...

whenever there is a new version to translate the sofware will then give you the inserted translation pair source/target as 100% match or if wording was slightly changed as partial match.

That sounds like it could be the right kind of thing - after all, I doubt MediaWiki is the first to face this conflict between i18n and customization (c11n?), so there *ought* to be tools out there that attempt to deal with it to some extent.

...

I know that this is not the ordinary wiki-tool and I know that it might seem strange to whoever is not a translator used to CAT-Tools, but it really is a great tool built for localisation/translation work.

Like I say, a wiki-like tool is tempting, but is probably rather like "reinventing the wheel" compared to using an existing specialist tool. [And we can always customise it to *look* like MediaWiki - we've got "MonoBook" skins for Bugzilla and even LiveJournal already, after all ;p]

-- Rowan Collins BSc [IMSoP]

Sabine Cretella

3:15 p.m.

Hi Rowan,

as for now I can only work on it in substituting .php with .txt :-( with an older version I was able to force the software to open files with a different parser, even if the format was not 100% the same.

So here are the results :-)

Source file (I cut part of it):

************************

<?php /** * @package MediaWiki * @subpackage Language */

if( defined( 'MEDIAWIKI' ) ) {

# # In general you should not make customizations in these language files # directly, but should use the MediaWiki: special namespace to customize # user interface messages through the wiki. # See http://meta.wikipedia.org/wiki/MediaWiki_namespace # # NOTE TO TRANSLATORS: Do not copy this whole file when making translations! # A lot of common constants and a base class with inheritable methods are # defined here, which should not be redefined. See the other LanguageXx.php # files for examples. #

#-------------------------------------------------------------------------- # Language-specific text #--------------------------------------------------------------------------

if($wgMetaNamespace === FALSE) $wgMetaNamespace = str_replace( ' ', '_', $wgSitename );

/* private */ $wgNamespaceNamesEn = array( NS_MEDIA => 'Media', NS_SPECIAL => 'Special', NS_MAIN => '', NS_TALK => 'Talk', NS_USER => 'User', NS_USER_TALK => 'User_talk', **************************************

Target file

**************************************

<?php /** German (Deutsch) * @package MediaWiki * @subpackage Language */

if( defined( 'MEDIAWIKI' ) ) {

#-------------------------------------------------------------------------- # Language-specific text #--------------------------------------------------------------------------

if($wgMetaNamespace === FALSE) $wgMetaNamespace = str_replace( ' ', '_', $wgSitename );

/* private */ $wgNamespaceNamesEn = array( NS_MEDIA => 'Media', NS_SPECIAL => 'Spezial', NS_MAIN => '', NS_TALK => 'Diskussion', NS_USER => 'Benutzer', NS_USER_TALK => 'Benutzer_Diskussion', NS_PROJECT => $wgMetaNamespace,

*************************************

resulting TMX file

************************************

- <tmx version="1.1"> <header creationtool="OmegaT" creationtoolversion="1" segtype="paragraph" o-tmf="OmegaT TMX" adminlang="EN-US" srclang="EN-EN" datatype="plaintext"> </header> - <body> - <tu> - <tuv lang="EN-EN"> <seg>/**</seg> </tuv> - <tuv lang="DE-DE"> <seg>/** German (Deutsch)</seg> </tuv> </tu> - <tu> - <tuv lang="EN-EN"> <seg> NS_SPECIAL => 'Special',</seg> </tuv> - <tuv lang="DE-DE"> <seg> NS_SPECIAL => 'Spezial',</seg> </tuv> </tu> - <tu> - <tuv lang="EN-EN"> <seg> NS_TALK => 'Talk',</seg> </tuv> - <tuv lang="DE-DE"> <seg> NS_TALK => 'Diskussion',</seg> </tuv> </tu> - <tu> - <tuv lang="EN-EN"> <seg> NS_USER => 'User',</seg> </tuv> - <tuv lang="DE-DE"> <seg> NS_USER => 'Benutzer',</seg> </tuv> </tu> - <tu> - <tuv lang="EN-EN"> <seg> NS_USER_TALK => 'User_talk',</seg> </tuv> - <tuv lang="DE-DE"> <seg> NS_USER_TALK => 'Benutzer_Diskussion',</seg> </tuv> </tu> </body> </tmx>

**************************************

Now the problem here is the parser in order to exclude the first part of the translation unit and the comments that should not be touched: example:

NS_USER => NS_USER_TALK =>

should not be seen

and these parts should not be seen

************************************

If we are talking about people who know what they may not touch, the tool can already be used as is. If necessary I can try to create alignment files and store to tmx using DéjàVu - there is also a OpenSource alignment tool, but I never used it up to now since I have that commercial tool as well. Of course: we can try it out.

I hope this helps to understand what I am talking about.

Now considering wikidata/Ultimate Wiktionary: it can be a repository for localization data - it just needs correct attribution (in wiki language: the correct category) - this is not possibile immediately but we are already thinking about this since it is "only" an interface issue between OmegaT and UW (only is in " since I don't know how easy this is - my personal programming experiences date back to basic, turbo pascal 5/6 and dbase III+ and IV and even there only easy stuff ... so it's many years now I did not do anything, but at least I can imagine how things could work).

For the parts above I changed the .php extension to .txt extension, created the project, translated some lines and then exported to target and saved translation memory. It is not difficult but for now only people who know what they may not touch can work on it.

If we have the interface between UW and OmegaT one can also make changes on UW that can then be reflected during the following translation. Or one can add translations directly in UW so more than one person contemporarily can work on the project (proof reading also is easier) and when there's the new version you just recall the tmx you need from UW and have the file translated adding the new segments to the TMX and UW.

I am working with version 1.4.6 beta2 since there seems to be a bug with the matches in the latest version and Maxym is not yet back from holiday - so if someone wants to try the software keep that in mind. If you have questions: you can contact me directly or we can use a wiki (don't know where on the wikimedia projects, please tell me), and of course there's also the OmegaT mailing list (omegat@yahoogroups.com).

Should I add this information somewhere? Where? Meta?

Ciao, Sabine

___________________________________ Yahoo! Messenger: chiamate gratuite in tutto il mondo http://it.beta.messenger.yahoo.com

Sabine Cretella

2 Sep 2 Sep

10:39 a.m.

New subject: Mediawiki software localisation -- OmegaT

Hi Rowan,

well, I think the Neapolitan wikipedia is not too many steps away ... so I would like to start to translate the UI using OmegaT just to create an example. How is your feeling about this? At least we will have a "real example" for the localisation using a CAT-Tool.

This will need some time since I will need to confirm every word with our Neapolitan discussion group - I write some Neapolitan, but I am not a native speaker - so it will be a collaborative work. Or should we do a really particular thingie: use a wiki ... (additionally to the CAT-Tool). I could do it on the Italian Wiktionary using the template {{-mediawiki-l8n-}} and all the templates we normally use for languages etc. so this would also be the very first project being "translated" this way and it will immediately also be coded correctly in order to be transferred into UW.

So it would not disturb on en.wiktionary where many would be contrary to such a particular tentative and anyone can edit since once you are logged in you can change UI-language to whatever you prefer.

Well, let me know your feelings about this.

Ciao, Sabine

___________________________________ Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB http://mail.yahoo.it

Nikola Smolenski

29 Aug 29 Aug

4:43 p.m.

On Sunday 28 August 2005 18:32, Rowan Collins wrote:

...

On 28/08/05, Nikola Smolenski smolensk@eunet.yu wrote:

...
Maybe it would be possible to make a list of strings which are expected to be customised, and which are not. For example, "From Wikipedia, the free encyclopedia" will always be customised while "Article", "Discussion", "Edit", "Recent Changes"... will never be. Then, LanguageXx.php could periodically be refreshed with only "safe" strings filled in. Language files for Navajo and other languages which only have translations in MediaWiki namespace could be created from scratch in this way.

Nice idea, but I'm not sure such a distinction can really be made - projects may have all sorts of reason for inserting particular jargon, links, or just plain style in various parts of the interface, and it seems a shame to deny them the right to do this. In fact, MediaWiki:edit is a case in point - the English default is "edit", but the English Wikipedia uses "edit this page", because lots of descriptions elsewhere refer to "clicking the edit this page link".

Well, that's not really custom - other wikis might well use it and it will still be understandable. Either way, it could only be used for initial building of language files, and then for suggesting changes, nothing needs to be done automatically.

...

I think a better idea would be to have a collaborative tool for translations (be it wiki-style or anything else) which was able to compare current default messages with custom messages from a particular wiki, and allow users (who can understand the language in question) to manually "merge" the changes which are appropriate for the distributed default. Ideally, there should also be a way of flagging which messages have been changed in the English "master" version and/or a way of comparing with a "parent" language (so, for instance, Catalan might be worth comparing against both Castillian Spanish and English).

I haven't time right now to see if any existing tools could be adapted for this purpose - tempting though a wiki-based system seems, a permissive setup of an existing l10n tool might be more suitable.

I see two ways of doing this: either converting language files to something (be it TMX, .po or something else), editing them, and converting them back to .php, or converting MediaWiki to use gettext and .po files directly. The latter might even be faster for smaller installations (and availability of gettext is not a problem now that php-gettext exists) - should it be done?

Jamie Bliss

4:11 p.m.

(I have not been following this thread closely, so correct me if I am in error, as usual.)

Nikola Smolenski wrote:

...

I see two ways of doing this: either converting language files to something (be it TMX, .po or something else), editing them, and converting them back to .php, or converting MediaWiki to use gettext and .po files directly. The latter might even be faster for smaller installations (and availability of gettext is not a problem now that php-gettext exists) - should it be done?

After some basic research on php-gettext, I note there are several severe draw-backs. - It is reportedly not thread-safe; any Windows server using it is at-risk for serious errors - It apparently caches files and does not check for updates until you restart Apache - Because it is an extension, it should not be required

The latter two are especially important if the server is not controlled by the webmaster (ie, someone else hosts them). This could make it dificult for them to install an extension or restart Apache.

--Jamie

Sabine Cretella

4:26 p.m.

Nikola Smolenski wrote:

...

On Sunday 28 August 2005 18:32, Rowan Collins wrote:

...
On 28/08/05, Nikola Smolenski smolensk@eunet.yu wrote:

I think a better idea would be to have a collaborative tool for translations (be it wiki-style or anything else) which was able to compare current default messages with custom messages from a particular wiki, and allow users (who can understand the language in question) to manually "merge" the changes which are appropriate for the distributed default. Ideally, there should also be a way of flagging which messages have been changed in the English "master" version and/or a way of comparing with a "parent" language (so, for instance, Catalan might be worth comparing against both Castillian Spanish and English).

I haven't time right now to see if any existing tools could be adapted for this purpose - tempting though a wiki-based system seems, a permissive setup of an existing l10n tool might be more suitable.

I see two ways of doing this: either converting language files to something (be it TMX, .po or something else), editing them, and converting them back to .php, or converting MediaWiki to use gettext and .po files directly. The latter might even be faster for smaller installations (and availability of gettext is not a problem now that php-gettext exists) - should it be done?

Nicola, please have a look at the other mail I sent yesterday - php files do not need to be converted to anything - you just work directly on them and the tbx is created automatically.

So it is re-usable the next time you translate the update.

They still work with gettext to convert to .po files and the other way round, but it is not necessary anymore. Creating the right parser/filter you can work on whatever source file even c++.without needing to convert everything.

Ciao, Sabine

___________________________________ Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB http://mail.yahoo.it

Gerard Meijssen

5:11 p.m.

Nikola Smolenski wrote:

...

I see two ways of doing this: either converting language files to something (be it TMX, .po or something else), editing them, and converting them back to .php, or converting MediaWiki to use gettext and .po files directly. The latter might even be faster for smaller installations (and availability of gettext is not a problem now that php-gettext exists) - should it be done?

Hoi, From your answer I understand that you do not apreciate what TMX is. TMX is a manner in which you store what is called a "translation memory", it allows you to make use of this translation memory when you have to redo this translation when changes have occurred or when similar content is to be translated. Consequently the use of TMX is applicable in tools for translators; in itself it is not a format that allows for editing.

Consequently, the idea that it is "faster" for smaller installations misses the point completely. This may be true when software and its user interface is set in stone while we all know that this is not the case.

Thanks, GerardM

Angela

26 Aug 26 Aug

11:34 p.m.

On 8/23/05, Mark Williamson node.ue@gmail.com wrote:

...

There have been a couple of mails on this list I believe about encouraging localisation somewhat independently of projects, so that for example a Ladino interface translation could be completed without the existance of a Ladino Wikipedia or other Wikimedia project.

If there was a way to implement the Interface translations wiki, translations would not be dependant on being an admin of any particular wiki.

See http://meta.wikimedia.org/wiki/Interface_translations_wiki

Angela.

7043

Age (days ago)

7053

Last active (days ago)

wikitech-l@lists.wikimedia.org

20 comments

9 participants

tags (0)

participants (9)

Angela
Ashar Voultoiz
Gerard Meijssen
Jamie Bliss
Mark Williamson
Nikola Smolenski
Rowan Collins
Sabine Cretella
Ævar Arnfjörð Bjarmason