The inclusion of MediaWiki:<meta> pages in the base article, template, etc. dumps, such as the English Wikipedia's, causes major problems if you are trying to import these dumps into a Wikipedia enabled for another language. Many of the navigation menus get trashed and corrupted when these MediaWiki labels for the English Wikipedia are included in the base article and template dumps. It certainly is OK to include them in the dumps which supposedly contain "everything" (the .7z dumps), but they should be stripped out of the base dumps, which are supposed to contain just the articles and the templates for those articles.
If you import one of these dumps without stripping out the MediaWiki:-specific pages which add menu items, etc., the navigation bar gets messed up and things like "Recent Changes" no longer map properly in MediaWiki.
I think it's OK to leave some of the MediaWiki messages, like the site notice and the "contribute to Wikipedia" kind of stuff for GFDL and attribution, but the entries which affect menus should be stripped out, as they will cause major problems on MediaWiki installations enabled for another language.
Jeff
Jeff V. Merkey wrote:
The inclusion of MediaWiki:<meta> pages in the base article, template, etc. dumps, such as the English Wikipedia's, causes major problems if you are trying to import these dumps into a Wikipedia enabled for another language. Many of the navigation menus get trashed and corrupted when these MediaWiki labels for the English Wikipedia are included in the base article and template dumps.
Please describe "trashed and corrupted" by giving specific examples. Also try to describe what "enabled for another language" means.
Does it mean you have configured the wiki to a language different from the language you are importing? Can you explain why you would do this? It sounds like a simple configuration error on your part. The obvious result would be that both articles and customized messages will appear in the language you imported (eg English) instead of the language you set it to.
If you simply want to look at the menu in another language, you probably want to just change your user preference, not the wiki-wide content language.
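To illustrate the point about customized messages, here is a toy model in Python of how an imported MediaWiki: page can shadow a language default. It is only a sketch: `DEFAULT_MESSAGES`, `IMPORTED_PAGES`, `page_title_for` and `get_message` are hypothetical names made up for this example, not MediaWiki's actual code, and the English page text is an assumed sample.

```python
# Toy model, not MediaWiki's real implementation: all names are hypothetical.

# Defaults as they would ship in a language file such as MessagesChr.php.
DEFAULT_MESSAGES = {
    "recentchanges": "ᎾᏞᎬ ᏗᎦᏁᏟᏴᏍᏗ",
    "recentchanges-url": "ᎤᏤᎵᏛ:Recentchanges",
}

# Pages that arrived with an imported English dump (assumed sample content).
IMPORTED_PAGES = {
    "MediaWiki:Recentchanges": "Recent changes",
}


def page_title_for(key):
    """Map a message key to its page title in the MediaWiki: namespace."""
    return "MediaWiki:" + key[:1].upper() + key[1:]


def get_message(key, pages, defaults):
    """Prefer an on-wiki MediaWiki: page, else fall back to the default."""
    return pages.get(page_title_for(key), defaults[key])


if __name__ == "__main__":
    for key in DEFAULT_MESSAGES:
        print(key, "=>", get_message(key, IMPORTED_PAGES, DEFAULT_MESSAGES))
```

In this model the imported English page wins for "recentchanges", while "recentchanges-url" still comes from the Cherokee defaults because no page was imported for it.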
It certainly is OK to include them in the dumps which supposedly contain "everything" (the .7z dumps), but they should be stripped out of the base dumps, which are supposed to contain just the articles and the templates for those articles.
Will consider making that change.
Note that you can exclude a namespace from your import using mwdumper by using the namespace filter, for instance:
--filter=namespace:!NS_MEDIAWIKI
(You can also use mwdumper to filter the XML and feed it into importDump.php or another tool to do the actual import, if you prefer using another tool.)
-- brion vibber (brion @ pobox.com / brion @ wikimedia.org)
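For anyone who would rather pre-filter the XML themselves, the same idea can be sketched in a few lines of Python. This is a rough illustration and not a replacement for mwdumper: `strip_pages`, `local_name` and `SAMPLE` are hypothetical names, the sample is a heavily simplified stand-in for a real export file, and the whole document is loaded into memory, which would not work for a full dump.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return the tag name without any '{namespace-uri}' prefix."""
    return tag.rsplit("}", 1)[-1]


def strip_pages(xml_text, prefix="MediaWiki:"):
    """Drop every <page> whose <title> starts with `prefix`.

    Parses the whole document in memory, so it only suits small
    extracts; a real dump would need a streaming parser.
    """
    root = ET.fromstring(xml_text)
    for page in list(root):
        if local_name(page.tag) != "page":
            continue
        # Find the page title; treat a missing title as an empty string.
        title = next(
            (child.text or "" for child in page
             if local_name(child.tag) == "title"),
            "",
        )
        if title.startswith(prefix):
            root.remove(page)
    return ET.tostring(root, encoding="unicode")


# Simplified stand-in for an export file (assumed structure).
SAMPLE = """<mediawiki>
  <page><title>MediaWiki:Sidebar</title></page>
  <page><title>Cherokee</title></page>
</mediawiki>"""

if __name__ == "__main__":
    print(strip_pages(SAMPLE))
```

Filtering on the title prefix rather than on a namespace number is a deliberate simplification here; it assumes the page titles carry the English "MediaWiki:" prefix.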
Brion Vibber wrote:
Jeff V. Merkey wrote:
The inclusion of MediaWiki:<meta> pages in the base article, template, etc. dumps, such as the English Wikipedia's, causes major problems if you are trying to import these dumps into a Wikipedia enabled for another language. Many of the navigation menus get trashed and corrupted when these MediaWiki labels for the English Wikipedia are included in the base article and template dumps.
Please describe "trashed and corrupted" by giving specific examples. Also try to describe what "enabled for another language" means.
From language/messages/MessagesChr.php
# Recent changes
'changes' => 'ᏗᎦᏁᏟᏴᏍᏗ',
'recentchanges' => 'ᎾᏞᎬ ᏗᎦᏁᏟᏴᏍᏗ',
'recentchanges-url' => 'ᎤᏤᎵᏛ:Recentchanges',
'recentchangestext' => 'ᎦᏅᏅᎢ ᎯᎠ ᎤᎪᏗᏗ ᎾᏞᎬ ᏗᎦᏁᏟᏴᏍᏗ ᎯᎠ wiki ᎾᎿ ᎪᎯ ᎤᏆᏓᏛ.',
"recentchanges-url" and the other url tags end up mapping to their translated constructs as the internal representation, i.e. recentchanges-url ends up rendering as "ᎾᏞᎬ ᏗᎦᏁᏟᏴᏍᏗ-url" as the page link INTERNALLY rather than "recentchanges-url", even though the internal variable is named something else. As a result, the link that should go to "ᎤᏤᎵᏛ:Recentchanges" ends up pointing to a non-existent page. All of the -url variables get corrupted this way.
Does it mean you have configured the wiki to a language different from the language you are importing?
Yes.
Can you explain why you would do this?
Translated Cherokee articles created from the English Wiki dumps.
It sounds like a simple configuration error on your part. The obvious result would be that both articles and customized messages will appear in the language you imported (eg English) instead of the language you set it to.
If you simply want to look at the menu in another language, you probably want to just change your user preference, not the wiki-wide content language.
Hmmm, not sure about this one, but I will investigate.
It certainly is OK to include them in the dumps which supposedly contain "everything" (the .7z dumps), but they should be stripped out of the base dumps, which are supposed to contain just the articles and the templates for those articles.
Will consider making that change.
Probably a good idea.
Note that you can exclude a namespace from your import using mwdumper by using the namespace filter, for instance:
--filter=namespace:!NS_MEDIAWIKI
(You can also use mwdumper to filter the XML and feed it into importDump.php or another tool to do the actual import, if you prefer using another tool.)
Thanks. I will update the database dumps page on meta with this information on how to strip out the MediaWiki namespace entries.
Jeff
-- brion vibber (brion @ pobox.com / brion @ wikimedia.org)
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l
Jeff V. Merkey wrote:
# Recent changes
'changes' => 'ᏗᎦᏁᏟᏴᏍᏗ',
'recentchanges' => 'ᎾᏞᎬ ᏗᎦᏁᏟᏴᏍᏗ',
'recentchanges-url' => 'ᎤᏤᎵᏛ:Recentchanges',
'recentchangestext' => 'ᎦᏅᏅᎢ ᎯᎠ ᎤᎪᏗᏗ ᎾᏞᎬ ᏗᎦᏁᏟᏴᏍᏗ ᎯᎠ wiki ᎾᎿ ᎪᎯ ᎤᏆᏓᏛ.',
"recentchanges-url" and the other url tags end up mapping to their translated constructs as the internal representation, i.e. recentchanges-url ends up rendering as "ᎾᏞᎬ ᏗᎦᏁᏟᏴᏍᏗ-url" as the page link INTERNALLY rather than "recentchanges-url", even though the internal variable is named something else. As a result, the link that should go to "ᎤᏤᎵᏛ:Recentchanges" ends up pointing to a non-existent page. All of the -url variables get corrupted this way.
It sounds to me like your problem is that you're running some sort of automated translation on the *contents* of MediaWiki:Sidebar as though it were article text, thus producing a useless, corrupt entry.
In other words, the problem is not that you imported MediaWiki:Sidebar; the problem is that you imported a version of MediaWiki:Sidebar which is meaningless. "Garbage in, garbage out." :)
Can you confirm?
I would recommend excluding such software configuration pages from your translation tool.
-- brion vibber (brion @ pobox.com / brion @ wikimedia.org)
Brion Vibber wrote:
Jeff V. Merkey wrote:
# Recent changes
'changes' => 'ᏗᎦᏁᏟᏴᏍᏗ',
'recentchanges' => 'ᎾᏞᎬ ᏗᎦᏁᏟᏴᏍᏗ',
'recentchanges-url' => 'ᎤᏤᎵᏛ:Recentchanges',
'recentchangestext' => 'ᎦᏅᏅᎢ ᎯᎠ ᎤᎪᏗᏗ ᎾᏞᎬ ᏗᎦᏁᏟᏴᏍᏗ ᎯᎠ wiki ᎾᎿ ᎪᎯ ᎤᏆᏓᏛ.',
"recentchanges-url" and the other url tags end up mapping to their translated constructs as the internal representation, i.e. recentchanges-url ends up rendering as "ᎾᏞᎬ ᏗᎦᏁᏟᏴᏍᏗ-url" as the page link INTERNALLY rather than "recentchanges-url", even though the internal variable is named something else. As a result, the link that should go to "ᎤᏤᎵᏛ:Recentchanges" ends up pointing to a non-existent page. All of the -url variables get corrupted this way.
It sounds to me like your problem is that you're running some sort of automated translation on the *contents* of MediaWiki:Sidebar as though it were article text, thus producing a useless, corrupt entry.
In other words, the problem is not that you imported MediaWiki:Sidebar; the problem is that you imported a version of MediaWiki:Sidebar which is meaningless. "Garbage in, garbage out." :)
Can you confirm?
Unconfirmed. The translator skips any article titles which include "MediaWiki:", "Template:", or that contain any ':' characters. At some point, I will enable template translation, but I have to instrument the #if, #expr parser language in the translator to distinguish between tags and valid text entries. The problem is related to MediaWiki: articles overwriting the default settings for the language, which could affect any non-English wiki, though I completely agree with you that my use of MediaWiki with machine translators is outside the designed scope of the MediaWiki project.
I do agree, however, that MediaWiki: entries should be stripped out of XML dumps if they are meant to contain just articles and templates. MediaWiki settings should probably not be included in such dumps if they are going to be imported into a non-English wiki. For now, I think just documenting it is sufficient.
I leave it to your call if you feel the dumps should exclude some of the MediaWiki settings. Personally, I would advise against it since it is always possible someone could inadvertently insert some text or content that could create a security exploit or problems with other MediaWiki versions, but as I said, this is your call. I think it is enough to document and understand that there could be a potential issue down the road, and that there could be compatibility issues if someone uses the MediaWiki software to import dumps across languages.
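The title check described above (skip anything containing "MediaWiki:", "Template:", or any ':' at all) could look roughly like this in Python. This is only a sketch of the rule as stated in this message; `should_translate` and `SKIPPED_MARKERS` are hypothetical names and not the translator's actual code.

```python
# Hypothetical sketch of the title rule described above.
SKIPPED_MARKERS = ("MediaWiki:", "Template:")


def should_translate(title):
    """Return True only for titles the translator would process."""
    # Explicit namespace markers, kept separate for readability.
    if any(marker in title for marker in SKIPPED_MARKERS):
        return False
    # Any remaining ':' also causes the title to be skipped.
    return ":" not in title


if __name__ == "__main__":
    for title in ("Cherokee", "MediaWiki:Sidebar", "Template:Stub", "Help:Contents"):
        print(title, should_translate(title))
```

Note that under this rule an ordinary article whose title happens to contain a colon is skipped as well.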
I would recommend excluding such software configuration pages from your translation tool.
Totally agree.
Jeff
-- brion vibber (brion @ pobox.com / brion @ wikimedia.org)
Jeff V. Merkey wrote:
I leave it to your call if you feel the dumps should exclude some of the MediaWiki settings. Personally, I would advise against it since it is always possible someone could inadvertently insert some text or content that could create a security exploit or problems with other MediaWiki versions, but as I said, this is your call. I think it is enough to document and understand that there could be a potential issue down the road, and that there could be compatibility issues if someone uses the MediaWiki software to import dumps across languages.
I doubt users would inadvertently insert a security exploit for older versions. More likely, you may get JavaScript errors. And if you remove the messages, you will get people complaining of NavFrame, RealTitleBanner, featured articles Interwikis... not working.
Platonides wrote:
Jeff V. Merkey wrote:
I leave it to your call if you feel the dumps should exclude some of the MediaWiki settings. Personally, I would advise against it since it is always possible someone could inadvertently insert some text or content that could create a security exploit or problems with other MediaWiki versions, but as I said, this is your call. I think it is enough to document and understand that there could be a potential issue down the road, and that there could be compatibility issues if someone uses the MediaWiki software to import dumps across languages.
I doubt users would inadvertently insert a security exploit for older versions. More likely, you may get JavaScript errors. And if you remove the messages, you will get people complaining of NavFrame, RealTitleBanner, featured articles Interwikis... not working.
I'll just strip the MediaWiki articles out of any translated dumps and document the issue for now.
jeff
Please don't send me any more messages, and thanks in advance.
arfi Mohamed wrote:
Please don't send me any more messages, and thanks in advance.
(Has been unsubscribed.)
-- brion vibber (brion @ pobox.com / brion @ wikimedia.org)