I'm currently working on an interwiki bot, and I get some interwikis duplicated: some wikis refer to en:City while others refer to en:Town, etc. But the zh interwikis are the worst, as there is not only that kind of problem but also up to 3 ways to link to 'http://zh.wikipedia.org/wiki/$1': zh, zh-cn, zh-tw. And there is also zh-min-nan for 'http://zh-min-nan.wikipedia.org/wiki/$1'.
What I have found is that zh means Chinese (Zhōngwén), zh-cn Simplified and zh-tw Traditional. However, they all link to the same place. So what's the preferred option? Should zh-cn/zh-tw links be converted to zh links? Should all 3 stay even if they link to the same page? How could a bot know whether a page should be referred to as simplified or traditional? Is there any meta: page explaining this (if so, where? I haven't found one)? Or should I simply stop recognising zh* as interwikis?
Thanks in advance.
Platonides wrote:
I'm currently working on an interwiki bot, and I get some interwikis duplicated: some wikis refer to en:City while others refer to en:Town, etc. But the zh interwikis are the worst, as there is not only that kind of problem but also up to 3 ways to link to 'http://zh.wikipedia.org/wiki/$1': zh, zh-cn, zh-tw. And there is also zh-min-nan for 'http://zh-min-nan.wikipedia.org/wiki/$1'.
Are you trying to write your own interwiki bot? Have you tried working with the existing interwiki bot in the pywikipediabot suite? If there is something wrong with the code in that bot, it is better to fix it than to write an alternative! Otherwise other people may keep using broken code.
The problems with the zh: languages are all solved in the pywikipediabot suite.
Regards,
Rob Hooft
-- Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/
All zh-cn and zh-tw codes are deprecated nowadays. They existed because, at some point in the past, many Chinese pages existed in separate simplified and traditional versions on the Chinese Wikipedia. This is not the case any more. Nowadays there is only a single page, which can be automatically converted into either simplified or traditional characters (simplified being the default), depending on the wishes of the user. If everything is set up right, any zh-tw link should go to a page that now redirects to the corresponding simplified page. The page can be seen in traditional characters using the '不转换' tab at the top of the page.
In short:
* zh: links are good
* zh-cn: links should be changed to zh: links
* zh-tw: links will usually go to a redirect to the corresponding zh-cn: page; if not, they should be changed to zh: links as well.
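For a bot written from scratch, the rule above could be sketched roughly as follows. This is only a minimal illustration, not the pywikipediabot code: the function name, the mapping table and the (prefix, title) tuple format are assumptions made for the example. It rewrites only the prefix, so a zh-tw title that differs from the zh title would still need the redirect to be resolved separately.

```python
# Deprecated prefixes and the prefix that should replace them.
# (Illustrative table; the name is an assumption, not pywikipediabot's.)
DEPRECATED_PREFIXES = {"zh-cn": "zh", "zh-tw": "zh"}


def normalize_interwikis(links):
    """Map deprecated prefixes to zh and drop exact duplicates.

    `links` is a list of (prefix, title) tuples; order is preserved.
    Other prefixes, such as zh-min-nan, are left untouched.
    """
    seen = set()
    result = []
    for prefix, title in links:
        prefix = prefix.lower()
        prefix = DEPRECATED_PREFIXES.get(prefix, prefix)
        key = (prefix, title)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result
```

With this, three links zh:X, zh-cn:X and zh-tw:X collapse into a single zh:X, while a zh-min-nan: link is kept as a separate interwiki.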
Regarding the question of how the bot knew what was simplified and what was traditional: pages in one version used to carry a link to the page in the other version, and the text of this link was standard (or rather, there were no more than about 3 variants of it). If there was such a link to 'this page in traditional characters', the bot would assume the page was in simplified characters and treat that link as an interwiki to zh-tw:. If there was such a link to 'this page in simplified characters', the bot would assume the opposite. If only one of the two pages existed, the bot would make the link zh: rather than zh-cn: or zh-tw:. But, as said, that's all a thing of the past. Nowadays, zh-cn: and zh-tw: interwikis should be considered deprecated.
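Purely to illustrate that old heuristic, it might have looked something like the sketch below. The marker strings are hypothetical placeholders, since the actual link texts are not quoted in this thread, and the function name is likewise an assumption rather than anything from the pywikipediabot suite.

```python
# Hypothetical placeholders: the real standard link texts are not given
# in this thread and would have to be filled in.
TO_TRADITIONAL_MARKERS = ("<link text: this page in traditional characters>",)
TO_SIMPLIFIED_MARKERS = ("<link text: this page in simplified characters>",)


def classify_zh_page(wikitext):
    """Guess which prefix a zh page should have had under the old scheme."""
    if any(marker in wikitext for marker in TO_TRADITIONAL_MARKERS):
        # The page points at its traditional twin, so it is simplified.
        return "zh-cn"
    if any(marker in wikitext for marker in TO_SIMPLIFIED_MARKERS):
        # The page points at its simplified twin, so it is traditional.
        return "zh-tw"
    # No twin page was linked: plain zh.
    return "zh"
```

Since the separate versions no longer exist, a current bot would not need this step at all; it is shown only to make the description concrete.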
2006/2/1, Platonides platonides@gmail.com:
I'm currently working on an interwiki bot, and I get some interwikis duplicated: some wikis refer to en:City while others refer to en:Town, etc. But the zh interwikis are the worst, as there is not only that kind of problem but also up to 3 ways to link to 'http://zh.wikipedia.org/wiki/$1': zh, zh-cn, zh-tw. And there is also zh-min-nan for 'http://zh-min-nan.wikipedia.org/wiki/$1'.
What I have found is that zh means Chinese (Zhōngwén), zh-cn Simplified and zh-tw Traditional. However, they all link to the same place. So what's the preferred option? Should zh-cn/zh-tw links be converted to zh links? Should all 3 stay even if they link to the same page? How could a bot know whether a page should be referred to as simplified or traditional? Is there any meta: page explaining this (if so, where? I haven't found one)? Or should I simply stop recognising zh* as interwikis?
Thanks in advance.
Wikibots-l mailing list Wikibots-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikibots-l
-- Andre Engels, andreengels@gmail.com ICQ: 6260644 -- Skype: a_engels