@Nemo: Good idea to use the page_id. However, 1037952 is the page_id for
the target page, Template:汉语写法 ("B1"). Based on what Petr suggested, I
don't think there is a separate entry in the page table for
Template:漢語寫法 ("BC"). It's good to know, though, and next time I'll try
to include the page id.
@Petr: Thanks for listing the class files, as well as the metawiki link! I
took a quick look at the three, and they seem to explain it. Specifically:
* LanguageZh.php causes the parser to auto-translate titles / text
based on variants. So in the example above, {{漢語寫法}} ("BC") is translated
to {{汉语写法}} ("B1"). "BC" is not a valid title in the page table, but "B1"
is. Note that {{BC}} is what's actually on the page, and there is no way to
get to B1 without going through LanguageZh.php (i.e., this information
can't be derived from the data dump).
* The translation is mostly character by character (but not entirely, as
per the metawiki link). The entire word "漢語寫法" is not in
ZhConversion.php, but each of its first 3 characters is (the 4th
character is the same in both cases).
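To make the character-by-character idea concrete, here is a minimal sketch
in Python. It is not the MediaWiki code: the three-entry table and the
function name are my own stand-ins for the much larger table in
ZhConversion.php, and it ignores the multi-character phrase rules that the
metawiki page describes.

```python
# Hypothetical three-entry excerpt; the real table in
# includes/ZhConversion.php is far larger and also has phrase-level rules.
TRAD_TO_SIMP = {"漢": "汉", "語": "语", "寫": "写"}


def to_simplified(title, table=TRAD_TO_SIMP):
    """Map each character through the table; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in title)


print(to_simplified("漢語寫法"))  # 汉语写法
```

The 4th character (法) is not in the table, so it passes through unchanged,
which matches the observation above.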
I need to review this more, but it's enough to get me started.
Thanks!
On Thu, Dec 19, 2013 at 1:55 PM, Petr Onderka <gsvick(a)gmail.com> wrote:
This is intentional: it's a conversion from Traditional Chinese to
Simplified Chinese.
For more information, have a look at the (quite old) page Automatic
conversion between simplified and traditional Chinese [1].
If you wanted to perform the same conversion by yourself, the relevant
code is in languages/classes/LanguageZh.php, includes/ZhConversion.php
and maintenance/language/zhtable/.
Petr Onderka
[[en:User:Svick]]
[1]:
https://meta.wikimedia.org/wiki/Automatic_conversion_between_simplified_and…
On Thu, Dec 19, 2013 at 6:01 AM, gnosygnu <gnosygnu(a)gmail.com> wrote:
Hi. I'm not sure if this is a dump issue, but I thought I'd start off
here.
I had a user report a missing page in zh.wiktionary.org:
https://sourceforge.net/p/xowa/tickets/291/. It seems that a Main
namespace page (學生) references a template (Template:漢語寫法) that is not
in the dump.
Here are more details:
* The reference page is zh.wiktionary.org/wiki/學生 (or
%E5%AD%B8%E7%94%9F)
* It uses a template from
https://zh.wiktionary.org/wiki/Template:漢語寫法 or
Template:%E6%BC%A2%E8%AA%9E%E5%AF%AB%E6%B3%95
* When I plug this url into my Firefox address bar, I get redirected to
https://zh.wiktionary.org/wiki/Template:汉语写法 or
Template:%E6%B1%89%E8%AF%AD%E5%86%99%E6%B3%95
* I can't read Chinese, but Google Translate tells me that both terms
mean "Chinese wording", so this looks like a redirect to me
* However, unlike other redirects, I don't get a "Redirected from" link.
- For example, https://zh.wiktionary.org/wiki/Template:En redirects me
to Template:en but shows a link for "(重定向自Template:En)", i.e.
"(Redirected from Template:En)"
- Template:汉语写法 doesn't show any corresponding "redirected from" link
* More importantly, Template:漢語寫法 or
Template:%E6%BC%A2%E8%AA%9E%E5%AF%AB%E6%B3%95 is not in the dump file
* I've tried grepping the xml for both 漢語寫法</title> and
%E6%BC%A2%E8%AA%9E%E5%AF%AB%E6%B3%95</title> but neither returns a match
* However, grepping for 汉语写法</title> (the redirected page) does return
one match:
$ grep zhwiktionary-latest-pages-articles.xml -f zhfind.txt
<title>Template:汉语写法</title>
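For anyone who wants to repeat the check without grep, the same lookup can
be sketched in Python. This is only an illustration: the function name is
made up, the sample input is a three-line stand-in for the real dump, and
unlike the grep above it matches the full title rather than a suffix.
Percent-encoded titles are decoded first so both spellings can be passed in.

```python
import io
import re
from urllib.parse import unquote

TITLE_RE = re.compile(r"<title>(.*?)</title>")


def find_titles(lines, wanted):
    """Return the wanted titles that occur in <title> elements of the lines."""
    # Decode percent-encoded titles so they compare against the raw XML text.
    wanted = {unquote(w) for w in wanted}
    found = set()
    for line in lines:
        m = TITLE_RE.search(line)
        if m and m.group(1) in wanted:
            found.add(m.group(1))
    return found


# Stand-in for the dump; for the real file, pass an open file object instead.
sample = io.StringIO(
    "  <page>\n    <title>Template:汉语写法</title>\n  </page>\n"
)
hits = find_titles(
    sample,
    ["Template:%E6%BC%A2%E8%AA%9E%E5%AF%AB%E6%B3%95", "Template:汉语写法"],
)
print(hits)
```

With the sample above, only the simplified title is reported, which is the
same result the grep gave.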
Is this a configuration issue with the language / wiki? Or is the page
itself defective?
Any pointers / help would be appreciated.
Thanks.
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l