@Nemo: Good idea to use the page_id. However, 1037952 is the page_id for the target page: Template:汉语写法 ("B1"). Based on what Petr suggested, I don't think there is any separate entry in the page table for Template:漢語寫法 ("BC"). It's a good thing to know, and next time I'll try to include the page id.

@Petr: Thanks for listing the class files, as well as the metawiki link! I took a quick look at the three, and they seem to explain what I'm seeing. Specifically:

* LanguageZh.php will cause the parser to auto-translate titles / text based on variants. So in the example above, {{漢語寫法}} ("BC") is translated to {{汉语写法}} ("B1"). "BC" is not a valid title in the page table but "B1" is. Note that {{BC}} is what's actually on the page, and there is no way to get to B1 without going through LanguageZh.php (i.e., this information can't be derived from the data dump).

* The translation is mostly on a character-by-character level (but not entirely, as per the metawiki link). The entire word "漢語寫法" is not in ZhConversion.php, but each of the first 3 characters is (the 4th character is the same in both cases).
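
To make sure I understand the two points above, here is a rough Python sketch of what I think the lookup has to do on my side. Everything in it is an assumption for illustration: TRAD_TO_SIMP is a tiny hand-written table (not copied from ZhConversion.php), and to_simplified / resolve_title are hypothetical names. It only does the character-by-character part, so it ignores the multi-character entries and variant rules that the real converter also handles.

```python
from typing import Optional, Set

# Hypothetical, hand-written subset of a Traditional -> Simplified table.
# The real data lives in ZhConversion.php; this is NOT copied from it.
TRAD_TO_SIMP = {
    "漢": "汉",
    "語": "语",
    "寫": "写",
}


def to_simplified(text: str) -> str:
    """Convert character by character; unmapped characters pass through."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)


def resolve_title(title: str, known_titles: Set[str]) -> Optional[str]:
    """Return the title as it exists in the dump, or None if not found.

    Tries the title as written first, then its converted form.
    """
    if title in known_titles:
        return title
    converted = to_simplified(title)
    return converted if converted in known_titles else None


if __name__ == "__main__":
    # Stand-in for the set of <title> values read from the dump.
    dump_titles = {"Template:汉语写法"}
    print(resolve_title("Template:漢語寫法", dump_titles))
```

The idea is that a reader of the dump would try the title as written first, and only fall back to the converted form when that fails, since the dump itself has no entry linking the two.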

I need to review this more, but it's enough to get me started.

Thanks!

On Thu, Dec 19, 2013 at 1:55 PM, Petr Onderka <gsvick@gmail.com> wrote:
This is intentional and it's a conversion from Traditional Chinese to
Simplified Chinese.

For more information, have a look at the (quite old) page Automatic
conversion between simplified and traditional Chinese [1].
If you wanted to perform the same conversion by yourself, the relevant
code is in languages/classes/LanguageZh.php, includes/ZhConversion.php
and maintenance/language/zhtable/.

Petr Onderka
[[en:User:Svick]]

[1]: https://meta.wikimedia.org/wiki/Automatic_conversion_between_simplified_and_traditional_Chinese

On Thu, Dec 19, 2013 at 6:01 AM, gnosygnu <gnosygnu@gmail.com> wrote:
> Hi. I'm not sure if this is a dump issue, but I thought I'd start off here.
>
> I had a user report a missing page in zh.wiktionary.org:
> https://sourceforge.net/p/xowa/tickets/291/. It seems that a Main namespace
> page (學生) references a template (Template:漢語寫法) that is not in the dump.
>
> Here are more details:
>
> * The reference page is zh.wiktionary.org/wiki/學生 (or %E5%AD%B8%E7%94%9F)
> * It uses a template from https://zh.wiktionary.org/wiki/Template:漢語寫法 or
> Template:%E6%BC%A2%E8%AA%9E%E5%AF%AB%E6%B3%95
> * When I plug this url into my Firefox address bar, I get redirected to
> https://zh.wiktionary.org/wiki/Template:汉语写法 or
> Template:%E6%B1%89%E8%AF%AD%E5%86%99%E6%B3%95
> * I can't read Chinese, but Google translate tells me that both terms mean
> "Chinese wording", so this looks like a redirect to me
> * However, unlike other redirects, I don't get a "Redirected from" link.
> - For example, https://zh.wiktionary.org/wiki/Template:En redirects me to
> Template:en but shows a link for "(重定向自Template:En)"
> - Template:汉语写法 doesn't show any corresponding "redirected from" link
>
> * More importantly, Template:漢語寫法 or
> Template:%E6%BC%A2%E8%AA%9E%E5%AF%AB%E6%B3%95 is not in the dump file
> * I've tried grepping the xml for both 漢語寫法</title> and
> %E6%BC%A2%E8%AA%9E%E5%AF%AB%E6%B3%95</title> but neither returns a match
> * However, grepping for 汉语写法</title> (the redirected page) does return one
> match:
>
> $ grep zhwiktionary-latest-pages-articles.xml -f zhfind.txt
>     <title>Template:汉语写法</title>
>
>
> Is this a configuration issue with the language / wiki? Or is the page
> itself defective?
>
> Any pointers / help would be appreciated.
> Thanks.
>
>
> _______________________________________________
> Xmldatadumps-l mailing list
> Xmldatadumps-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>