Hi. I'm not sure if this is a dump issue, but I thought I'd start off here.
I had a user report a missing page in zh.wiktionary.org: https://sourceforge.net/p/xowa/tickets/291/. It seems that a Main namespace page (學生) references a template (Template:漢語寫法) that is not in the dump.
Here are more details:
* The reference page is zh.wiktionary.org/wiki/學生 (or %E5%AD%B8%E7%94%9F)
* It uses a template from https://zh.wiktionary.org/wiki/Template:%E6%BC%A2%E8%AA%9E%E5%AF%AB%E6%B3%95 or Template:%E6%BC%A2%E8%AA%9E%E5%AF%AB%E6%B3%95
* When I plug this url into my Firefox address bar, I get redirected to https://zh.wiktionary.org/wiki/Template:%E6%B1%89%E8%AF%AD%E5%86%99%E6%B3%95 or Template:%E6%B1%89%E8%AF%AD%E5%86%99%E6%B3%95
* I can't read Chinese, but Google translate tells me that both terms mean "Chinese wording", so this looks like a redirect to me
* However, unlike other redirects, I don't get a "Redirected from" link.
  - For example, https://zh.wiktionary.org/wiki/Template:En redirects me to Template:en but shows a link for "(重定向自Template:En)"
  - Template:汉语写法 doesn't show any corresponding "redirected from" link
* More importantly, Template:漢語寫法 or Template:%E6%BC%A2%E8%AA%9E%E5%AF%AB%E6%B3%95 is not in the dump file
* I've tried grepping the xml for both 漢語寫法</title> and %E6%BC%A2%E8%AA%9E%E5%AF%AB%E6%B3%95</title> but neither returns a match
* However, grepping for 汉语写法</title> (the redirected page) does return one match:

$ grep zhwiktionary-latest-pages-articles.xml -f zhfind.txt
<title>Template:汉语写法</title>
Is this a configuration issue with the language / wiki? Or is the page itself defective?
Any pointers / help would be appreciated. Thanks.
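For readers who, like the poster, can't read the characters: the two percent-encoded titles from the URLs above can be decoded and compared mechanically. The following is only a minimal sketch in Python; the variable names are illustrative and not part of any tool mentioned in the thread.

```python
from urllib.parse import unquote

# Percent-encoded titles copied from the two URLs in the message above.
requested = unquote("Template:%E6%BC%A2%E8%AA%9E%E5%AF%AB%E6%B3%95")
served = unquote("Template:%E6%B1%89%E8%AF%AD%E5%86%99%E6%B3%95")

print(requested)  # the title used on the page
print(served)     # the title the browser ends up on

# Collect the positions at which the two titles differ,
# as (index, requested character, served character).
diff = [(i, a, b) for i, (a, b) in enumerate(zip(requested, served)) if a != b]
print(diff)
```

The titles have the same length and differ only in some of their characters, which is why a grep for one spelling cannot find the other.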
gnosygnu, 19/12/2013 06:01:
Hi. I'm not sure if this is a dump issue, but I thought I'd start off here.
I had a user report a missing page in zh.wiktionary.org: https://sourceforge.net/p/xowa/tickets/291/. It seems that a Main namespace page (學生) references a template (Template:漢語寫法) that is not in the dump.
I hope it's not a language converter issue. :) Have you tried grepping for page id (1037952) or last revision's id (3773706)? That's more reliable than using spellings you don't know. I checked the basics: the article ("student", ID 2602) doesn't have wanted templates, and the template exports without fatals at Special:Export.
Nemo
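The page-id lookup suggested above can also be done with a streaming XML parse instead of grep, which avoids mistaking a revision id for a page id. This is a rough sketch only: the function names are made up for illustration, the sample document is a hand-written stand-in for the real dump, and the export namespace version in it is an assumption (the code ignores the namespace anyway).

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def find_title_by_page_id(source, page_id):
    """Stream through a pages-articles style dump; return the title of the
    page whose own <id> equals page_id, or None if there is no such page."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue
        # Only direct children of <page> are inspected, so the <id>
        # nested inside <revision> is never confused with the page id.
        fields = {local(child.tag): child.text for child in elem}
        if fields.get("id") == str(page_id):
            return fields.get("title")
        elem.clear()  # drop the children of pages we no longer need
    return None


# Hand-written stand-in for the dump; the ids are the ones quoted above.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page>
    <title>Template:汉语写法</title>
    <ns>10</ns>
    <id>1037952</id>
    <revision><id>3773706</id><text>...</text></revision>
  </page>
</mediawiki>""".encode("utf-8")

print(find_title_by_page_id(io.BytesIO(SAMPLE), 1037952))
```

For the real file, an open binary file handle for zhwiktionary-latest-pages-articles.xml would be passed in place of the BytesIO object.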
This is intentional and it's a conversion from Traditional Chinese to Simplified Chinese.
For more information, have a look at the (quite old) page Automatic conversion between simplified and traditional Chinese [1]. If you wanted to perform the same conversion by yourself, the relevant code is in languages/classes/LanguageZh.php, includes/ZhConversion.php and maintenance/language/zhtable/.
Petr Onderka [[en:User:Svick]]
[1]: https://meta.wikimedia.org/wiki/Automatic_conversion_between_simplified_and_...
Xmldatadumps-l mailing list Xmldatadumps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
@Nemo: Good idea to use the page_id. However, 1037952 is the page_id for the target page: Template:汉语写法 ("B1"). Based on what Petr suggested, I don't think there is any separate entry in the page table for Template:漢語寫法 ("BC"). It's a good thing to know, and next time I'll try to include the page id.
@Petr: Thanks for listing the class files, as well as the metawiki link! I took a quick look at the three, and this seems to be it. Specifically:
* LanguageZh.php will cause the parser to auto-translate titles / text based on variants. So in the example above, {{漢語寫法}} ("BC") is translated to {{汉语写法}} ("B1"). "BC" is not a valid title in the page table but "B1" is. Note that {{BC}} is what's actually on the page, and there is no way to get to B1 without going through LanguageZh.php (i.e., this information can't be derived from the data dump).
* The translation is mostly on a character-by-character level (but not entirely, as per the metawiki link). The entire word "漢語寫法" is not in ZhConversion.php, but each of the first 3 characters is (the 4th character is the same in both cases).
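As a rough illustration of the two points above, an offline reader could fall back to a converted title when the literal one is missing. The sketch below is not MediaWiki's implementation: the three-entry table is a hypothetical stand-in for the much larger one in ZhConversion.php, the function names are made up, phrase-level entries are not modelled, and the "exact title first, converted title second" order is an assumption.

```python
# Hypothetical three-entry stand-in for the Traditional -> Simplified table;
# these are only the characters that differ in the title discussed here.
TRAD_TO_SIMP = str.maketrans({"漢": "汉", "語": "语", "寫": "写"})


def to_simplified(title):
    """Convert character by character; unmapped characters pass through."""
    return title.translate(TRAD_TO_SIMP)


def resolve_template(name, known_titles):
    """Return the title to load for {{name}}, or None if nothing matches.

    Assumed order: try the literal title, then its converted form.
    """
    title = "Template:" + name
    if title in known_titles:
        return title
    converted = to_simplified(title)
    return converted if converted in known_titles else None


# Titles present in the dump, per the grep result earlier in the thread.
dump_titles = {"Template:汉语写法"}
print(resolve_template("漢語寫法", dump_titles))
```

The 4th character (法) is absent from the table and is left unchanged, matching the observation that it is the same in both spellings.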
I need to review this more, but it's enough to get me started.
Thanks!
On Thu, Dec 19, 2013 at 1:55 PM, Petr Onderka gsvick@gmail.com wrote:
This is intentional and it's a conversion from Traditional Chinese to Simplified Chinese.
For more information, have a look at the (quite old) page Automatic conversion between simplified and traditional Chinese [1]. If you wanted to perform the same conversion by yourself, the relevant code is in languages/classes/LanguageZh.php, includes/ZhConversion.php and maintenance/language/zhtable/.
Petr Onderka [[en:User:Svick]]