Hi,
I am trying to extract translations from Wiktionaries in different languages. Currently I use the "All pages, current versions only" dump. Is there a way to find out the language template tags (is that the correct term?) for each Wiktionary and each language?
For example, this is the Hungarian page 'karcsu' (slim, slender): http://hu.wiktionary.org/wiki/karcs%C3%BA (the edit page: http://hu.wiktionary.org/w/index.php?title=karcs%C3%BA&action=edit). The translation table always (?) starts like this:
{{-ford-}}
{{trans-top}}
*{{en}}: {{t|en|slim}}, {{t|en|slender}}
Here {{-ford-}} comes from the word forditas ('translation' in Hungarian; I skipped the accents). The translations look like the third row and (hopefully) contain the other languages' wiki codes (en, fr, de).
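To make the pattern concrete, here is a minimal Python sketch that pulls the {{t|lang|word}} entries out of a piece of wikitext. The function name extract_translations and the regular expression are assumptions made for this illustration. It only covers the simple form shown in the example, and other Wiktionaries may well use different template names or argument orders.

```python
import re
from collections import defaultdict

# Matches {{t|<lang code>|<word>}} and tolerates extra parameters after the word.
# Assumption: the template is named "t" and takes the language code first,
# as in the Hungarian example. Nested templates are not handled.
T_TEMPLATE = re.compile(r"\{\{t\|([^|{}]+)\|([^|{}]+)(?:\|[^{}]*)?\}\}")


def extract_translations(wikitext):
    """Return a dict mapping language code -> list of translations."""
    translations = defaultdict(list)
    for lang, word in T_TEMPLATE.findall(wikitext):
        translations[lang.strip()].append(word.strip())
    return dict(translations)


sample = "{{-ford-}}\n{{trans-top}}\n*{{en}}: {{t|en|slim}}, {{t|en|slender}}"
print(extract_translations(sample))
```

Run on the three lines from the 'karcsu' page, this should collect both English translations under the key 'en'. Lines such as {{trans-top}} are skipped because they do not have the t|lang|word shape.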
Also on the page 'slim' in the Hungarian Wiktionary there are some tags which nobody would understand unless they are Hungarian and they have learned some Hungarian grammar. http://hu.wiktionary.org/wiki/slim and http://hu.wiktionary.org/w/index.php?title=slim&action=edit The first line is: {{engmell|comp=slimmer|sup=slimmest|pron=/slɪm/|audio=us}}
Here 'engmell' is derived from 'English melleknev', melleknev meaning adjective in Hungarian. The rest is similarly confusing.
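Even when a template name is opaque, its parameters can still be read mechanically. The following is a small Python sketch, assuming a flat template call with no nested templates or piped links inside it. The helper name parse_template is hypothetical and not part of any Wiktionary tooling.

```python
def parse_template(call):
    """Split '{{name|a|k=v}}' into (name, positional args, named args).

    Assumption: the call is flat, i.e. it contains no nested {{...}}
    or [[...|...]] constructs that would introduce extra '|' characters.
    """
    text = call.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template call: %r" % call)
    parts = text[2:-2].split("|")
    name = parts[0].strip()
    positional, named = [], {}
    for part in parts[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            named[key.strip()] = value.strip()
        else:
            positional.append(part.strip())
    return name, positional, named


name, positional, named = parse_template(
    "{{engmell|comp=slimmer|sup=slimmest|pron=/slɪm/|audio=us}}"
)
print(name, positional, named)
```

For the 'slim' example this separates the name engmell from the named parameters comp, sup, pron and audio. What the name itself means still has to be looked up per Wiktionary.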
It gets even more confusing if I look at other Wiktionaries. It seems that there are no standards that all Wiktionaries follow.
Is this meta-information available somewhere?
I hope I managed to explain it clearly and I am asking on the right list.
Thank you in advance, Judit Acs
Judit,
The DBpedia Wiktionary extraction project is also trying to extract data from Wiktionary, and as far as I know they have the same problem: each language edition uses different templates. Maybe they can help you, though. See http://wiki.dbpedia.org/Wiktionary or contact the developers at https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion .
Regards, Christopher
On Fri, Nov 23, 2012 at 11:18 AM, Judit, Ács judit@ilab.sztaki.hu wrote:
Xmldatadumps-l mailing list Xmldatadumps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l