Hello Björn,
Yes, you are completely correct: it is based on an ad-hoc regexp, and
the problem in the examples you mention is indeed caused by language
links that are commented out.
I am surprised by the number of commented-out language links: there
seem to be plenty of them, and I do not fully understand why.
A full parse would have been too expensive to perform. I will update
the explanatory text to reflect that. Thank you for finding this
issue!
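To illustrate the failure mode discussed above, here is a minimal Python sketch. It is not the script behind simia.net/languagelinks, whose code is not shown here. It assumes a simplified language-link pattern (the hypothetical `LANGLINK_RE`), and the wikitext sample is made up. It compares a regex-only extraction with one that strips HTML comments before matching.

```python
import re

# Hypothetical, simplified pattern for interlanguage links such as
# [[en:1924 in radio]]. Assumption: a prefix is 2-3 lowercase letters,
# optionally with hyphenated parts (e.g. "zh-min-nan"). Real prefix lists
# are site-specific.
LANGLINK_RE = re.compile(r"\[\[([a-z]{2,3}(?:-[a-z]+)*):([^\]|]+)\]\]")

# HTML comments in wikitext; DOTALL lets a comment span several lines,
# and the non-greedy .*? stops at the first closing -->.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def naive_langlinks(wikitext):
    """Regex-only extraction: also matches links inside <!-- ... -->."""
    return LANGLINK_RE.findall(wikitext)


def langlinks_without_comments(wikitext):
    """Remove HTML comments first, then apply the same regex."""
    return LANGLINK_RE.findall(COMMENT_RE.sub("", wikitext))


# Made-up sample: one live link, two links commented out.
sample = """NGC 61 is a galaxy.
[[en:NGC 61]]
<!-- [[fr:NGC 61]]
[[it:NGC 61]] -->
"""

print(naive_langlinks(sample))
# → [('en', 'NGC 61'), ('fr', 'NGC 61'), ('it', 'NGC 61')]
print(langlinks_without_comments(sample))
# → [('en', 'NGC 61')]
```

Stripping comments first is still far cheaper than a full parse, but it remains a heuristic: links emitted by templates are missed, and text inside <nowiki> or <pre> would still be matched as if it were a link.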
Cheers,
Denny
2012/6/25 Bjoern Hoehrmann <derhoermi(a)gmx.net>:
* denny.vrandecic(a)wikimedia.de wrote:
The full data set is available here:
<http://simia.net/languagelinks/>
http://simia.net/languagelinks/doublelinks/doublelinks.de.html seems to
have some errors, for example, it lists "Rundfunkjahr 1924 to [[en:1924
in radio]]" but as far as I can tell, it's there only once (and I don't
see it in any template either, nor do there seem to be older revisions
with the problem). For
http://de.wikipedia.org/wiki/NGC_61 it seems to
list many links that are actually commented out in the wikitext. That in
fact seems to be a general problem. Is this based on some ad-hoc regex,
rather than the database data or a proper parse?
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Phone: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
_______________________________________________
Wikidata-l mailing list
Wikidata-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/681/51985.