Hello Björn,
yes, you are completely correct: it is based on an ad-hoc regexp, and the problem in the examples you mention is indeed due to language links that are commented out. I am surprised by the number of commented-out language links -- there seem to be plenty of them, and I do not fully understand why.
A full parse would have been too expensive to perform. I will update the explanatory text to reflect that. Thank you for finding this issue!
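To illustrate the kind of false positive discussed here, a minimal sketch follows. It is not the script that produced the data set; the pattern, the function names, and the sample text are assumptions made up for illustration. It drops HTML comments before matching, which is still far from a full parse (nowiki sections, templates, and unclosed comments are not handled).

```python
import re

# Assumption: a language link looks like [[xx:Title]] with a short lowercase
# language code. Other lowercase prefixes would match as well.
LANGLINK = re.compile(r"\[\[([a-z]{2,3}(?:-[a-z]+)*):([^\]|]+)\]\]")

# HTML comments, possibly spanning several lines.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def language_links(wikitext, strip_comments=True):
    """Return (language, title) pairs found by the regexp."""
    if strip_comments:
        wikitext = COMMENT.sub("", wikitext)
    return [(m.group(1), m.group(2).strip()) for m in LANGLINK.finditer(wikitext)]


def double_links(wikitext, strip_comments=True):
    """Return the languages that are linked more than once, with their titles."""
    seen = {}
    for lang, title in language_links(wikitext, strip_comments):
        seen.setdefault(lang, []).append(title)
    return {lang: titles for lang, titles in seen.items() if len(titles) > 1}


# Hypothetical page text: one live link per language plus commented-out ones.
sample = (
    "Text\n"
    "[[en:1924 in radio]]\n"
    "<!-- [[en:Old title]] [[fr:Foo]] -->\n"
    "[[fr:Bar]]"
)
```

On this sample, double_links(sample) reports nothing, while double_links(sample, strip_comments=False) reports both en and fr as linked twice, which is the pattern seen in the NGC 61 example.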
Cheers, Denny
2012/6/25 Bjoern Hoehrmann derhoermi@gmx.net:
- denny.vrandecic@wikimedia.de wrote:
The full data set is available here: http://simia.net/languagelinks/
http://simia.net/languagelinks/doublelinks/doublelinks.de.html seems to have some errors, for example, it lists "Rundfunkjahr 1924 to [[en:1924 in radio]]" but as far as I can tell, it's there only once (and I don't see it in any template either, nor do there seem to be older revisions with the problem).

For http://de.wikipedia.org/wiki/NGC_61 it seems to list many links that are actually commented out in the wikitext. That in fact seems to be a general problem. Is this based on some ad-hoc regex, rather than the database data or a proper parse?

--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l