On 06/06/07, GerardM gerard.meijssen@gmail.com wrote:
Hoi, When you look at the details for the HTML it will tell you that the language is English. It is obviously not. Technically all content in Wikisource.org that is not English should be marked with the language that it is actually in.
Having content marked English while it is in actual fact not English means that the meta-data of the page is wrong. Having multiple languages within the same MediaWiki database is technically a disaster. Nothing marks which language a given page is in. This is in and of itself bad.
Well, meta seems to manage well enough :-)
Seriously, though, there are projects where a hubbub of multilinguality is pretty much inevitable - Commons being the obvious example, even if we just write meta off as internal craziness. Would it not be simplest to contrive some way of allowing the page content to dictate the metadata published by MediaWiki, rather than declaring we just can't do it, period? That would be a much more robust long-term solution.
After all, even if we import the entire known corpus, I can't see ecr.wikisource.org ever consisting of more than a few kilobytes of text...
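
To make the "page content dictates the metadata" idea a bit more concrete, here is a rough sketch in Python. It is not MediaWiki code and does not use any MediaWiki API. The in-page marker `<!-- lang: xx -->` and the function names `page_language` and `html_open_tag` are made up purely for illustration. The only point is that the language published in the HTML could be looked up per page, with the site-wide default used as a fallback.

```python
import re

# Hypothetical in-page marker such as "<!-- lang: nl -->".
# This is an assumption for illustration, not an existing MediaWiki feature.
LANG_MARKER = re.compile(
    r"<!--\s*lang:\s*([A-Za-z]{2,3}(?:-[A-Za-z0-9]{2,8})*)\s*-->"
)


def page_language(wikitext: str, site_default: str = "en") -> str:
    """Return the language code declared in the page, else the site default."""
    match = LANG_MARKER.search(wikitext)
    return match.group(1).lower() if match else site_default


def html_open_tag(wikitext: str, site_default: str = "en") -> str:
    """Build the opening <html> tag using the per-page language."""
    lang = page_language(wikitext, site_default)
    return f'<html lang="{lang}" xml:lang="{lang}">'


if __name__ == "__main__":
    print(html_open_tag("<!-- lang: nl -->\nHoi, dit is een test."))
    print(html_open_tag("No marker here, so the site default applies."))
```

A page without a marker keeps today's behaviour (the site default), so nothing would need to change for wikis that really are monolingual.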