Hi Gerard,
Thank you for your answer.
The German situation is a bit difficult. In actual fact there are only two orthographies because two Bundesländer did not pass the new spelling into law. The consequence is that both the old spelling and the new spelling are valid. In a typical situation, the words that have been changed would be dated and marked as outdated. From a practical point of view I would include only the changed words and the new words, and I would treat them as if these two Bundesländer had voted in favour. For lookup purposes the difference is just a SELECT statement in the query.
So you do not want to include the old spelling? From what I understood for Low Saxon you also wanted to include historic spellings. But I may have misunderstood that.
The reason all words have to be explicitly identified as belonging to an orthography is that it allows us to do other things than just producing lexicological information from the Internet. What in your perception is a "multiplication of entries" is in actual fact no such thing; an expression is registered only once for each language, dialect or orthography.
So number of entries = (number of languages) x (number of dialects) x (number of orthographies)?
What are you planning to do with American English vs. British English?
You would have two entries:
1) title=color lang=EN dialect=EN_US orthography=USA-official
2) title=colour lang=EN dialect=EN_GB orthography=GB-official
That is fine. But what about "bus"? Would you have two entries?
1) title=bus lang=EN dialect=EN_US orthography=USA-official
2) title=bus lang=EN dialect=EN_GB orthography=GB-official
That (to my understanding) would double the entries for English, wouldn't it? And the translation of de:Bus would list en_US: bus, en_GB:bus?
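To make the question concrete, here is a minimal Python sketch of how I read the proposal. The `Entry` type, its field names and the `translations_of` helper are my own assumptions for illustration, not the actual database schema.

```python
from collections import namedtuple

# Hypothetical record: one entry per (title, language, dialect, orthography).
Entry = namedtuple("Entry", "title lang dialect orthography")

entries = {
    Entry("color", "EN", "EN_US", "USA-official"),
    Entry("colour", "EN", "EN_GB", "GB-official"),
    # "bus" is spelled identically in both dialects, yet it is stored twice.
    Entry("bus", "EN", "EN_US", "USA-official"),
    Entry("bus", "EN", "EN_GB", "GB-official"),
}

# Distinct spellings, ignoring dialect and orthography.
distinct_spellings = {e.title for e in entries}


def translations_of(title, lang):
    """Return (dialect, title) pairs that a translation list would show."""
    return sorted(
        (e.dialect, e.title)
        for e in entries
        if e.lang == lang and e.title == title
    )


bus_targets = translations_of("bus", "EN")

print(len(entries), "entries for", len(distinct_spellings), "spellings")
print(bus_targets)
```

Under these assumptions, every word that is spelled the same in both dialects gets a second entry, and a translation list for de:Bus would carry one line per dialect.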
Kind regards,
Heiko Evermann