Heiko Evermann wrote:
Hi Gerard,
Thank you for your answer.
The German situation is a bit difficult. In actual fact there are only two orthographies, because two Bundesländer did not pass into law that the new spelling would apply there as well. The consequence is that both the old spelling and the new spelling are valid. In a typical situation, the words that have been changed would get dated and become outdated. From a practical point of view I would include only the changed words and the new words, and I would treat them as if these two Bundesländer had voted in favour. For lookup purposes the difference is a condition in the SELECT statement of the query.
So you do not want to include the old spelling? From what I understood for Low Saxon you also wanted to include historic spellings. But I may have misunderstood that.
Sorry, good try but no cigar. The words that are spelled differently will both be in there. They will both have a record in ValidExpression where the old spelling will have a value in the ValidUntil field and the new spelling will have a value in the ValidFrom field.
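To make the ValidFrom / ValidUntil idea concrete, here is a minimal sketch of such a lookup. Only the table name ValidExpression and the two field names come from the design as described above; the other columns, the helper functions, the in-memory SQLite database and the cut-over date 2000-01-01 are assumptions made purely for illustration. The sketch also assumes that ValidUntil is exclusive and that an empty field means "no limit".

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: only ValidExpression, ValidFrom and ValidUntil
    # are names taken from the discussion; the rest is assumed.
    conn.execute(
        """CREATE TABLE ValidExpression (
               spelling   TEXT NOT NULL,
               language   TEXT NOT NULL,
               ValidFrom  TEXT,  -- ISO date, NULL means no known start
               ValidUntil TEXT   -- ISO date, NULL means still valid
           )"""
    )
    conn.executemany(
        "INSERT INTO ValidExpression VALUES (?, ?, ?, ?)",
        [
            # 2000-01-01 is a placeholder cut-over date, not the real one.
            ("daß", "de", None, "2000-01-01"),   # old spelling
            ("dass", "de", "2000-01-01", None),  # new spelling
            ("Bus", "de", None, None),           # unchanged word
        ],
    )
    return conn


def valid_spellings(conn, language, on_date):
    """Return the spellings of a language that are valid on the given ISO date."""
    rows = conn.execute(
        """SELECT spelling FROM ValidExpression
           WHERE language = ?
             AND (ValidFrom IS NULL OR ValidFrom <= ?)
             AND (ValidUntil IS NULL OR ValidUntil > ?)
           ORDER BY spelling""",
        (language, on_date, on_date),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(valid_spellings(conn, "de", "1990-01-01"))
    print(valid_spellings(conn, "de", "2010-01-01"))
```

Both spellings stay in the table; which one a lookup returns depends only on the date passed to the query, so nothing has to be deleted or rewritten when an orthography changes.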
There is room for historic orthographies; it may prove instructive in demonstrating the ongoing Germanisation of Low Saxon.
The reason why all words have to be explicitly identified as belonging to an orthography is that it allows us to do other things than just producing lexicological information from the Internet. What in your perception is a "multiplication of entries" is in actual fact no such thing; an expression is registered only once for each language, dialect or orthography.
So number of entries = (number of languages) x (number of dialects) x (number of orthographies)?
What are you planning to do with American English vs. British English?
You would have two entries:
1) title=color lang=EN dialect=EN_US orthography=USA-official
2) title=colour lang=EN dialect=EN_GB orthography=GB-official
That is fine. But what about "bus"? Would you have two entries?
1) title=bus lang=EN dialect=EN_US orthography=USA-official
2) title=bus lang=EN dialect=EN_GB orthography=GB-official
That (to my understanding) would double the entries for English, wouldn't it? And the translation of de:Bus would list en_US: bus, en_GB:bus?
Kind regards,
Heiko Evermann
First of all, I am not a specialist when it comes to the spelling of American English or British English. Depending on whether there is an official body that identifies correctly spelled English, a spelling can be validated by either one organisation or two. When the same spelling is validated by both, there is no need for duplication. This functionality is implicit in the data design.
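As a rough illustration of "validated by one organisation or by two", the sketch below stores each spelling once per language and attaches the validating bodies to it. The class names, method names and the authority labels (borrowed from the examples earlier in this thread) are hypothetical; they are not the actual data design.

```python
from dataclasses import dataclass, field


@dataclass
class Expression:
    """One spelling in one language, stored once (hypothetical structure)."""
    spelling: str
    language: str
    validated_by: set = field(default_factory=set)


class Lexicon:
    """Toy container: a spelling is registered once, validations are added to it."""

    def __init__(self):
        self._entries = {}

    def validate(self, spelling, language, authority):
        # Reuse the existing entry if this spelling is already registered.
        key = (spelling, language)
        entry = self._entries.setdefault(key, Expression(spelling, language))
        entry.validated_by.add(authority)
        return entry

    def entry_count(self):
        return len(self._entries)


if __name__ == "__main__":
    lex = Lexicon()
    lex.validate("bus", "EN", "USA-official")
    lex.validate("bus", "EN", "GB-official")    # same entry, second validation
    lex.validate("color", "EN", "USA-official")
    lex.validate("colour", "EN", "GB-official")
    print(lex.entry_count())
```

In this toy model "bus" exists once and carries two validations, while "color" and "colour" are separate entries with one validation each.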
The examples that you show bear no relation to what UW will look like, nor to how the edit screens will look, I am happy to say :) There is a big difference between the way Low Saxon treats its orthographies and the way the Sicilian or Neapolitan orthographies are treated. The Low Saxon community seems really eager to have only one orthography, and therefore a mix of the different spellings is not likely to find much appreciation among many.
The duplication of words that are spelled the same in different dialects or orthographies is inherent in the database design. This is essential if you want to have definitions and etymology in these dialects or orthographies. If you are willing to accept that definitions and etymology can be spelled in orthographies other than Sass, there could be a solution, but as the nds.wikipedia also has to standardise on Sass, I think this is a rather unlikely scenario.
Thanks, GerardM