Hi Gerard,
Actual work on UW itself is underway. The data design can be found at http://meta.wikimedia.org/wiki/Ultimate_Wiktionary_data_design and is very much open for comments. I am happy to say that many of the comments we received have led to changes. To name but a few: can sign languages be included? Now they can. Can attestations be included? Now they can.
I want to propose (again) one important change: an entry within one language should be taggable as correct according to several orthographies of that language. As I understand the current design, the word de: "ist" (English: "(he) is") must be entered twice, once for the new German spelling and once for the old one (before the recent reform), even though this word was not affected by the spelling reform. This applies to 95% of all German words, and each of these duplicates gets complete translation coverage into all languages. The same problem arises for Low Saxon, with our wide range of possible spellings. You tried to make the current design plausible to me when we talked about it recently, but I was not convinced that this huge multiplication of entries is a good idea. Maybe I misunderstood you somehow, but I still do not see the reason for it.
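To illustrate the proposal, here is a minimal sketch of an entry that carries a set of orthography tags instead of being duplicated per orthography. The class, the function and the tags "de-1901" / "de-1996" (old and reformed German spelling) are hypothetical and are not part of the UW data design.

```python
from dataclasses import dataclass, field

# Hypothetical orthography tags: spelling before and after the German reform.
OLD, NEW = "de-1901", "de-1996"

@dataclass
class Entry:
    spelling: str
    language: str
    # Every orthography in which this spelling is correct (hypothetical field).
    orthographies: set
    translations: dict = field(default_factory=dict)

def entries_for(lexicon, orthography):
    """Return the entries that are correct in the given orthography."""
    return [e for e in lexicon if orthography in e.orthographies]

lexicon = [
    # Unaffected by the reform: one entry tagged with both orthographies,
    # so its translations are stored only once.
    Entry("ist", "de", {OLD, NEW}, {"en": ["is"]}),
    # Changed by the reform: each spelling carries only its own orthography.
    Entry("daß", "de", {OLD}, {"en": ["that"]}),
    Entry("dass", "de", {NEW}, {"en": ["that"]}),
]
```

Here `entries_for(lexicon, NEW)` yields "ist" and "dass", and `entries_for(lexicon, OLD)` yields "ist" and "daß", while "ist" exists only once. Only words actually changed by the reform need separate entries.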
Then again, if we create a word count of the Wikipedia content and run it against a spellchecker, the resulting list of correctly spelled words could be included in UW. Given the size of our biggest Wikipedias and the range of topics they cover, this list might come close to the size of Aspell's word list. We will also get a long list of words that are missing from Aspell. This way, however, we will not get a spellchecker for British or American English.
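The word count and spellchecker pass described above could be sketched like this. It assumes the article texts are already available as plain strings and that the spellchecker is represented by a plain set of known words; how that set is exported from Aspell is not shown, and both function names are hypothetical.

```python
import re
from collections import Counter

def word_counts(texts):
    """Count how often each word occurs across all texts.

    Case is kept, since capitalisation matters for German nouns.
    """
    counts = Counter()
    for text in texts:
        # [^\W\d_]+ matches runs of Unicode letters, so "daß" stays one word.
        counts.update(re.findall(r"[^\W\d_]+", text))
    return counts

def split_by_dictionary(counts, dictionary):
    """Split counted words into those the spellchecker knows and those it lacks."""
    known = {w: n for w, n in counts.items() if w in dictionary}
    unknown = {w: n for w, n in counts.items() if w not in dictionary}
    return known, unknown

counts = word_counts(["The cat is black.", "Teh dog is small."])
known, unknown = split_by_dictionary(
    counts, {"The", "cat", "is", "black", "dog", "small"}
)
```

The `unknown` list mixes typos (here "Teh") with genuine words the spellchecker lacks, so it would need human review before anything from it goes into UW.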
Does that mean you are thinking about importing huge numbers of words without definitions and without any translations?
Heiko