Jimmy Wales wrote:
Wiktionary has been cranking along happily in a state of technical
neglect for quite some time.
It's also a more friendly environment. A lot of the fractiousness over
NPOV just doesn't happen there.
There are currently 32,246 entries. That's enough that we must
preserve the work that's already done. It also precludes any change
of license; whether that's fortunate or unfortunate I don't know.
Probably so. With a dictionary it's often difficult to know what is or
what even can be copyrightable. Certainly the overall presentation
would be copyright, but a lot of the small entries are often subject to
the limited number of ways that a word can be defined.
The possibility of a different approach to licensing has, however, been
raised at Wikisource -- and that project still has a small number of
articles, and public domain material is more important there.
There is an active community there, with a lot of overlap with the
broader wikipedia community. The need to be consulted on any changes
that we implement.
I presume the "The need" should read "They need" :-) For the most part
the Wiktionarians have not been too critical of the software, and have
adapted to it quite well. If there is any one recurring complaint it
might be with the enforced capitalization of the first letter. This
creates an awkwardness when words can differ by whether the first letter
is a capital or minuscule letter. The problem does occasionally occur in
Wikipedia (Communism vs. communism), but it happens far more frequently
in Wiktionary. Most words are normally written to begin with a lower
case letter.
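
To make the complaint concrete, here is a minimal sketch of the collision.
It is not the actual wiki software; normalize_title and find_collisions are
hypothetical names, and the Polish/polish pair is my own added example. It
only assumes that the first letter of every title is forced to upper case.

```python
def normalize_title(title: str) -> str:
    """Force the first letter to upper case and leave the rest untouched."""
    return title[:1].upper() + title[1:]


def find_collisions(words):
    """Group distinct words that end up sharing one page title."""
    pages = {}
    for word in words:
        pages.setdefault(normalize_title(word), set()).add(word)
    # Keep only the titles that more than one distinct word maps to.
    return {title: sorted(group) for title, group in pages.items() if len(group) > 1}


words = ["Communism", "communism", "Polish", "polish", "vision"]
collisions = find_collisions(words)
for title, group in collisions.items():
    print(title, "<-", group)
```

Any pair of words that differ only in the case of the first letter lands on
the same page, which is why a dictionary feels this more than an encyclopedia.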
They have an existing schema whereby they are doing in freeform text
just what we ought to try to help them formalize with actual database
functionality. One can only assume that their scheme is sometimes
followed inconsistently because human editing is inevitably
inconsistent. However, there appears to me to be enough consistency
that a semi-automated conversion process should be possible.
There is indeed this sort of inconsistency, but it has not been a big
problem. There are many words with complicated and multiple etymologies
that would not fit well with a rigid structure. If the technical
improvements are only *semi*-automated it should allow for an easy
override of the structure.
Anything that we do should favor the needs of editors over abstract a
priori desires for the end product. That is to say, if some fancy and
clever thing requires a lot of work from editors, we just skip it.
The editors are primary, or any wiki community will be destroyed.
I strongly agree. We are still very far from any kind of end product.
That would require some level of comprehensiveness, and I find from the
work that I have done that adapting the 1913 Webster, with its more
than 109,000 entries, is a slow and tedious process. Reformatting it and
expanding abbreviations to be distinctively Wiktionarian is only a part
of it. One of the features of the 1913 Webster is the large number of
examples of usage that are given from various writers; better
identifying some of these passages gives value added to Wiktionary's
entries. For Shakespeare quotes this means identifying the play from
which the passage comes.
At the same time, we should design a "structured wiki" with one eye on
compatibility with re-use. If there are existing XML schemas that
have prominence in the wider community, we should look to them as a
part of our design, even if we deliberately choose not to implement
every possible aspect in order to favor ease-of-use for editors.
I'll wait to hear from the tech people before commenting much on this,
but I think that the ability to opt out of features is good.
Consider this for an example:
http://wiktionary.org/wiki/Vision
As a rank amateur database designer, I see several immediate
possibilities which would make an instant and easy improvement. Even
if we had a simple and less-than-ideal design, we could lay the
groundwork now for something better in the future.
It would be interesting to see you expand on this.
I'm a huge fan of incremental change in cases like this. We'd like to
improve the software for the wiktionarians in a way that conforms to
how they like to edit, while laying the groundwork for further
revisions down the line.
--------
Consider a really bad database design, a 'flat file' design, or nearly
so.
word
AHD pronunciation
IPA pronunciation
SAMPA pronunciation
definition
synonym list
related terms list
translation list
This is a horrible design, with multi-valued fields, etc. It can be
improved in just a few minutes of work. But even this horrible design
would be better than freeform text.
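
As a rough illustration of the flat design listed above, the eight fields
could be written as one record per word. This is only a sketch: the class
name FlatEntry, the helper missing_fields, and the sample values are
assumptions made for the example, not an existing schema, and the
pronunciation fields are deliberately left blank rather than guessed.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class FlatEntry:
    """One row of the flat design: a single record per word."""
    word: str
    ahd_pronunciation: str = ""
    ipa_pronunciation: str = ""
    sampa_pronunciation: str = ""
    definition: str = ""
    # The multi-valued fields that make this a poor relational design.
    synonyms: List[str] = field(default_factory=list)
    related_terms: List[str] = field(default_factory=list)
    translations: List[str] = field(default_factory=list)


def missing_fields(entry: FlatEntry) -> List[str]:
    """Name the fields an editor has not filled in yet."""
    return [name for name, value in asdict(entry).items() if not value]


# Illustrative values only.
entry = FlatEntry(
    word="vision",
    definition="the sense of sight",
    synonyms=["sight"],
    translations=["French: vision"],
)
print(missing_fields(entry))
```

Even a record this crude lets software report which parts of an entry are
still empty, something freeform text cannot do without parsing.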
I guess we've just become too accustomed to freeform. :-) At the same
time I think that we have been creative in making use of various
headings and indents.
Developer time and energy is at a premium (at least, until some clever
developer really takes this up as a cause!) and so simplicity is a
huge virtue. A little bit of fixing done soon is better than an
imagined hypothetical perfect system that's too intimidating and never
gets off the ground.
Agreed. Still it's often difficult for us non-techies to know just what
is easy for developers.
There are two major functions in such a dictionary. The first is in
presenting the language to itself; the second is in making it available
to speakers of other languages including being able to recognize what
language a word might be from. More automated indexing could be helpful
in achieving that end. As long as indexing needs to be done manually,
our contributors will be very inconsistent in remembering to list a word
in the index for each of the half-dozen languages that they may have
added at a given point. We currently have many indexes, but many of the
words that have been added to Wiktionary are not in them.
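
A non-techie's sketch of what automated indexing might look like, assuming
each entry somehow records which languages it covers. The function name
build_language_indexes and the sample entries are hypothetical; nothing
here reflects how the software stores pages today.

```python
from collections import defaultdict


def build_language_indexes(entries):
    """Derive one sorted word list per language from the entries themselves."""
    indexes = defaultdict(set)
    for word, languages in entries.items():
        for language in languages:
            indexes[language].add(word)
    return {language: sorted(words) for language, words in indexes.items()}


# Illustrative data: each word mapped to the languages its entry covers.
entries = {
    "vision": ["English", "French"],
    "chat": ["English", "French"],
    "Haus": ["German"],
}
indexes = build_language_indexes(entries)
for language, words in sorted(indexes.items()):
    print(language, words)
```

Because the indexes are derived from the entries, a contributor who adds
six languages to a page never has to remember six separate index pages.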
I would also like to see some effort on the part of others in starting
the Wiktionaries for other languages which could focus on presenting
this material to speakers of that language. It is impossible to create
interwiktionary links before that happens.
Ec