I've asked Jason to set up a wiktionary-l. I don't know if he has yet, but when he has, I intend for there to be a big notice posted on wiktionary.org, on the wikipedia-l and wikien-l mailing lists, as well as on Wikipedia itself, inviting a sort of "global summit meeting" to discuss some of the things that I outline below.
Wiktionary has been cranking along happily in a state of technical neglect for quite some time.
There are currently 32,246 entries. That's enough that we must preserve the work that's already done. It also precludes any change of license; whether that's fortunate or unfortunate, I don't know.
There is an active community there, with a lot of overlap to the broader wikipedia community. The need to be consulted on any changes that we implement.
They have an existing schema whereby they are doing in freeform text just what we ought to try to help them formalize with actual database functionality. One can only assume that their scheme is sometimes followed inconsistently because human editing is inevitably inconsistent. However, there appears to me to be enough consistency that a semi-automated conversion process should be possible.
Anything that we do should favor the needs of editors over abstract a priori desires for the end product. That is to say, if some fancy and clever thing requires a lot of work from editors, we just skip it. The editors are primary, or any wiki community will be destroyed.
At the same time, we should design a "structured wiki" with one eye on compatibility with re-use. If there are existing XML schemas that have prominence in the wider community, we should look to them as a part of our design, even if we deliberately choose not to implement every possible aspect in order to favor ease-of-use for editors.
Consider this for an example: http://wiktionary.org/wiki/Vision
As a rank amateur database designer, I see several immediate possibilities which would make an instant and easy improvement. Even if we had a simple and less-than-ideal design, we could lay the groundwork now for something better in the future.
I'm a huge fan of incremental change in cases like this. We'd like to improve the software for the wiktionarians in a way that conforms to how they like to edit, while laying the groundwork for further revisions down the line.
--------
Consider a really bad database design, a 'flat file' design, or nearly so.
word | AHD pronunciation | IPA pronunciation | SAMPA pronunciation | definition | synonym list | related terms list | translation list
This is a horrible design, with multi-valued fields, etc. It can be improved in just a few minutes of work. But even this horrible design would be better than freeform text.
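As a rough illustration only, here is a minimal sketch of that flat design and of one "few minutes of work" improvement, assuming SQLite. The table names, column names, the semicolon separator, and the sample row are all hypothetical; nothing here reflects the existing Wiktionary software.

```python
import sqlite3

# Hypothetical column names mirroring the flat field list above.
FLAT_COLUMNS = (
    "word", "ahd_pronunciation", "ipa_pronunciation", "sampa_pronunciation",
    "definition", "synonyms", "related_terms", "translations",
)

def create_flat_table(conn):
    """Create the deliberately naive one-table design."""
    cols = ", ".join(f"{c} TEXT" for c in FLAT_COLUMNS)
    conn.execute(f"CREATE TABLE entry ({cols})")

def insert_entry(conn, **fields):
    """Insert one row; fields that are not supplied are stored as NULL."""
    row = [fields.get(c) for c in FLAT_COLUMNS]
    marks = ", ".join("?" for _ in FLAT_COLUMNS)
    conn.execute(f"INSERT INTO entry VALUES ({marks})", row)

def split_out_synonyms(conn):
    """Move the multi-valued 'synonyms' field into its own table.

    Assumes synonyms were packed into one field separated by semicolons.
    """
    conn.execute("CREATE TABLE synonym (word TEXT, synonym TEXT)")
    rows = conn.execute(
        "SELECT word, synonyms FROM entry WHERE synonyms IS NOT NULL"
    ).fetchall()
    for word, packed in rows:
        for item in packed.split(";"):
            item = item.strip()
            if item:
                conn.execute("INSERT INTO synonym VALUES (?, ?)", (word, item))

conn = sqlite3.connect(":memory:")
create_flat_table(conn)
# Illustrative sample data, not copied from any real entry.
insert_entry(conn, word="vision", definition="the sense of sight",
             synonyms="sight; eyesight")
split_out_synonyms(conn)
synonyms = [r[0] for r in conn.execute(
    "SELECT synonym FROM synonym WHERE word = ? ORDER BY synonym", ("vision",))]
```

Even the flat table can be queried by word, which freeform text cannot; splitting out one multi-valued field at a time is the kind of incremental step that stays cheap for developers.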
Developer time and energy are at a premium (at least, until some clever developer really takes this up as a cause!) and so simplicity is a huge virtue. A little bit of fixing done soon is better than an imagined hypothetical perfect system that's too intimidating and never gets off the ground.
--Jimbo
Jimmy Wales wrote:
Wiktionary has been cranking along happily in a state of technical neglect for quite some time.
It's also a more friendly environment. A lot of the fractiousness over NPOV just doesn't happen there.
There are currently 32,246 entries. That's enough that we must preserve the work that's already done. It also precludes any change of license; whether that's fortunate or unfortunate, I don't know.
Probably so. With a dictionary it's often difficult to know what is or what even can be copyrightable. Certainly the overall presentation would be copyright, but a lot of the small entries are often subject to the limited number of ways that a word can be defined.
The possibility of a different approach to licensing has, however, been raised at Wikisource -- and that project still has a small number of articles, and public domain material is more important there.
There is an active community there, with a lot of overlap to the broader wikipedia community. The need to be consulted on any changes that we implement.
I presume the "The need" should read "They need" :-) For the most part the Wiktionarians have not been too critical of the software, and have adapted to it quite well. If there is any one recurring complaint it might be with the enforced capitalization of the first letter. This creates an awkwardness when words can differ by whether the first letter is a capital or minuscule letter. The problem does occasionally occur in Wikipedia (Communism vs. communism), but it happens far more frequently in Wiktionary. Most words are normally written to begin with a lower case letter.
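To make the complaint concrete, here is a small sketch of what forced first-letter capitalization does to titles. The function names are hypothetical and this is not the wiki software's actual code, only a model of the behaviour described above.

```python
def forced_capital_title(title):
    """Model of the current behaviour: the first letter is always capitalized."""
    return title[:1].upper() + title[1:]

def case_sensitive_title(title):
    """Hypothetical alternative: keep the title exactly as the editor typed it."""
    return title

words = ["Communism", "communism"]
# With forced capitalization both spellings collapse into one page title.
forced = {forced_capital_title(w) for w in words}
# With case-sensitive titles they remain two distinct entries.
preserved = {case_sensitive_title(w) for w in words}
```

Here `forced` holds a single title while `preserved` holds two, which is the distinction a dictionary needs far more often than an encyclopedia does.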
They have an existing schema whereby they are doing in freeform text just what we ought to try to help them formalize with actual database functionality. One can only assume that their scheme is sometimes followed inconsistently because human editing is inevitably inconsistent. However, there appears to me to be enough consistency that a semi-automated conversion process should be possible.
There is indeed this sort of inconsistency, but it has not been a big problem. There are many words with complicated and multiple etymologies that would not fit well with a rigid structure. If the technical improvements are only *semi*-automated it should allow for an easy override of the structure.
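One way to picture a semi-automated conversion with an easy override is sketched below. It assumes entries use `=== Heading ===` style section headings; the mapping of heading names to fields is hypothetical, and anything the script does not recognize is left as freeform text for a human to handle rather than forced into the structure.

```python
import re

# Hypothetical mapping from section headings to structured field names.
KNOWN_SECTIONS = {
    "pronunciation": "pronunciation",
    "synonyms": "synonyms",
    "related terms": "related_terms",
    "translations": "translations",
}

# Matches lines such as "== Heading ==" or "=== Heading ===".
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")

def convert_entry(wikitext):
    """Split an entry into recognized fields plus sections left for review."""
    structured, needs_review = {}, []
    heading, body = None, []

    def flush():
        text = "\n".join(body).strip()
        if not text:
            return
        field = KNOWN_SECTIONS.get((heading or "").lower())
        if field and field not in structured:
            structured[field] = text
        else:
            # Unknown or repeated section: keep it freeform (the override).
            needs_review.append((heading, text))

    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            heading, body = match.group(2), []
        else:
            body.append(line)
    flush()
    return structured, needs_review

# Illustrative sample text, not copied from a real entry.
sample = """=== Synonyms ===
* sight
* eyesight
=== Etymology 2 ===
Some free-form note an editor wrote.
"""
structured, needs_review = convert_entry(sample)
```

In this sample the synonyms section is captured as a field, while the second etymology falls through to `needs_review` untouched, so a complicated entry is never mangled to fit the schema.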
Anything that we do should favor the needs of editors over abstract a priori desires for the end product. That is to say, if some fancy and clever thing requires a lot of work from editors, we just skip it. The editors are primary, or any wiki community will be destroyed.
I strongly agree. We are still a very far distance from any kind of end product. That would require some level of comprehensiveness, and I find from the work that I have done that adapting the 1913 Webster, with its more than 109,000 entries, is a slow and tedious process. Reformatting it and expanding abbreviations to be distinctively Wiktionarian is only a part of it. One of the features of the 1913 Webster is the large number of examples of usage that are given from various writers; better identifying some of these passages adds value to Wiktionary's entries. For Shakespeare quotes this means identifying the play from which the passage comes.
At the same time, we should design a "structured wiki" with one eye on compatibility with re-use. If there are existing XML schemas that have prominence in the wider community, we should look to them as a part of our design, even if we deliberately choose not to implement every possible aspect in order to favor ease-of-use for editors.
I'll wait to hear from the tech people before commenting much on this, but I think that the ability to opt out of features is good.
Consider this for an example: http://wiktionary.org/wiki/Vision
As a rank amateur database designer, I see several immediate possibilities which would make an instant and easy improvement. Even if we had a simple and less-than-ideal design, we could lay the groundwork now for something better in the future.
It would be interesting to see you expand on this.
I'm a huge fan of incremental change in cases like this. We'd like to improve the software for the wiktionarians in a way that conforms to how they like to edit, while laying the groundwork for further revisions down the line.
Consider a really bad database design, a 'flat file' design, or nearly so.
word | AHD pronunciation | IPA pronunciation | SAMPA pronunciation | definition | synonym list | related terms list | translation list
This is a horrible design, with multi-valued fields, etc. It can be improved in just a few minutes of work. But even this horrible design would be better than freeform text.
I guess we've just become too accustomed to freeform. :-) At the same time I think that we have been creative in making use of various headings and indents.
Developer time and energy are at a premium (at least, until some clever developer really takes this up as a cause!) and so simplicity is a huge virtue. A little bit of fixing done soon is better than an imagined hypothetical perfect system that's too intimidating and never gets off the ground.
Agreed. Still it's often difficult for us non-techies to know just what is easy for developers.
There are two major functions in such a dictionary. The first is in presenting the language to itself; the second is in making it available to speakers of other languages, including being able to recognize what language a word might be from. More automated indexing could be helpful in achieving that end. As long as it needs to be done manually, our contributors will be very inconsistent in remembering to list a word in the index for each of the half-dozen languages that they may have added at a given point. We currently have many indexes, but many of the words that have been added to Wiktionary are not there.
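For the tech people, the kind of automated indexing I have in mind might look something like the sketch below. It assumes each entry already records which languages it covers; that data structure and the function name are hypothetical, and the sample words are only for illustration.

```python
from collections import defaultdict

def build_language_index(entries):
    """Build a per-language word list from entry -> languages data.

    'entries' maps each word to the languages that have a section in it.
    """
    index = defaultdict(set)
    for word, languages in entries.items():
        for language in languages:
            index[language].add(word)
    # Sort both the languages and the words so the index is stable.
    return {lang: sorted(words) for lang, words in sorted(index.items())}

# Illustrative sample data.
entries = {
    "vision": ["English", "French"],
    "chat": ["English", "French"],
    "Katze": ["German"],
}
index = build_language_index(entries)
```

Because the index is derived from the entries themselves, a contributor who adds six languages to one page no longer has to remember to update six separate index pages.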
I would also like to see some effort on the part of others in starting the Wiktionaries for other languages, which could focus on presenting this material to speakers of those languages. It is impossible to create interwiktionary links before that happens.
Ec
wikitech-l@lists.wikimedia.org