Cunc-
- Non-natural language text should be avoided at all costs. It goes
part and parcel with the principles of the simplicity of the wiki syntax.
To me, wiki is not about natural language vs. "unnatural" language. It's about getting things done fast ("wiki wiki", remember?). We say "[[dog]]" because that's faster for the reader than "see dog" and for the writer than <A HREF="http://www.wikipedia.org/wiki/Dog">Dog</A>. Now before you say that taken to an extreme, this would mean using a Perl-like syntax with complex operators, the "fast" of course refers to both writing and reading. A category system needs to retain easy source readability, but I think "[[Category:Fruit]]" or "[[Category=Fruit]]" are both intuitive enough.
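To make the readability point concrete, here is a minimal sketch of how either tag form could be picked out of article source. It is only an illustration: the regular expression and the names `CATEGORY_TAG`, `extract_categories` and `strip_categories` are assumptions made for this example, not part of any existing Wikipedia code.

```python
import re

# Hypothetical pattern: accepts both proposed spellings,
# [[Category:Fruit]] and [[Category=Fruit]].
CATEGORY_TAG = re.compile(r"\[\[Category[:=]([^\[\]|]+)\]\]")


def extract_categories(wikitext):
    """Return the category names found in the text, in order of appearance."""
    return [name.strip() for name in CATEGORY_TAG.findall(wikitext)]


def strip_categories(wikitext):
    """Return the text without category tags, e.g. for a separate edit window."""
    return CATEGORY_TAG.sub("", wikitext).strip()


sample = "An [[apple]] is a sweet fruit.\n[[Category:Fruit]]\n[[Category=Food]]"
print(extract_categories(sample))  # ['Fruit', 'Food']
print(strip_categories(sample))    # An [[apple]] is a sweet fruit.
```

Ordinary links such as "[[apple]]" are left alone, so the tags stay readable in the source while still being easy for software to find.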
- Any such meta-data scheme should not be stored within the article
text. If it's being parsed so that it displays in a separate edit window, then it should be saved as a separate field/table in the database.
Well, from an implementation perspective, it's nice to get a common diff for both, I think, so I would prefer them to be in the same field. But I agree with you that separate edit windows make sense, especially if those who don't care about metadata can turn them off entirely. For now, however, I think that the amount of metadata is so low that the present scheme is sufficient.
That is to say, although I strongly think that such a categorization scheme should be an explicitly separate project from the core Wikipedia project-
I believe that to be a bad idea, because editing content and editing metadata are closely related. Someone who adds certain information about President Bush's family history may also want to categorize this information in the same step. Someone may add legitimate information through metadata that doesn't find its way into the article because nobody noticed it being added. The more separate data and metadata are, the harder it gets to keep both up to date.
- Any official sanction of a particular implementation for dealing with
categorization makes it orders of magnitude less likely that a better scheme will be adopted in the future than if we explicitly design the system to be implementation-agnostic.
Open source development typically happens in an evolutionary fashion, that is, you build upon a solution that is imperfect and try to improve it incrementally without usually altering its fundamental structure. That's why vi and emacs are still around. It may not be the best model, but it's a model that achieves results, including life itself.
Whenever you add stuff to a machine, you make it less efficient. You can't just stick wings onto a car to make a super flying automobile. You don't make a combination toaster-blender. Dictionaries and encyclopedias are separate things. Web browsers make bad file browsers.
That's certainly a good point. There is a definite risk that too much metadata will disturb the regular functioning of the encyclopedia editing process. Already it can be a bit annoying to have interlanguage link edits in RC. And what if people add huge blocks of categories to very small articles, distracting from the content? I thought about these problems when I made my initial proposal. My conclusion is that there are solutions for most, if not all, of them. We can filter RC. We can limit the number of categories. We can use sub-categories. And so forth.
At this point I believe the best approach is simply to try the scheme, see whether it works, and then adjust it in an evolutionary fashion.
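Two of the safeguards mentioned above, filtering RC and limiting the number of categories, could look roughly like the following. This is a sketch under stated assumptions: the limit of 5, the `metadata_only` flag and both function names are hypothetical and do not describe the current software.

```python
MAX_CATEGORIES = 5  # hypothetical limit, chosen only for illustration


def within_category_limit(categories, limit=MAX_CATEGORIES):
    """Return True if an article carries no more categories than allowed."""
    return len(categories) <= limit


def filter_recent_changes(changes, hide_metadata_only=True):
    """Drop edits that touched only metadata, if the reader asked for that.

    Each change is a dict with a 'title' and a 'metadata_only' flag
    (an assumed record layout, not the real RC format).
    """
    if not hide_metadata_only:
        return list(changes)
    return [change for change in changes if not change["metadata_only"]]


changes = [
    {"title": "Apple", "metadata_only": False},
    {"title": "Banana", "metadata_only": True},
]
visible = filter_recent_changes(changes)
print([change["title"] for change in visible])           # ['Apple']
print(within_category_limit(["Fruit", "Food"]))          # True
print(within_category_limit(["A", "B", "C", "D", "E", "F"]))  # False
```

Whether such limits should be enforced by the software or left to convention is exactly the kind of thing that can be adjusted once the scheme is in use.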
Etc.
?
Regards,
Erik