Larry Sanger wrote:
Sorry if this is already in the hopper or (!) has already been done, but it seems long overdue:
We need a way to compile, based on lists of links (I guess), "Recent Changes" lists for all articles about a general topic. I think this is actually a very pressing need that should have been taken care of long ago. (For the record, I asked for it probably a year ago or more.) The idea is that we'd be able to maintain lists of all articles on a given general subject, such as philosophy, and any changes made to any articles on the subject would show up on the recent changes page for that subject.
Why do we need this? At least one reason is that it might help attract experts to the project. Just speaking for myself, I'm sure I'd spend more time on Wikipedia if there were a philosophy recent changes page. More importantly, the recent changes page has *always* been huge. It's now more cumbersome than ever, and it makes it hard for people to focus their attention on one subject; being able to do so would be a nice option. Lack of the feature also makes it hard to *monitor* goings-on in a general subject area.
With the relatively new MySQL-driven software, this shouldn't be as difficult as it might have been before. One could compile personal lists using the "watch this page" feature; what I suggest is that we have publicly-editable and publicly-viewable lists of the same nature.
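To make the request concrete, here is a minimal sketch of a per-subject recent changes list. It assumes the changes are available as simple records and that the subject is just a publicly maintained list of article titles; the names `Change` and `topic_recent_changes` and the sample data are hypothetical, not part of the actual software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    title: str      # article that was edited
    timestamp: int  # illustrative edit time; larger means more recent
    user: str


def topic_recent_changes(changes, topic_titles, limit=50):
    """Return the newest changes whose article is on the topic list."""
    wanted = set(topic_titles)
    hits = [c for c in changes if c.title in wanted]
    hits.sort(key=lambda c: c.timestamp, reverse=True)
    return hits[:limit]


# Hypothetical publicly editable list of philosophy articles.
philosophy = ["Philosophy", "Epistemology", "Immanuel Kant"]

all_changes = [
    Change("Potato", 100, "A"),
    Change("Epistemology", 101, "B"),
    Change("Alaska", 102, "C"),
    Change("Immanuel Kant", 103, "D"),
]

recent = topic_recent_changes(all_changes, philosophy)
for c in recent:
    print(c.timestamp, c.title, c.user)

# The size of the list gives a rough idea of how complete it is.
print(len(philosophy), "articles listed under this topic")
```

In a database-backed version the same filter would presumably be a join between the changes table and the topic list, but that depends on a schema not described here.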
(Minor point: in a list of recent changes pages, I think the number of topics listed under each subject should be shown automatically. That'd give us an idea of how much more work there is to be done in adding to the list.)
I am not committed to any particular version of the feature, by the way. I'd just like to see it done. I don't want to have to wade through 5000 edits just to see all the recent philosophy edits.
Larry
I have some ideas on this:
The problem of finding subject groups is closely related to the problem of indexing.
The problem with the current link structure is that it is much too dense: you can get between articles very easily, as Wikipedia is a very "small world" network. This tends to defeat any attempts at automated indexing. What is needed is a way of making some pages and links more visible than others to automatic indexing systems.
We define a new "category" namespace. An article can contain any number of "category" links, which do not appear in the main body of the article, but instead in a separate area, like the inter-language links. There, they link to a placeholder "category" page which can be used to define and describe the category. (And its "category talk" page can be used to discuss the category.)
Now, all pages that link to a category page "belong" to that category. Categories are just sets: there can be any number of them, and they can belong to multiple competing schemes. And categories can belong to categories, too, allowing for hierarchies and networks of categories to be created. The presence of categories will make machine indexing much easier.
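A rough sketch of the set idea, assuming the category links have already been extracted into a mapping from page title to the categories it links to. The function names and the sample data are hypothetical, and "category:lifeform" is invented here only to show one category belonging to another.

```python
# Page title -> set of categories it links to (hypothetical sample data).
category_links = {
    "Wolf": {"category:animal"},
    "Sherlock Holmes": {"category:person", "category:fictional"},
    "Atlantis": {"category:place", "category:fictional"},
    "category:animal": {"category:lifeform"},
}


def members(category, links):
    """Pages (articles or categories) that link directly to `category`."""
    return {page for page, cats in links.items() if category in cats}


def all_members(category, links, _seen=None):
    """Articles in `category` or in any category nested inside it.

    Categories may form a network rather than a tree, so visited
    categories are tracked to avoid looping forever on a cycle.
    """
    seen = _seen if _seen is not None else set()
    if category in seen:
        return set()
    seen.add(category)
    result = set()
    for page in members(category, links):
        if page.startswith("category:"):
            result |= all_members(page, links, seen)
        else:
            result.add(page)
    return result


print(sorted(members("category:fictional", category_links)))
print(sorted(all_members("category:lifeform", category_links)))
```

Because membership is just "links to the category page", a page such as Sherlock Holmes can sit in several competing schemes at once without any of them needing to know about the others.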
Some very basic categories that might be useful for a start:
*[[category:animal]] eg wolf, cat, bird, dinosaur
*[[category:vegetable]] eg potato, cactus
*[[category:person]] eg Isaac Newton, Sherlock Holmes (but see below)
*[[category:time period]] eg 20th century, February, 200 BC, Cenozoic
*[[category:event]] eg Wars of the Roses, 1997 Academy Awards
*[[category:place]] eg Dubrovnik, Alaska, Indian Ocean, Atlantis (but see below)
*[[category:field of study]] eg Biology, Chemistry, Philosophy, Law, Accountancy, Civil engineering
** not sure of the best name(s) for this: field of endeavour, subject of inquiry?
*[[category:fictional]] eg Sherlock Holmes, Atlantis
*[[category:abstraction]] eg Soul, Mind, Sophie Germain prime, Mathematical set
I'd like to get these very simple categories in place first, as a sort of "page coloring" experiment. Notice that they are neither complete, nor framed in the form of a hierarchy: this is not a taxonomy. For example, "prion" belongs to none of these categories. Perhaps someone will create a [[category:other lifeform]] page for the Archaea, prions and viruses.
How to bootstrap the process? My first idea is this:
* assign categories to about 1000 articles by hand
* train a naive Bayesian classifier to recognise each category
* adjust thresholds to make sure that classifications are reasonably accurate
* machine-classify the entire Wikipedia!
Now, this process will be less than perfect. Some articles will be mis-classified, and others will be missed because the threshold probabilities were set too high: ie both type I and type II errors. Mis-classification will not damage any actual articles; it will only result in errors in machine-generated indexes. From that point on, manual editing will take over.
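The bootstrap steps can be sketched roughly as follows. This is only an illustration, assuming one binary naive Bayes classifier per category (member vs. non-member), whitespace tokenisation, and add-one smoothing; the class name, the tiny training set, and the 0.8 threshold are all made up for the example.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class BinaryNaiveBayes:
    """Multinomial naive Bayes for one category: member vs. non-member."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocab = set()

    def train(self, text, is_member):
        words = tokenize(text)
        self.word_counts[is_member].update(words)
        self.doc_counts[is_member] += 1
        self.vocab.update(words)

    def _log_score(self, words, label):
        total_docs = sum(self.doc_counts.values())
        # Add-one smoothing on both the prior and the word likelihoods.
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for w in words:
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][w] + 1) / denom)
        return score

    def probability(self, text):
        """Posterior probability that the text belongs to the category."""
        words = tokenize(text)
        pos = self._log_score(words, True)
        neg = self._log_score(words, False)
        m = max(pos, neg)  # subtract the max before exp() for stability
        ep, en = math.exp(pos - m), math.exp(neg - m)
        return ep / (ep + en)


def classify(text, classifiers, threshold):
    """Categories whose posterior reaches the threshold; may be empty."""
    return {cat for cat, clf in classifiers.items()
            if clf.probability(text) >= threshold}


# Hand-labelled examples stand in for the ~1000 hand-assigned articles.
animal = BinaryNaiveBayes()
animal.train("the wolf is a carnivorous mammal that hunts in packs", True)
animal.train("the cat is a small domesticated carnivorous mammal", True)
animal.train("the potato is a starchy tuber grown as a crop", False)
animal.train("dubrovnik is a city on the adriatic coast", False)

classifiers = {"category:animal": animal}
lion = classify("the lion is a large carnivorous mammal", classifiers, 0.8)
coast = classify("a city on the coast", classifiers, 0.8)
print(lion, coast)
```

Raising the threshold trades mis-classifications for missed articles, which is the type I versus type II balance described above; an article that gets no category is simply left for manual editing.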
New articles can be machine-classified once they reach, say, 250 characters, using a Bayesian classifier trained on the corpus as a whole. Again, once an article has been machine-classified, it is left alone thereafter.
Now, at this point, we may not need to create a "philosophy" or "chemistry" category. Instead, we can just note that these are [[category:field of study|fields of study]] and hence that pages that link to them, or are linked from them, "belong" to them in some sense. Similar treatment can be done for time periods.
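One possible reading of "belong in some sense", sketched under the assumption that the link graph is available as a mapping from each page to the pages it links to. The function name `related_pages` and the sample links are hypothetical.

```python
def related_pages(hub, outlinks):
    """Pages linked from `hub` plus pages that link to `hub`."""
    linked_from = set(outlinks.get(hub, ()))
    linking_to = {page for page, targets in outlinks.items() if hub in targets}
    return (linked_from | linking_to) - {hub}


# Page -> pages it links to (hypothetical sample data).
outlinks = {
    "Philosophy": {"Epistemology", "Ethics"},
    "Immanuel Kant": {"Philosophy", "Ethics"},
    "Potato": {"Vegetable"},
}

philosophy_pages = related_pages("Philosophy", outlinks)
print(sorted(philosophy_pages))
```

On a densely linked wiki this neighbourhood will probably be too broad on its own, so in practice it would likely be applied only to hub pages already coloured as fields of study, and perhaps filtered further.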
I'm not 100% sure how this would work, but I think that a workable mechanism could be evolved, given the initial category coloring.
Neil