Re: [Wikipedia-l] Feature request

lcrocker＠nupedia.com

23 Sep 2002

11:25 a.m.

...

Sorry if this is already in the hopper or (!) has already been done, but it seems long overdue: We need a way to compile, based on lists of links (I guess), "Recent Changes" lists for all articles about a general topic.

This has been doable since Magnus's software, and still is; the hard work is just compiling the list of links. Once you have a page of links, say, "Wikipedia:Major philosophy articles", then you can just use the "watch links" feature of the sidebar to get a list of recent changes to all pages linked from it.
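As a rough illustration of what "watch links" amounts to, here is a minimal sketch: filter the recent-changes log down to the pages linked from a list page. The `Change` record, the `watch_links` function, and the sample data are all hypothetical; this is not the actual Wikipedia software.

```python
from typing import NamedTuple

class Change(NamedTuple):
    title: str      # page that was edited
    timestamp: str  # ISO-8601 string, so string order matches time order
    user: str

def watch_links(list_page_links, recent_changes):
    """Return the changes that touch a page linked from the list page, newest first."""
    watched = set(list_page_links)
    hits = [c for c in recent_changes if c.title in watched]
    return sorted(hits, key=lambda c: c.timestamp, reverse=True)

# Hypothetical data: links found on "Wikipedia:Major philosophy articles"
links = ["Plato", "Epistemology", "Ethics"]
changes = [
    Change("Plato", "2002-09-23T10:00:00", "A"),
    Change("Calculus", "2002-09-23T10:05:00", "B"),
    Change("Ethics", "2002-09-23T11:00:00", "C"),
]
result = watch_links(links, changes)
```

The filtering itself is trivial, which is the point: the cost is all in producing and maintaining `links`.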

Perhaps that feature could be added to, or tweaked to add filters, etc.; but there's no point in wasting the effort to do that until it has actual data to work with, and that will take people creating those link pages.

Axel Boldt

23 Sep

9:09 p.m.

New subject: Feature request

--- lcrocker@nupedia.com wrote:

...

Sorry if this is already in the hopper or (!) has already been done, but it seems long overdue: We need a way to compile, based on lists of links (I guess), "Recent Changes" lists for all articles about a general topic.

This has been doable since Magnus's software, and still is; the hard work is just compiling the list of links.

Right. Here's the problem. I could easily spend the next weekend compiling an [[Alphabetical list of mathematics articles]]. This will allow me to cut down on my time by simply doing a "Watch links" on that page every morning. Cool. I update the list by monitoring special:Newpages occasionally. Nice. Except we need the same for all other major fields. Unless you have somebody really active, these lists will become obsolete and thus useless really soon. See [[Biographical Listing]] for example. Once the list is out of date, somebody needs to spend another weekend. And you never know whether the list is out of date or not, unless you have just spent a weekend on it.

I still believe that all of this can and should be done automatically, by tracing link paths from the main page.
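One possible reading of "tracing link paths from the main page", sketched under assumptions: treat each link on the main page as a topic, walk outward breadth-first, and label every page with the topic it is first reached through. The `links` dictionary (page title to list of linked titles) and the function name are hypothetical, and ties go to whichever topic is listed first on the main page.

```python
from collections import deque

def label_by_topic(links, main_page):
    """Label each reachable page with the main-page link it is first reached through."""
    topic_of = {}
    queue = deque()
    # Seed the search with the topics linked directly from the main page.
    for topic in links.get(main_page, []):
        if topic != main_page and topic not in topic_of:
            topic_of[topic] = topic
            queue.append(topic)
    # Breadth-first walk: a page inherits the topic of the page that reached it first.
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target != main_page and target not in topic_of:
                topic_of[target] = topic_of[page]
                queue.append(target)
    return topic_of

# Hypothetical link data
links = {
    "Main Page": ["Mathematics", "Philosophy"],
    "Mathematics": ["Calculus", "Logic"],
    "Philosophy": ["Logic", "Plato"],
    "Plato": ["Allegory of the cave"],
}
topics = label_by_topic(links, "Main Page")
```

Here "Logic" lands under Mathematics only because Mathematics comes first on the main page, which shows how much a first-come rule like this depends on link order.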

Axel


The Cunctator

11:15 p.m.

New subject: Feature request

On 9/23/02 11:39 AM, "Axel Boldt" axelboldt@yahoo.com wrote:

...

I still believe that all of this can and should be done automatically, by tracing link paths from the main page.

All that needs to be done is to automatically convert the links on the main page to the special Watch Links equivalents.

I hacked up an example at [[User:The Cunctator/ByTopic]]. It's pretty useful. Though, on review, I think the Watch Links page needs to include the recent changes of the page itself as well as what links to it.

Larry Sanger

24 Sep

3:09 a.m.

New subject: Feature request

(Wikitech-l: this is more on automatic subject classification, which Axel brought up recently on Wikipedia-l.)

On Mon, 23 Sep 2002, Axel Boldt wrote:

[snip excellent comments that I agree with]

...

I still believe that all of this can and should be done automatically, by tracing link paths from the main page.

I'm going to repeat some of what you've said earlier, adding my own perspective. I really hope some programmers pursue this--they needn't ask anyone's permission. The proof's in the pudding.

If automatic categorization could be done, and it sounds very plausible to me, it would be *far* superior to a hand-maintained list of subject area links. And incredibly useful, too.

OK, the following will reiterate some of the earlier discussion.

Presumably, nearly every page on Wikipedia can be reached from nearly every other page. (There are orphans; and there are pages that do not link to any other pages, though other pages link to them.)

This suggests that we can basically assign a number--using *some* algorithm (not necessarily any one in particular: here is where programmers can be creative)--giving the "closeness" of a page to all the listed subjects. (This is very much like the Kevin Bacon game, of course, and the "six degrees of separation" phenomenon.)
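To make the "closeness" number concrete, here is one minimal sketch that uses the fewest clicks from each subject page as the measure. That choice is an assumption, only one of many possible algorithms, and the `links` dictionary and function names are hypothetical.

```python
from collections import deque

def clicks_from(links, start):
    """Fewest clicks needed to reach each page from start (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in dist:
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist

def closeness_table(links, subjects):
    """Map every page to {subject: clicks}, with None where the page is unreachable."""
    per_subject = {s: clicks_from(links, s) for s in subjects}
    pages = set(links) | {t for targets in links.values() for t in targets}
    return {p: {s: per_subject[s].get(p) for s in subjects} for p in pages}

# Hypothetical link data
links = {
    "Philosophy": ["Plato", "Logic"],
    "Mathematics": ["Logic", "Calculus"],
    "Plato": ["Allegory of the cave"],
    "Logic": ["Mathematics"],
}
table = closeness_table(links, ["Philosophy", "Mathematics"])
```

Each search is linear in the number of links, so one pass per subject over the whole link table stays cheap even with 40-odd subjects.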

The question whether a *useful* algorithm can be stated is interesting from a theoretical point of view. As I understand it, the suggestion is that there is a simple and reliable (but how reliable?) algorithm, such that, given simply a list of all the links in Wikipedia (viz., the source page and destination page), and a list of subject categories, we can reliably sort all pages into their proper categories.

It will not do to say, "There are obvious counterexamples, so let's not even try." We can live with some slop. This is Wikipedia! We could even fix errors by hand (ad hoc corrections are possible; why not?). As far as I'm concerned, the real question is, once we try *various* algorithms, what's the highest reliability we can actually generate? I'll bet it'll be reasonably high, certainly high enough to be quite useful.

Here's an attempt at expressing an algorithm:

For a given page P (e.g., [[Plato's allegory of the cave]]), if the average number of clicks (not backtracking to any page already reached; otherwise you deal with infinite regresses) needed to reach P from the subject page S (e.g., [[Philosophy]]) through all possible links between P and S (or, perhaps, all possible links below a certain benchmark number?) is lower than the average number of clicks needed to reach P from any other subject page, then P is "about" S.
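A minimal sketch of that rule, under stated assumptions: `links` is a hypothetical dictionary from page title to linked titles, "all possible links" is read as every path that never revisits a page, and `max_clicks` plays the role of the benchmark number. Enumerating paths this way grows exponentially with the cap, so it is only practical for small caps or small graphs.

```python
def average_clicks(links, subject, page, max_clicks=6):
    """Mean length of all non-backtracking paths from subject to page, or None if none exist."""
    lengths = []

    def walk(current, visited, depth):
        if current == page:
            lengths.append(depth)
            return
        if depth == max_clicks:
            return
        for target in links.get(current, []):
            if target not in visited:  # never return to a page already reached
                walk(target, visited | {target}, depth + 1)

    walk(subject, {subject}, 0)
    return sum(lengths) / len(lengths) if lengths else None

def about(links, subjects, page, max_clicks=6):
    """Pick the subject with the lowest average click count to page, or None."""
    scores = {s: average_clicks(links, s, page, max_clicks) for s in subjects}
    reachable = {s: v for s, v in scores.items() if v is not None}
    return min(reachable, key=reachable.get) if reachable else None

# Hypothetical link data
links = {
    "Philosophy": ["Plato", "Epistemology"],
    "Plato": ["Allegory of the cave"],
    "Epistemology": ["Plato", "Allegory of the cave"],
    "Mathematics": ["Logic"],
    "Logic": ["Philosophy"],
}
winner = about(links, ["Philosophy", "Mathematics"], "Allegory of the cave")
```

In this toy graph the three paths from Philosophy have lengths 2, 3 and 2, while every path from Mathematics has to pass through Philosophy first, so Philosophy wins.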

The algorithm could be augmented in useful ways. In case of ties, or near ties, a page could be listed as under multiple subjects. I have no idea if this algorithm is correct, but that doesn't matter--it's just an example. If you think harder and longer, I'm sure you'll think of a better one.
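The tie handling could look something like the sketch below, which lists a page under every subject whose score is within a tolerance of the best one. The function name, the `tolerance` value, and the sample scores are all made up for illustration.

```python
def subjects_for(scores, tolerance=0.25):
    """Return every subject whose score is within tolerance of the best (lowest) score."""
    reachable = {s: v for s, v in scores.items() if v is not None}
    if not reachable:
        return []
    best = min(reachable.values())
    return sorted(s for s, v in reachable.items() if v - best <= tolerance)

# Hypothetical average-click scores for one page; None means unreachable
assigned = subjects_for(
    {"Philosophy": 2.0, "Mathematics": 2.2, "History": 4.5, "Sports": None}
)
```

With these numbers the page is filed under both Mathematics and Philosophy, and the right tolerance would have to be found by trying it on real data.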

This would be fascinating, I'm sure, for the programmers. Can't we just take the question about how long processing will require as a constraint on the algorithm rather than as a knock-down argument that it's not feasible? The *exercise* is to find (and implement!) an algorithm that *is* feasible. We don't even have to do this using Wikipedia's server, if it would be too great of a load; anyone could download the tarball and process it. You could do a cron job once a day, compile the 40-odd "subject numbers" for each article in Wikipedia, and sort articles into subject groups (in some cases, multiple subject groups for a given article--why not?). From there we could use scripts already written to create the many "recent changes" pages.

I really, really, really want to see [[Philosophy Recent Changes]]. We desperately need pages like that, and this is one of the best possible ways we have of getting them. It's worth actually exploring.

--Larry

wikipedia-l@lists.wikimedia.org
