I think that a subject classification of articles would vastly improve
"soft security" and would save regulars a lot of time, since not
everyone would have to check every edit as currently seems to be the
case.
>I'd still like to see if we couldn't build those subjects
>automatically in some way based on links in the database.
How about this: the possible topics coincide with the major pages
listed on [[Main Page]] (from "Astronomy" to "Visual Arts"). The
shortest link path from such a topic page to an article defines that
article's topic. If there is no such path, then the article is
classified as a topic orphan.
To compute these topics quickly, the cur table gets two new columns:
topic and distance, where distance stands for the link distance from the
Main Page topic page. When a new article is created, it can be classified
immediately: look at the distance entries of all articles that link to it,
take the smallest one, inherit that article's topic, and set the distance
to that minimum plus one. If an existing
article is saved, the topic and distance entries of all articles it
links to (and their children) may need to be updated; these changes
can be propagated in a recursive manner.
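As a rough sketch of the classification step (using the proposed topic and
distance columns on cur; $newTitle stands for the new article's title, and
the links table column names l_from and l_to are guesses here, not checked
against the current schema):

# Inherit the topic of the minimum-distance page that links to the new
# article; the new article's distance is that minimum plus one.
$sql = "SELECT cur.topic, cur.distance + 1 AS newdist " .
       "FROM links, cur " .
       "WHERE links.l_to = '{$newTitle}' AND cur.cur_title = links.l_from " .
       "ORDER BY cur.distance LIMIT 1";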
Would that work?
Axel
I've added a (presently for sysops only) Special:Undelete, which should
appear as "View and restore deleted pages" in the special pages list. It
lists the archived deleted pages (the majority of which are pure drivel
and should be flushed at some point), and you can view the archived
pages and their histories and optionally restore them to life.
Restoring a page with the same title as one that currently exists will
tack the deleted page's history onto the end of the existing one's.
The interface is still rather rough and not yet fully integrated with
the deletion log (i.e., links to undelete should perhaps appear next to
deletion notices, and restorations should definitely be listed in the
same log), and I probably haven't tested it as thoroughly as I ought to
have.
It is to be considered experimental, so please be gentle with it. If Lee
wants to set it up on the piclab server, less gentle testing is, I'm sure,
welcome there. :) I'd offer my own test server, but it's currently
behind a modem (wah!).
(File is in CVS.)
-- brion vibber (brion @ pobox.com)
Can this be added to the User Preferences panel?
Note: this is just a text mockup.
Default Recent Changes page
Number of titles: [100]
Number of hours: [1|12|24|72]
Show minor edits: [x]
Namespaces to display:
All [x]
Main: (encyclopedia entries) [x] +Talk [x]
Wikipedia: (pages about the site) [x] +Talk [x]
The hidden motive is to prep for Meta:, List:, or other languages, etc.
The number of links to an article is counted, instead of the number of
other articles that link to it. The difference matters, because some pages
contain an enormous number of links to the same article. For example, the
alphabetical lists of cities in Poland have so many links to the
voivodship pages that those pages immediately become "most wanted".
With the current query the top "most wanted" article has a count of 609;
counting distinct linking articles instead, it has 69.
Query:
$sql = "SELECT bl_to, COUNT( bl_to ) as nlinks " .
"FROM brokenlinks GROUP BY bl_to HAVING nlinks > 1 " .
"ORDER BY nlinks DESC LIMIT {$offset}, {$limit}";
It should be changed to something like this:
$sql = "SELECT bl_to, COUNT( DISTINCT bl_from ) as nlinks " .
"FROM brokenlinks GROUP BY bl_to HAVING nlinks > 1 " .
"ORDER BY nlinks DESC LIMIT {$offset}, {$limit}";
> So, why not make it "wikipedia:Fulltext Stoplist", and load the
> whole list from the database on each query? Might actually save
> some time in the long run, and the non-English wikipedias could
> easily develop their own lists. Or would that be too risky?
The stoplist is compiled into MySQL; it can't be changed without
recompiling the database software.
The CVS contains FulltextStoplist.php, which is a list of the "common
words" excluded from search queries. It contains only English words,
which caused complaints on the German wikipedia, as they, at least,
don't want to be kept from searching for "false friends" common in English.
It would be easy to just make it another array/function in the Language
files, but
1. AFAIK, it is only used in one function, namely search
2. It might be nice if updating this list would be easy for everyone,
not just developers
So, why not make it "wikipedia:Fulltext Stoplist", and load the whole
list from the database on each query? Might actually save some time in
the long run, and the non-English wikipedias could easily develop their
own lists. Or would that be too risky?
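A minimal standalone sketch of what the loading could look like (plain
mysql_* calls here for clarity rather than the wiki's own query wrappers;
the page title, function name, and the assumption that the Wikipedia:
namespace is number 4 are all placeholders, not the real code):

# Load the stoplist from [[Wikipedia:Fulltext Stoplist]], one word per line.
function wfLoadStoplist() {
    $res = mysql_query( "SELECT cur_text FROM cur WHERE cur_namespace=4 " .
        "AND cur_title='Fulltext_Stoplist' LIMIT 1" );
    if ( !$res || mysql_num_rows( $res ) == 0 ) {
        return array();    # page missing: fall back to no stopwords at all
    }
    $row = mysql_fetch_row( $res );
    $words = preg_split( '/\s+/', strtolower( $row[0] ) );
    return array_flip( array_filter( $words ) );  # word => key, for fast isset()
}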
Magnus
I think the Nupedia software is written so that articles in a subject area
are not displayed unless a subject is "active." So, none of the
philosophy articles are displayed, because there is no philosophy editor
and therefore the subject area is "inactive."
This is just a bug, not a requested feature. Whether it can and will be
fixed--don't know!
Of course, if we were to implement the Nupedia system agreed upon last
November, or some other simpler system, presumably this sort of problem
wouldn't exist...
--Larry
(Wikitech-l: this is more on automatic subject classification, which Axel
brought up recently on Wikipedia-l.)
On Mon, 23 Sep 2002, Axel Boldt wrote:
[snip excellent comments that I agree with]
> I still believe that all of this can and should be
> done automatically, by tracing link paths from the
> main page.
I'm going to repeat some of what you've said earlier, adding my own
perspective. I really hope some programmers pursue this--they needn't ask
anyone's permission. The proof's in the pudding.
If automatic categorization could be done, and it sounds very plausible to
me, it would be *far* superior to a hand-maintained list of subject area
links. And incredibly useful, too.
OK, the following will reiterate some of the earlier discussion.
Presumably, nearly every page on Wikipedia can be reached from nearly
every other page. (There are orphans; and there are pages that do not
link to any other pages, though other pages link to them.)
This suggests that we can basically assign a number--using *some*
algorithm (not necessarily any one in particular: here is where
programmers can be creative)--giving the "closeness" of a page to all the
listed subjects. (This is very much like the Kevin Bacon game, of course,
and the "six degrees of separation" phenomenon.)
The question whether a *useful* algorithm can be stated is interesting
from a theoretical point of view. As I understand it, the suggestion is
that there is a simple and reliable (but how reliable?) algorithm, such
that, given simply a list of all the links in Wikipedia (viz., the source
page and destination page), and a list of subject categories, we can
reliably sort all pages into their proper categories.
It will not do to say, "There are obvious counterexamples, so let's not
even try." We can live with some slop. This is Wikipedia! We could even
fix errors by hand (ad hoc corrections are possible; why not?). As far as
I'm concerned, the real question is, once we try *various* algorithms,
what's the highest reliability we can actually generate? I'll bet it'll
be reasonably high, certainly high enough to be quite useful.
Here's an attempt at expressing an algorithm:
For a given page P (e.g., [[Plato's allegory of the cave]]), if the
average number of clicks (not backtracking to any page already reached--
otherwise you deal with infinite regresses) needed to reach P from the
subject page S (e.g., [[Philosophy]]) through all possible link paths between P
and S (or, perhaps, all paths below a certain benchmark length?)
is lower than the average number of clicks needed to reach P from any other
subject page, then P is "about" S.
The algorithm could be augmented in useful ways. In case of ties, or near
ties, a page could be listed as under multiple subjects. I have no idea
if this algorithm is correct, but that doesn't matter--it's just an
example. If you think harder and longer, I'm sure you'll think of a
better one.
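For instance, a shortest-path (breadth-first) variant of the above, run
offline over a dump rather than on the live server, might look roughly
like this (assuming $links is an array mapping each title to the titles it
links to; this is only an illustration, not the algorithm itself):

function wfSubjectDistances( $subject, $links ) {
    # Breadth-first search out from one subject page, recording the
    # minimum number of clicks needed to reach every other page.
    $dist  = array( $subject => 0 );
    $queue = array( $subject );
    while ( count( $queue ) > 0 ) {
        $page = array_shift( $queue );
        if ( !isset( $links[$page] ) ) {
            continue;
        }
        foreach ( $links[$page] as $target ) {
            if ( !isset( $dist[$target] ) ) {
                $dist[$target] = $dist[$page] + 1;
                $queue[] = $target;
            }
        }
    }
    return $dist;   # pages absent from the result are unreachable from $subject
}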
This would be fascinating, I'm sure, for the programmers. Can't we just
take the question about how long processing will require as a constraint
on the algorithm rather than as a knock-down argument that it's not
feasible? The *exercise* is to find (and implement!) an algorithm that
*is* feasible. We don't even have to do this using Wikipedia's server, if
it would be too great of a load; anyone could download the tarball and
process it. You could do a cron job once a day, compile the 40-odd
"subject numbers" for each article in Wikipedia, and sort articles into
subject groups (in some cases, multiple subject groups for a given
article--why not?). From there we could use scripts already written to
create the many "recent changes" pages.
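Concretely, the daily job could run the sketch above once per subject page
and keep, for each article, the subject (or subjects, in near ties) with
the smallest distance ($subjects and $links are again assumed inputs):

$best = array();   # title => array( subject, distance )
foreach ( $subjects as $subject ) {          # the 40-odd Main Page subjects
    foreach ( wfSubjectDistances( $subject, $links ) as $page => $d ) {
        if ( !isset( $best[$page] ) || $d < $best[$page][1] ) {
            $best[$page] = array( $subject, $d );
        }
    }
}
# Anything never reached ends up absent from $best: the topic orphans.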
I really, really, really want to see [[Philosophy Recent Changes]]. We
desperately need pages like that, and this is one of the best possible
ways we have of getting them. It's worth actually exploring.
--Larry
I'm forwarding this to wikitech-l.
I've been wanting to do a dump of our article titles to insert into the
search engines that I manage (bomis and 3apes, mainly), just to drive more
of the traffic that I influence towards wikipedia. I did a little program
for this in the old UseMod days, but Bomis hasn't updated its wikipedia
links since then. :-(
Perhaps we should have a script to generate RDF, which is a simple format used
by dmoz and familiar to search engine operators.
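A flat dump of the titles ought to only take a few lines against the
database; roughly this (namespace 0 being the encyclopedia articles;
redirects would still need filtering out somehow):

$res = mysql_query( "SELECT cur_title FROM cur WHERE cur_namespace=0" );
while ( $row = mysql_fetch_row( $res ) ) {
    # one headword per line, with underscores turned back into spaces
    print str_replace( "_", " ", $row[0] ) . "\n";
}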
--Jimbo
----- Forwarded message from Stephen Gilbert <canuck_in_korea2002(a)yahoo.com> -----
From: Stephen Gilbert <canuck_in_korea2002(a)yahoo.com>
Date: Sat, 21 Sep 2002 09:29:48 -0700 (PDT)
To: wikipedia-l(a)nupedia.com
Subject: [Wikipedia-l] Fwd: Adding Wikipedia to OneLook
I just received this from the maintainer of OneLook, a
meta-search of dictionaries (and some encyclopedias).
Is there any way to produce a flat file of all our
articles? This would also allow Wikipedia to once
again work with Sunir Shah's MetaWiki search.
If you haven't tried OneLook before, give it a spin.
It's fantastic.
http://www.onelook.com/
Stephen G.
--- Doug Beeferman <doug(a)dougb.com> wrote:
> Date: Fri, 20 Sep 2002 13:16:10 -0400 (EDT)
> From: Doug Beeferman <doug(a)dougb.com>
> To: <canuck_in_korea2002(a)yahoo.com>
>
>
> Hi Stephen,
>
> Thanks for your kind feedback on OneLook and sorry for the delay in
> responding to you. I already have some familiarity with Wikipedia/Nupedia
> as a user and would love to add its headwords to OneLook. Could you
> assist me in this? In particular, do you know of a flat file that I could
> point OneLook's update engine to that lists Wikipedia's headwords (or
> topic names, or however they're called in Wiki-land)?
>
> Running the available SQL files through MySQL to dump these headwords on
> every update would be a bit unwieldy for various reasons. One alternative
> would be a script that extracts the headwords from the .sql file, but I
> don't want to rewrite this if it's already been done.
>
> (I just spent fifteen minutes trying to figure out how to post this
> question on meta.wikipedia.com. I was defeated. There's probably
> something I'm missing or too lazy to read...)
>
> Thanks again for writing. Are you involved actively in Wikipedia's
> maintenance? If so, good luck with the project -- it looks like it's
> really taking off!
>
> Doug
> (a canuck in America)
>
> --- add canuck_in_korea2002(a)yahoo.com
> 2002-09-06 08:51:25 218.150.177.35 Mozilla/5.0 Galeon/1.2.5
> (X11; Linux i686; U;) Gecko/20020610 Debian/1.2.5-1
>
> I just discovered One-Look and it has quickly found a much coveted space
> on my browser's Personal Toolbar. I especially appreciate the lack of
> pop-up ads. Thanks for the great resource!
>
> I notice you have several encyclopedias in your database. I would like to
> suggest an encyclopedia called Wikipedia. It is an effort to build a
> complete encyclopedia from scratch, written by volunteers and released
> under the GNU Free Documentation License. Wikipedia is one and a half
> years old, and currently has about 40,000 articles, many of which are
> very good.
>
> Cheers!
>
> Stephen Gilbert
>
>
----- End forwarded message -----
Recently the image Great_Seal_of_the_United_States_(small).png vanished;
its image description page listed it as present and had a link to one
revision, but the image itself was gone.
After brief investigation, I found that the image file was still present
in the archives directory, and copied it back to where it belonged...
but there had also once been an earlier revision of the same file (a
non-transparent PNG), now missing both from the archive and the image page.
Grepping the access logs, I found that a spider had come across
the image description page and followed the links to both revert and delete
the older revision of the image -- simply loading these links caused
the wiki to move and permanently delete files.
Apparently telling it to both revert and delete the same revision
confused the poor wiki, and it ended up vanishing that revision entirely
_and_ leaving the newer one only in the archives.
As a workaround until a better way of handling these functions is
decided on, I've hacked Skin.php to not give the delete/revert links to
anonymous users (and therefore bots and spiders).
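Roughly, the hack just puts a guard like this ahead of the link generation
(illustrative only -- names and placement are not the actual Skin.php code):

if ( 0 == $wgUser->getID() ) {
    # Anonymous request, and therefore any bot or spider following links:
    # don't build the delete/revert links at all.
    return "";
}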
-- brion vibber (brion @ pobox.com)