In the past, I found it useful to be able to use "and not" in my search (in
particular to filter out the Rambot pages when looking for years). I recently
found that it does not work - or more precisely, that it does not work on the
English Wikipedia. It does work on the others (checked only Dutch and German).
What is going on, and why?
After getting back into wikiland catching up with wikipedia-l
was pretty easy, but catching up with the wikitech list took a
little longer. It seems you guys have had interesting times
lately (in the Chinese curse sense). Sorry I abandoned you,
but you guys do seem to have risen to the challenge.
Magnus did a great service by giving us code with features
that made Wikipedia usable and popular. When that code bogged
down to the point where the wiki became nearly unusable, there
wasn't much time to sit down and properly architect and develop
a solution, so I just reorganized the existing architecture for
better performance and hacked all the code. This got us over
the immediate crisis, but now my code is bogging down, and we
are having to remove useful features to keep performance up.
I think it's time for Phase IV. We need to sit down and design
an architecture that will allow us to grow without constantly
putting out fires, and that can become a stable base for a fast,
reliable Wikipedia in years to come. I'm now available and
equipped to help in this, but I thought I'd start out by asking
a few questions here and making a few suggestions.
* Question 1: How much time do we have?
Can we estimate how long we'll be able to limp along with
the current code, adding performance hacks and hardware to
keep us going? If it's a year, that will give us certain
opportunities and guide some choices; if it's only a month
or two, that will constrain a lot of those choices.
* Suggestion 1: The test suite.
I think the most critical piece of code to develop right now
is a comprehensive test suite. This will enable lots of
things. For example, if we have a performance question, I
can set up one set of wiki code on my test server, run the
suite to get timing data, tweak the code, then run the suite
again to get new timing. The success of the suite will tell
us if anything broke, and timing will tell us if we're on
the right track. This will be useful even during the
limp-along with current code phase. I have a three-machine
network at home, with one machine I plan to dedicate 100% to
wiki code testing, and my test server in San Antonio that we
can use. This will also allow us to safely refactor code.
I'd like to use something like Latka for the suite (see the Latka
pages under the Apache Jakarta project).
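To make this concrete, here is the sort of minimal harness I have in
mind (a sketch only; Latka itself drives tests from XML, and the URLs
and expected strings below are invented):

  <?php
  // Rough sketch of a timing/regression harness -- not Latka itself,
  // just the idea: fetch pages, check contents, report timings.
  $tests = array(
      // url => a string the page must contain to count as a pass
      'http://test.example/wiki.phtml?title=Main_Page' => 'Main Page',
      'http://test.example/wiki.phtml?title=Special:Recentchanges'
          => 'Recent changes',
  );

  $total = 0.0;
  $failures = 0;
  foreach ($tests as $url => $expect) {
      $start = microtime(true);
      $body = file_get_contents($url);
      $elapsed = microtime(true) - $start;
      $total += $elapsed;
      if ($body === false || strpos($body, $expect) === false) {
          $failures++;
          printf("FAIL %6.3fs %s\n", $elapsed, $url);
      } else {
          printf("ok   %6.3fs %s\n", $elapsed, $url);
      }
  }
  printf("%d tests, %d failures, %.3fs total\n",
         count($tests), $failures, $total);
  ?>

Running it before and after a change gives both a pass/fail regression
check and comparable timing numbers.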
* Question 2: How wedded are we to the current tools?
Apache/MySQL/PHP seems a good combo, and it probably would
be possible to scale them up further, but there certainly
are other options. Also, are we willing to take chances on
semi-production quality versions like Apache 2.X and MySQL 4.X?
I'd even like to revisit the decision of using a database
at all. After all, a good file system like ReiserFS (or to
a lesser extent, ext3) is itself a pretty well-optimized
database for storing pieces of free-form text, and there are
good tools available for text indexing, etc. Plus it's
easier to maintain and port.
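For illustration, here is roughly what a filesystem-backed store could
look like (a sketch only; the path layout and function names are
invented):

  <?php
  // Sketch: article storage on a plain filesystem instead of MySQL.
  // Each article gets a directory; each revision is a timestamped
  // file, so "history" is just a directory listing.
  function article_dir($title) {
      $safe = urlencode($title);
      $hash = substr(md5($safe), 0, 2); // spread dirs out for the fs
      return "/var/wiki/articles/$hash/$safe";
  }

  function save_revision($title, $text) {
      $dir = article_dir($title);
      if (!is_dir($dir)) mkdir($dir, 0755, true);
      // one file per revision, named by UTC timestamp
      file_put_contents($dir . '/' . gmdate('YmdHis') . '.txt', $text);
  }

  function current_revision($title) {
      $files = glob(article_dir($title) . '/*.txt');
      if (!$files) return false;
      sort($files); // YmdHis names sort chronologically
      return file_get_contents(end($files));
  }
  ?>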
* Suggestion 2: Use the current code for testing features.
In re-architecting the codebase, we will almost certainly
come to points where we think a minor feature change will
make a big performance difference that won't hurt usability,
or just features that we want to implement anyway. For
example, we could probably make it easier to cache page
requests if we made most of the article content HTML not
dependent on skin by tagging elements well and using CSS
appropriately. Also, we probably want to eventually render
valid XHTML. I propose that while we are building the
phase IV code, we add little features like this to the
existing code to gauge things like user reactions and performance
impact.
Other suggestions/questions/answers humbly requested
(including "Are you nuts? Let's stick with Phase III!" if
you have that opinion).
Lee Daniel Crocker <lee(a)piclab.com> <http://www.piclab.com/lee/>
"All inventions or works of authorship original to me, herein and past,
are placed irrevocably in the public domain, and may be used or modified
for any purpose, without permission, attribution, or notification."--LDC
A wikipedian has recently been trying to find a good way to cite
particular revisions of articles in the bibliography for a paper.
Currently we can give URLs for the _current_ version of an article
(current as of whenever it is visited), or for _previous_ versions (as
of when the citation was made) via their oldid numbers.
There are two main problems with this (aside from the ugliness of the
URLs):
* There is no way to reference the current version _as of the time of
citation_. Since that revision isn't in the old table, it has no oldid.
* oldid values sometimes can change, as when an article is deleted and
subsequently restored (done also when recombining histories of articles
that have been broken by crude renaming). Possible rearrangements of the
database (such as combining all languages into a single table) could
require reassigning oldids en masse. They are *not* reliable long-term
identifiers.
One possible solution would be to provide a way of citing articles as
of a particular timestamp, for instance via a version parameter in the
URL such as version=20030224161134, which would pull up either a cur
or old version with that timestamp. (It could also be prettified:
version=2003-02-24-16:11:34 etc.)
Pros:
* consistent, no fuss, no worries about rearrangement of db structure
* citation URL can be provided in a nice handy link at the bottom of
every page

Cons:
* timestamp has 1-second resolution. Generally this is going to be
unique (at least per article), but it may occasionally not be,
particularly in cases of recombined histories. Some articles had
multiple revisions' timestamps set to the same time due to bugs in the
rename code and other db tweaks in early '02.
* for this reason it's not suitable as the mainline URL for drawing up
old history revisions via the history list, so people have to remember
to find and use the citation URL separately.
Alternatively, we could supply _both_ timestamp and oldid in the URL,
and let timestamp have priority if an exact match on both is not found.
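For concreteness, the lookup could go something like this (a sketch
only; the function name is invented, and the column names follow the
cur/old tables as I understand them):

  <?php
  // Sketch: fetch article text as of an exact timestamp, trying the
  // current revision first and falling back to the old table.
  function fetch_by_timestamp($db, $title, $ts) {
      $title = mysql_real_escape_string($title, $db);
      $ts = mysql_real_escape_string($ts, $db); // e.g. '20030224161134'

      $res = mysql_query("SELECT cur_text FROM cur
          WHERE cur_title='$title' AND cur_timestamp='$ts'", $db);
      if ($row = mysql_fetch_row($res)) return $row[0];

      // LIMIT 1 papers over the rare duplicate-timestamp cases
      // mentioned above.
      $res = mysql_query("SELECT old_text FROM old
          WHERE old_title='$title' AND old_timestamp='$ts'
          LIMIT 1", $db);
      if ($row = mysql_fetch_row($res)) return $row[0];

      return false; // no revision with that exact timestamp
  }
  ?>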
-- brion vibber (brion @ pobox.com)
Observing the Wikipedia moving up to a very high level of load today
(load average 10.39, 12.58, 13.45), it occurs to me that a "load
shedding" function would be useful, where requests may be bounced with
an HTTP 503 "Service Unavailable" error. This would have the effect of
dropping load until the server returns to normal load levels,
preventing the backlog of requests from driving the server into the
ground.
A human-readable text should be added in the user's own language,
saying: "Wikipedia is experiencing very high load at the moment. We
are taking measures to control the load on the system. Please try your
request again in a few minutes, when load should be lower."
Well-behaved spiders should discard any pages sent with 503 errors.
To prevent a sudden turn-on of error 503 for all users, with the
possibility of load oscillation, we could make the 503 errors
progressively more probable as the load increases beyond a certain
point. This could also be used to ensure that logged-in users and
"important" transactions such as page edits maintain a higher QoS
during these periods, until load reaches the point at which even they
have to be refused.
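For instance, something along these lines (a sketch only; the
thresholds and weights are pulled out of the air):

  <?php
  // Sketch: probabilistic load shedding, ramping up smoothly to
  // avoid oscillation. Called before any real work is done.
  define('LOAD_SOFT', 10.0); // start shedding here
  define('LOAD_HARD', 20.0); // shed almost everything here

  function maybe_shed_load($isLoggedIn, $isEdit) {
      $fields = explode(' ', file_get_contents('/proc/loadavg'));
      $load = (float)$fields[0]; // 1-minute load average
      if ($load <= LOAD_SOFT) return;

      // probability ramps from 0 at the soft limit to 1 at the
      // hard limit -- no sudden all-or-nothing cutover
      $p = ($load - LOAD_SOFT) / (LOAD_HARD - LOAD_SOFT);
      if ($isLoggedIn) $p *= 0.5;  // favor logged-in users
      if ($isEdit)     $p *= 0.25; // favor edits even more

      if (mt_rand() / mt_getrandmax() < $p) {
          header('HTTP/1.0 503 Service Unavailable');
          header('Retry-After: 120');
          print "Wikipedia is experiencing very high load at the " .
                "moment. Please try your request again in a few " .
                "minutes, when load should be lower.";
          exit;
      }
  }
  ?>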
Brion Vibber wrote:
>On Sun, 2003-02-16 at 20:11, s wrote:
>>Would it make sense, as I've been experiencing bandwidth limitations, on
>>occasion, (assuming the problem may be general) to limit, by some ratio, the
>>server request speed to anon users, and thereby allowing logged in users
>>some degree of greater access?
>I'm afraid there's not much we can do about your bandwidth limitations;
>that's between you and your ISP.
>(No, actually there is -- we could compress sent pages. I'll consider
>trying this in the future, once apache/php are reinstalled with gzip
>support built in. However this will use some cpu power, and I don't know
>how this will affect server speed at this point.)
>-- brion vibber (brion @ pobox.com)
Both CPUs are typically running about 75% busy at the moment (50%
user, 25% system), and quite often peak intermittently to 100%.
However, system performance is nice and smooth nearly all the time, so
the worst database bottlenecks appear to have been ironed out.
When the system moves to separate machines for the database and
webserver, the load should fall dramatically: that might be the right
time to enable gzipping.
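For what it's worth, once PHP is rebuilt with zlib, turning compression
on should be a small change (a sketch; untested on our setup):

  <?php
  // Two ways to gzip output once PHP has zlib support built in.
  // (1) Globally, in php.ini -- compresses everything PHP emits:
  //       zlib.output_compression = On
  // (2) Per script, before any output is sent:
  ob_start('ob_gzhandler'); // honors the client's Accept-Encoding
  print "This page is compressed if the client supports it.\n";
  ?>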
Don Marti, editor of Linux Journal, contacted me about doing an
article on wikipedia for LJ. I wrote back with a couple of
proposals, and they chose the one described here:
> Another article might be a technical article about how we're handling
> our growth using all open source software, and the specific software
> challenges of an open collaborative writing project. We could use
> more developers, and obviously an interesting article about our
> software might bring us some wonderful new talent.
> I'm the guy to write the first article, but I'm not our best technical
> guy, so I'd want to co-author the second article with one of our
> developers, which might take a bit longer.
Would someone like to volunteer to co-author the second article?
It probably makes the most sense to have someone do this who has
really dirty hands from the code. My guess is that an LJ publication
is a nice resume enhancer.
Article names on special topics often contain the topic in parentheses
for disambiguation. The "pipe trick" automatically hides the stuff in
parentheses, but if I have a lot of links between pages on the same
topic I still have to type out the parenthesized part :-(
For instance in an article about "tree (graph theory)":
A '''tree''' is a [[graph (graph theory)|]]
that is [[connected (graph theory)|]] and has
no simple [[cycle (graph theory)|]]s.
I'd like a little hack that automatically inserts the same topic that
is used in the article title. This could be done via empty
parentheses, because no article has "()" in its title. In the example
above:
A '''tree''' is a [[graph()|]]
that is [[connected()|]] and has
no simple [[cycle()|]]s.
For multiple articles on one topic this would be really helpful, and
it would prevent accidentally adding links to disambiguation pages.
What do you think?
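In case it helps, here is roughly how the expansion might be done (a
sketch only; I haven't looked at how the real pipe trick is
implemented, and the function name is made up):

  <?php
  // Sketch: expand [[link()|]] using the parenthesized part of the
  // current article's title, then let the normal pipe trick hide it.
  function expand_empty_parens($wikitext, $articleTitle) {
      // pull "(graph theory)" out of "tree (graph theory)"
      if (!preg_match('/(\([^)]+\))\s*$/', $articleTitle, $m)) {
          return $wikitext; // this page has no disambiguator
      }
      // [[graph()|]] -> [[graph (graph theory)|]]
      return preg_replace('/\[\[([^\[\]|]+)\(\)\|\]\]/',
                          '[[$1 ' . $m[1] . '|]]', $wikitext);
  }
  ?>

So in an article titled "tree (graph theory)", [[graph()|]] would
become [[graph (graph theory)|]], which renders as "graph".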
thanks a lot,