On the German Wikipedia, a list of the qualifiers used in article titles was
discussed, and most of the participants think it would be an interesting feature.
I know I have put that poorly, so here is an example:
Cell (biology) is a homonym (cell) with a qualifier (biology). To get a
proper list of those qualifiers and modify or eliminate wrong ones, a list
would be very helpful.
The outcome of the discussion was that a deduplicated list of the qualifiers
from titles (table cur) and bl_to (table brokenlinks) would make sense.
Unfortunately I can only give you a working PostgreSQL SELECT statement (table
cur only), but perhaps someone can easily port it to MySQL:
======== WARNING: THIS IS NOT A VALID STATEMENT FOR WIKIPEDIA ==========
SELECT DISTINCT substring(cur_title FROM '.+\\((.+)\\)') AS p FROM cur;
========================= I WARNED YOU =================================
-- "\\(" ==> ( (needed for quoting)
-- (.+) ==> the () - construct is used to select the part substring will
For the titles "foo", "foo (bar)", "foo2 (bar)" and "bar (foo)" the result
will be "bar" and "foo".
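For anyone who wants to try the pattern outside the database, here is a
minimal Python sketch of the same extraction. It is not part of any Wikipedia
code: the function name qualifiers() is made up for illustration, and Python's
regex engine may behave differently from PostgreSQL's on unusual titles
(nested parentheses, for instance).

```python
import re

# Same pattern as the PostgreSQL statement: some text, then the part
# inside a pair of parentheses, which is captured as group 1.
QUALIFIER = re.compile(r".+\((.+)\)")


def qualifiers(titles):
    """Return the sorted, deduplicated qualifiers found in the titles."""
    found = set()
    for title in titles:
        match = QUALIFIER.search(title)
        if match:  # titles without a qualifier are simply skipped
            found.add(match.group(1))
    return sorted(found)


print(qualifiers(["foo", "foo (bar)", "foo2 (bar)", "bar (foo)"]))
```

With the four example titles this prints ['bar', 'foo'].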
This should only show the as-is state of these qualifiers! There is no
intention for any automated process to enforce them, because no Wikipedian
should get an error like "Qualifier not allowed" or the like. This page is only
for administrative and informational purposes!
Of course additional features, such as showing the matched pages, would be
nice, but those need further discussion, IMHO.
If there are more questions, just ask; I will try to answer them.
------------------------- Anthill inside! ---------------------------
Hum, I am a bit embarrassed here. But well...
We have a bottleneck problem, I fear.
Utilisateur:Alvaro asked to be made a sysop on the French
wiki quite a while ago, and nobody spoke against it
(indeed, several spoke for him). Then, since this was
asked on our pump, he was forgotten for a while and
had to kindly ask again.
So I put a message asking for him to be made sysop in
the database on your page, Brion, on the English
Wikipedia, as well as on the metapedia if I remember
He still is not a sysop. Clearly it is a pain for
you developers to do this type of chore; Ed
mentioned it several times. So we wait, and ask
again, and again.
But then, isn't there a way for us to make people sysops
on the international wikipedias ourselves?
Could there be something like a query that would
allow a French sysop to make a French user (just an
example, of course) a sysop automatically? Maybe a
list of users could be displayed, and a sysop could
click on the name of a user to make him a sysop? Of
course, it would then be absolutely required that a
log be clearly visible to everyone, to avoid abuse.
Is something like this possible or not?
Meanwhile, could someone please make Alvaro a sysop?
Also, as Aoineko mentioned previously, the misspelling
page is quite broken, and has never worked well
since we moved to phase III (31st of October). It was
I know you are overbooked. Is there something we can
do (OK, not me, but there are some developers among
us) to make that work? Do the other international
wikipedias using accented letters have the
problem as well?
A suggestion for caching: the home page, [[Main
Page]], should never include links to nonexistent
pages. Could that be cached (in the standard stylesheet),
e.g., as /index.html? I would think that page is
accessed very frequently and that caching it would
save some database work for other pages. When the page
is edited (only by a sysop), he/she would render the
page to HTML, possibly by explicitly accessing
/wiki/Main_Page, and save it as /index.html. Would
this relieve some of the server strain?
Could we also cache other pages known to link only to
existing pages, such as this week in dates ([[April
30]], etc.), [[Current events]], and [[Recent
deaths]]? Or are these pages updated too often to be
useful? Could we also cache the protected pages?
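To make the suggestion concrete, here is a rough Python sketch of the two
steps: checking that a page links only to existing pages, and saving the
rendered HTML as a static file. Both function names, the way link targets and
existing titles are passed in, and the file permissions are assumptions for
illustration only; nothing here reflects how the wiki software actually works.

```python
import os
import tempfile


def is_cacheable(link_targets, existing_titles):
    """True if every page linked from the article exists."""
    return all(target in existing_titles for target in link_targets)


def save_static_copy(html, target):
    """Write rendered HTML to `target` without exposing a half-written file."""
    directory = os.path.dirname(os.path.abspath(target))
    # Write to a temporary file in the same directory first.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(html)
        # mkstemp creates the file owner-only; make it world-readable
        # on the assumption that the web server runs as another user.
        os.chmod(tmp_path, 0o644)
        # Rename over the old copy; atomic when on the same filesystem.
        os.replace(tmp_path, target)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

A sysop-side script could call save_static_copy(rendered_html, "index.html")
after each edit of the page, but only when is_cacheable() returns True.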
I notice that /usr (/dev/sda2) is at 96%. ext2 has some pretty bad
problems with fragmentation once it gets above a certain percentage,
and that can seriously hurt performance. Once it has
fragmented, it is difficult to get it back to a contiguous state.
There are defrag programs, but they are fairly scary. The only other
way to get it back to normal is to back everything up, mkfs, and restore
Perhaps somebody can remove a bunch of the packages that are installed
that we don't use?
Nick Reinking -- eschewing obfuscation since 1981 -- Minneapolis, MN
Motherboard for the server we're going to use as Wikipedia front end.
I haven't heard from Jason yet as to when he can make the trip to
install it, but possibly Friday or Monday.
----- Forwarded message from Sales(a)Computers4SURE.com -----
Date: Tue, 29 Apr 2003 11:29:37 -0400
Subject: We've received your order #C030450733, Jimmy. Thank you.
Thank you for shopping with Computers4SURE.com. We would like to confirm
that your order has been received.
On 29-Apr-2003, you ordered:
Item                            Quantity   Price
SERVERWORKS LE-T DUAL PGA370    1          $485.95
The only issue I see would be if database threads crowd out Apache on
the old server, and all the foreign wikis go down while the English
wiki stays up. This could create new tensions with the foreign wikis.
From: Lee Daniel Crocker [mailto:email@example.com]
Sent: Tuesday, April 29, 2003 5:13 PM
Subject: Re: [Wikitech-l] Ad-hoc changes
> (Brion Vibber <vibber(a)aludra.usc.edu>):
> [Configuration info]
Here's a thought: when we get the new server up, let's install the
English wiki cleanly from the distribution, make sure it's all happy,
then do the switch, _leaving the foreign wikis on the old server_ for a
while, then do the same thing for each of the foreign ones in turn,
updating them to the newest software, and making everything in sync,
before we shut down apache on the old server.
Any problems with that?
Lee Daniel Crocker <lee(a)piclab.com> <http://www.piclab.com/lee/> "All
inventions or works of authorship original to me, herein and past, are
placed irrevocably in the public domain, and may be used or modified for
any purpose, without permission, attribution, or notification."--LDC
Wikitech-l mailing list
Hi - clearly, it'd be great if Wikipedia had better performance.
I looked at some of the "Database benchmarks" postings,
but I don't see any analysis of what's causing the ACTUAL bottlenecks
on the real system (with many users & full database).
Has someone done that analysis?
I suspect you guys have considered far more options, but as a
newcomer who's just read the source code documentation, maybe
some of these ideas will be helpful:
1. Perhaps for simple reads of the current article (cur),
you could completely skip using MySQL and use the filesystem instead.
Simple encyclopedia articles could be simply stored in the
filesystem, one article per file. To avoid the huge directory problem
(which many filesystems don't handle well, though Reiser does),
you could use the terminfo trick: create subdirectories for the
first, second, and maybe even the third characters. E.g., "Europe"
is in "wiki/E/u/r/Europe.text". The existence of a file can be used as
the link test. This is not guaranteed to be faster than MySQL, but
it probably is: the OS developers have been optimizing
file access for a very long time, and instead of having
userspace<->kernel<->userspace interaction, it's
userspace<->kernel interaction. You also completely avoid
locking and other joyless issues.
2. The generation of HTML from the Wiki format could be cached,
as has been discussed. It could also be sped up, e.g., by
rewriting it in flex. I suspect it'd be easy to rewrite the
translation of Wiki to HTML in flex and produce something quite fast.
My "html2wikipedia" is written in flex - it's really fast and didn't
take long to write. The real problem is that I suspect this
isn't the bottleneck.
3. You could start sending out text ASAP, instead of batching it.
Many browsers start displaying text as it's available, so to
users it might _feel_ faster. Also, holding text in-memory
may create memory pressure that forces more useful stuff out of
Anyway, I don't know if these ideas are all that helpful,
but I hope they are.
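As an illustration of idea 1, here is a small Python sketch of the
terminfo-style layout. The function names, the "wiki" root and the ".text"
suffix follow the example path above but are otherwise assumptions; the sketch
also ignores titles containing "/" or other characters that would need
escaping on a real filesystem.

```python
import os


def article_path(title, root="wiki", depth=3):
    """Map a title to a nested path, e.g. Europe -> wiki/E/u/r/Europe.text."""
    # One subdirectory per leading character, up to `depth` characters.
    return "/".join([root, *title[:depth], title + ".text"])


def article_exists(title, root="wiki"):
    """The link test: an article exists if its file exists."""
    return os.path.isfile(article_path(title, root))


def write_article(title, text, root="wiki"):
    """Store the current text of an article, creating directories as needed."""
    path = article_path(title, root)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
```

Titles shorter than three characters simply get fewer directory levels, so
"Io" would land in "wiki/I/o/Io.text".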
>From: Lee Daniel Crocker <lee(a)piclab.com>
>One thing that would be nice is if the HTTP connection could be
>dropped immediately after sending and before those database updates.
>That's easy to do with threads in Java Servlets, but I haven't
>found any way to do it with Apache/PHP.
:P No, I looked into exactly this problem in connection with my own little
project (improved Special:Movepage). PHP and threads don't mix. As far as I
could see, the PHP subprocess has to exit (taking all threads with it)
before Apache will drop the connection. Like Brion said, you'd have to set
up another process, and use PHP's poorly documented IPC functions. As for
what improvement it would achieve: it wouldn't reduce database load per
view, it would just allow users to hit more pages sooner.
I think caching HTML is the way to go, in the short term. If people don't
want to code something complicated, you could ignore user preferences for
now and only cache pages for "anonymous" users. The cached version could
leave little notes in the HTML like
<strong>Isaac Newton</strong> was a <<WIKILINK[[physics|physicist]]>> born
<<USERIP>> (<a href ="http://www.wikipedia.org/wiki/User_talk:<<USERIP>>"
class='internal' title="User talk:<<USERIP>>">Talk</a>
Then a cache processing script would look up the link table and replace the
links with real HTML. I imagine looking up the link table is much, much
faster than looking up cur_text. Plus the cached text would be stored on the
web server, thereby distributing disk load more evenly.
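To show what such a cache processing script might do, here is a rough Python
sketch (Python only because I can test it easily; the real thing would
presumably be PHP). The function name, the set of existing titles standing in
for the link table, the "/wiki/" URL form and the "new" class for missing
pages are all assumptions, and title normalisation is left to the caller.

```python
import html
import re
from urllib.parse import quote

# Matches <<WIKILINK[[target]]>> and <<WIKILINK[[target|label]]>>.
WIKILINK = re.compile(r"<<WIKILINK\[\[([^\]|]+)(?:\|([^\]]+))?\]\]>>")


def expand_cached(cached, existing_titles, user_ip):
    """Fill in the per-request parts of a cached page."""

    def link(match):
        target = match.group(1)
        label = match.group(2) or target
        href = "/wiki/" + quote(target.replace(" ", "_"))
        # Stand-in for the link table lookup: is the target a known page?
        css = "internal" if target in existing_titles else "new"
        return '<a href="%s" class="%s">%s</a>' % (href, css, html.escape(label))

    page = WIKILINK.sub(link, cached)
    # Substitute the visitor's address wherever the note appears.
    return page.replace("<<USERIP>>", html.escape(user_ip))
```

Only the link lookups and the string substitution happen per request; the
article text itself is never fetched.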
As for invalidation, the easiest, and possibly ugliest way I can think of is
implementing it in wfQuery() *cringe*. That's a very simple function with
very diverse uses, but every single update query passes through that point.
Just use a hash table (always in RAM) to store the article name of every
cache entry, and remove the rows when they're invalidated.
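A rough Python sketch of that invalidation idea follows. It is not how
wfQuery() works today: the PageCache class, the make_query_hook() wrapper and
the extra affected_titles argument are all invented for illustration, and the
sketch sidesteps the hard part (working out from the SQL which articles an
update touches) by having the caller say so.

```python
class PageCache:
    """In-RAM hash table: article title -> cached HTML."""

    def __init__(self):
        self._entries = {}

    def put(self, title, page_html):
        self._entries[title] = page_html

    def get(self, title):
        return self._entries.get(title)

    def invalidate(self, title):
        # Removing a title that is not cached is not an error.
        self._entries.pop(title, None)


def make_query_hook(run_sql, cache):
    """Wrap the single query function so every write drops stale entries."""

    def query(sql, affected_titles=()):
        result = run_sql(sql)
        # Reads leave the cache alone; anything else invalidates.
        if not sql.lstrip().upper().startswith("SELECT"):
            for title in affected_titles:
                cache.invalidate(title)
        return result

    return query
```

Because every update passes through the one wrapped function, no caller can
forget to invalidate, which is the whole appeal of doing it at that point.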
There'd also have to be a check for an altered user talk page. This could be
handled with another of my <<TAGS>>.
This idea is likely to be met with apathy. I'd like to code it myself, but I
don't have Linux on my PC, or a broadband connection, or much free hard
drive space, or... time. So there you have it: my two cents, backed up by
-- Tim Starling.