Hi!
Is it possible that there is some weird debugging code showing up when
searching for something like
http://www.wikipedia.com/wiki.phtml?search=%22football%22
or anything else containing quotes or backslashes?
(It shows "grrrgrrr" at the top of my browser window in the Cologne Blue
skin.)
Also, a search for a single backslash appears as two backslashes in the
search form after submission. Probably stripslashes() should be applied
to the text shown in the HTML search form.
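To make the doubling concrete, here is a minimal Python 2 sketch of the
round trip I mean; the addslashes() and stripslashes() functions below only
mimic what PHP's functions do, they are not the wiki's actual code:

def addslashes(s):
    # roughly what magic_quotes does to incoming form data (NUL handling omitted)
    return s.replace("\\", "\\\\").replace('"', '\\"').replace("'", "\\'")

def stripslashes(s):
    # undo one level of backslash escaping
    out, i = [], 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            i = i + 1
        out.append(s[i])
        i = i + 1
    return "".join(out)

submitted = "\\"                      # the user searched for one backslash
escaped = addslashes(submitted)       # what the script receives
print repr(escaped)                   # two backslashes: what the form shows now
print repr(stripslashes(escaped))     # one backslash: what it should show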
Cheers,
Marian
--
Marian Steinbach
http://www.ds.fh-koeln.de/~marian/
ICQ# 9790691
Lee, it would be good if the response times you posted on wikipedia-l
could be annotated with the exact time when the requests arrived at
the server (should be contained in the server's log file).
Maybe the server slows down when too many requests arrive in a short
time interval, or in a short time interval following some special
request. Also, we need to know in chronological order which requests
preceded the slow ones; I don't think your list contains that
information.
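Something like the following rough Python 2 sketch could pull that out of
the log; the file name, the common-log-format assumption, the example slow
request, and the window of five preceding entries are all just guesses on
my part:

import re

logline = re.compile(r'\S+ \S+ \S+ \[([^\]]+)\] "([^"]*)"')

entries = []                          # (timestamp, request line) in file order
for line in open("access_log").readlines():
    m = logline.match(line)
    if m:
        entries.append((m.group(1), m.group(2)))

# requests reported as slow (placeholder)
slow_requests = ["GET /wiki.phtml?title=Main_Page HTTP/1.0"]

for i in range(len(entries)):
    stamp, request = entries[i]
    if request in slow_requests:
        print "slow request at", stamp, ":", request
        print "  preceded by:"
        for prev_stamp, prev_request in entries[max(0, i - 5):i]:
            print "   ", prev_stamp, prev_request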
Axel
If anyone's interested, here's a little spider program in Python for
exercising the Wikipedia test site.
Tested in Python 2.2.1 for Win32, although it should run on other OS's
just fine. A few instances of this
running at once should give the test site a good stress test. Notice
that it will go places where proper spiders dare not tread --
intentionally.
---------------------------------------------------------------------------------------------------------------------------------
# Stressbot: a stress tester for the Wikipedia test site
# * Makes GET requests aggressively at random
# * Ignores embedded robots metadata entirely
# * Won't do POSTs, so shouldn't corrupt the database
# * Won't stray outside the site
import urllib, re, string, random
def proc_href(str):
    # pull the HREF value out of the anchor tag
    str = re.sub(r'<[Aa][^>]*?[hH][rR][eE][fF] *= *"([^"]*)"[^>]*?>',
                 r'\1', str)
    # handle entities buried in the HREF
    return string.replace(str, "&amp;", "&")

def get_hrefs(text):
    hrefs = re.findall(r'<[Aa][^>]*?[hH][rR][eE][fF] *= *"[^"]*"[^>]*?>', text)
    return map(proc_href, hrefs)
home_url = "http://130.94.122.197/"
history = []
url = home_url
while 1:
    print 'opening', url
    text = urllib.urlopen(url).read()
    # Make a note in a limited-length history
    # This is to stop endless revisiting of the standard special pages
    history.append(url)
    history = history[-50:]
    # Parse out all the A HREFs
    url_list = get_hrefs(text)
    # Limit to the local site
    url_list = filter(lambda u: u[:len(home_url)] == home_url, url_list)
    # Don't revisit a page we have been to recently
    url_list = filter(lambda u: u not in history, url_list)
    print len(text), 'bytes', len(url_list), 'non-recent local A HREFs'
    # The home page is the last resort, and we should also force it
    # occasionally to get us out of pathological dead-end subspaces
    if not url_list or random.choice(range(10)) == 0:
        url_list.append(home_url)
    url = random.choice(url_list)
I got all the conversion scripts working well now, and the runtime
down to under 3 hours (it was over 12 when I started!), so I think a
smooth transition will be possible. The code has been running on the
new server for a couple of days now and seems fast, but then it has no
load. If someone out there has a good way to stress test it, let me
know.
We've had a great round of QA--lots of testing this last week, and
while I've kept the code semi-frozen I did fix some things that would
have been problems for the live installation, and I've got a good
list of feature requests to work on (though I suspect the first week
or two after installation will be devoted to performance tuning the
server settings with timing data we get from live operation).
Special thanks to Magnus, who did a lot of testing and found some
good stuff.
I still don't have all the images--I'd really like to get all of
those over and installed and tested before I do anything else, but
assuming that goes well, below is my initial idea for a transition
plan. Give me some feedback on it:
1. Set the DNS time-to-live on www.wikipedia.com to some fairly short
period, like a few hours, soon.
2. Publish the address of the new server for people to test for
another day or two (but not until after I get those images!)
These next steps happen at a chosen low-traffic time of day,
and have to be coordinated well among all of us so they can
be done quickly:
3. Replace the main page of the new server with a "Future home of..."
message, telling users that the wikipedia site will be here in a few
hours after moving servers.
4. Empty the new server database.
5. On the old server, disable the "upload" page, with a message that
the server is being moved.
6. Move all the images over.
7. Mark the rest of the old wiki read-only.
8. Transfer DNS to the new server. This will gradually take effect
for users over the next few hours, so they'll get either the read-only
old server, or the "come back in a few hours" message on the new
server during the move. (A small client-side check for watching this
cut-over is sketched below, after the plan.)
9. Dump the old database, copy the dumps to the new server.
10. Run the conversion scripts. This will take 2-3 hours.
11. Replace the main page on the new server with the real wiki code.
Do some sanity testing of all the features.
12. Replace the main page of the old server with a "We've moved"
message that points to the new server's IP address.
At that point, we should be live on the new server.
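As mentioned in step 8, here is a tiny client-side check we can run to
watch the cut-over happen; this is just a Python 2 sketch, and the IP
below is the test box from the spider posted earlier, not necessarily the
final address, so substitute the real one:

import socket, time

new_ip = "130.94.122.197"   # placeholder: the test box, not necessarily the final address
while 1:
    ip = socket.gethostbyname("www.wikipedia.com")
    if ip == new_ip:
        print time.asctime(), "www.wikipedia.com now resolves to the new server:", ip
    else:
        print time.asctime(), "still resolving to the old server:", ip
    time.sleep(300)         # check again in five minutes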
Can other people grant this access now, or do I still need to go into the
database directly?
----- Forwarded message from L C <lcwiki@hotmail.com> -----
From: "L C" <lcwiki@hotmail.com>
Date: Sat, 29 Jun 2002 08:33:45 +0000
To: jwales@bomis.com
Subject: Wikipedia Administrator Access
Jimbo Wales,
I'd like to have wikipedia administrator access, please. I've been
contributing for 11 months (my larger contributions are on user:LC), and
I've collaborated with people like Axel on a number of pages. I'll use the
administrator access carefully. Thanks,
--LC
----- End forwarded message -----