From: "Lars Aronsson" <lars(a)aronsson.se>
> How do you tell which indexes are superfluous?
Right now with my eyes. The indexes I removed were defined twice for the
same column. :-)
> What tools or commands
> do you use?
The EXPLAIN command tells you how a query is executed, which indexes are used, etc.
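For example, a little throw-away PHP script along these lines will print which index MySQL actually picks for a given query. The connection details and the table and column names are only placeholders here, not our real setup:

<?php
// Run EXPLAIN on a query and print which indexes MySQL considers and uses.
// Connection details and the sample query are placeholders only.
mysql_connect( "localhost", "wikiuser", "secret" ) or die( mysql_error() );
mysql_select_db( "wikipedia" ) or die( mysql_error() );

$sql = "SELECT * FROM cur WHERE cur_title = 'Main_Page'";  // made-up table/column
$res = mysql_query( "EXPLAIN " . $sql ) or die( mysql_error() );

// One row per table in the query plan: possible_keys lists the candidate
// indexes, key is the one actually chosen, rows is the estimated scan size.
while ( $row = mysql_fetch_assoc( $res ) ) {
    print $row["table"] . ": key=" . $row["key"]
        . " (possible: " . $row["possible_keys"] . "), rows=" . $row["rows"] . "\n";
}
?>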
-- Jan Hidders
During this week, I hope to get a few things set up.
Advice would be much appreciated, of course!
1. beta.wikipedia.com (or similar name) will be our testbed site.
Every week, a cron job will dump the live database from www.wikipedia.com
and load it into the testbed site. So any changes made to the testbed
site's data will be lost every week.
2. beta.wikipedia.com will have the latest version of the code once
per day, loaded directly from the CVS via cron. This means that if
someone checks in broken code, beta.wikipedia.com will be broken until I
manually fix it.
3. The main site will be updated with the most recent working code,
let's say once per week, maybe on Monday morning. I think I should
always do this one by hand, because of the importance of not breaking
anything on the live site, or at least of being able to immediately roll back to the previous version if something is broken.
4. Of course I can also always do an upgrade of the live site in case
we find a major bug in the running code.
Last week, I upgraded the site 5 times, I think -- i.e. once per day
when I had the time. It seems important to me that we start to have a
development site and a live site, and some partially automated system for
moving forward. :-)
Dear fellow programmers,
I have just submitted my new implementation of the Recent Changes and
History pages. For these pages, the number of database accesses is now drastically reduced. I have also updated the Wikipedia.sql file with the new database schema. Fortunately it turned out that I didn't have to add any columns, but I did add two new indexes. The code to add these to the running database is in updSchema.sql. I suggest that any future changes to the database schema also be added in the form of SQL statements that can be executed by Jimbo.
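Just to illustrate what I mean, a schema change shipped that way boils down to one or two statements that can be run against the live database. The table, column and index names in this sketch are made up; the real statements are in updSchema.sql:

<?php
// Sketch only: apply one schema-change statement to the running database.
// The table, column and index names here are invented for this example.
mysql_connect( "localhost", "wikiadmin", "secret" ) or die( mysql_error() );
mysql_select_db( "wikipedia" ) or die( mysql_error() );
mysql_query( "ALTER TABLE cur ADD INDEX idx_cur_timestamp (cur_timestamp)" )
    or die( mysql_error() );
?>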
Finally, I got some errors on the pages of old versions that occurred
because the timestamp field of the page object wasn't set. (It is used in
the footer.) I could only remedy this by adding an extra line in WikPage.php
that initializes it, but I still don't fully understand why this didn't lead to errors before.
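Roughly, the extra line amounts to something like the sketch below; the class and field names are simplified and are not the literal WikPage.php code:

<?php
// Sketch of the fix: if an old revision comes back from the database without
// a timestamp, fall back to "now" so the footer can still be rendered.
// Class and field names are simplified, not the actual WikPage.php code.
class WikPage {
    var $mTimestamp;

    function loadFromRow( $row ) {
        if ( empty( $row["timestamp"] ) ) {
            $this->mTimestamp = date( "YmdHis" );   // MySQL TIMESTAMP(14) format
        } else {
            $this->mTimestamp = $row["timestamp"];
        }
    }
}
?>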
Anyway, I hope you guys have some time to test it, because I couldn't do this as thoroughly as I wanted. But this is my first big patch, so please be gentle.
-- Jan Hidders
PS. My next task will be the improvement of the search pages, as Axel already suggested.
There are some problems with the login routine that I haven't been able to fix yet (obviously, otherwise I wouldn't waste your time ;)
Symptoms are as follows:
1. User "Some One" can/has to log in as "Some_One"
2. Sometimes, directly after login, the first page shows the IP instead of the user name, even though "Some One" is logged in. It will work on further page loads.
3. Sometimes, the user name changes to "Some_One1"
4. There seem to be some problems with user_rights associated with the above issues.
Personally, I have experienced only #2 on my local copy. I thought this would happen when the page loads faster than the cookie handling completes.
I will make this my priority for today, and if you're not too busy fixing
parser bugs;) please have a look at it as well.
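For symptom #1, my guess is that user names are stored with underscores (like article titles) while the login form passes them with spaces. Below is a rough sketch of the kind of normalization that might help; the function names are made up, and the real script may already do part of this elsewhere:

<?php
// Sketch only: normalize a user name the same way on signup, login and
// display, so that "Some One" and "Some_One" refer to the same account.
function normalizeUserName( $name ) {
    $name = trim( $name );
    $name = str_replace( "_", " ", $name );        // store and compare with spaces
    $name = preg_replace( '/\s+/', " ", $name );   // collapse duplicate whitespace
    return $name;
}

function userNameToUrl( $name ) {
    return str_replace( " ", "_", $name );         // underscores only in URLs
}
?>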
I think we should get a decent stress testing setup going, so that we
don't have to try out all improvements on the live site.
I assume we are all running Unix/Apache/PHP/MySQL. Siege at
http://joedog.org/siege/ looks like a good choice of a free stress
testing utility. It can simulate n users hammering your site
simultaneously, even if you aren't connected to any network, and
reports response times and other stats.
We need a realistically sized database dump to start playing though.
We could then come up with a nice little Siege script that loads RecentChanges, runs a few searches, downloads a couple of articles, and edits a couple of others (maybe modeled on real usage stats from Jimbo). We could share the script and use it as a benchmark to try out patches to the PHP script and database schema.
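As a rough sketch of how such a URL list could be generated, something like the following snippet would pull a random sample of articles and write a urls.txt for Siege (to be run with e.g. "siege -f urls.txt -c 20"). The URL format and the table and column names are guesses and would have to be adjusted to the real schema:

<?php
// Sketch: build a urls.txt for Siege from a random sample of articles plus
// the pages every test run should hit.  All names below are guesses.
mysql_connect( "localhost", "wikiuser", "secret" ) or die( mysql_error() );
mysql_select_db( "wikipedia" ) or die( mysql_error() );

$base = "http://beta.wikipedia.com/wiki.phtml?title=";   // guessed URL format
$fp = fopen( "urls.txt", "w" );

// A page every test run should load.
fwrite( $fp, $base . "Special:RecentChanges\n" );

// A random sample of articles to view.
$res = mysql_query( "SELECT cur_title FROM cur ORDER BY RAND() LIMIT 50" )
    or die( mysql_error() );
while ( $row = mysql_fetch_row( $res ) ) {
    fwrite( $fp, $base . urlencode( $row[0] ) . "\n" );
}
fclose( $fp );
?>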
As a proof of principle (and just for fun, of course!) I took a few hours to write a simple single-pass parser for Wikipedia pages in C++, using sql++ (the "official" MySQL API).
It supports external and internal links (with a "does this topic exist" check), special line beginnings (:, *, #, and leading space), wiki-style bold and italics, horizontal rules (----), etc. No == headings == yet (and no beheadings either ;), no namespaces, but these are simple to add.
Called from the shell (as I haven't connected it to Apache yet), it renders the Main Page in 0:00.04 (without caching ;)
Should I continue to work on this, and eventually add it to the CVS, or is
it just a waste of time?
I will, of course, continue to work on the PHP script, no doubt about that!