Dear fellow programmers,
I have just submitted my new implementation of the Recent Changes and
History pages. For these pages the number of database accesses is now
drastically reduced. I have also updated the Wikipedia.sql file with the new
database schema. Fortunately it turned out that I didn't have to add any
columns, but I did add two new indexes. The code to add these on the running
database is in updSchema.sql. I suggest that any future changes to the
database schema also be added in the form of SQL statements that can be
executed by Jimbo.
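For illustration, the two new indexes can be added to a running database with
statements of the following form (the index names and columns below are only
hypothetical examples; the actual statements are in updSchema.sql):

```sql
-- Hypothetical example of the kind of statements updSchema.sql contains;
-- the real index names and columns are in that file.
ALTER TABLE cur ADD INDEX cur_timestamp_idx (cur_timestamp);
ALTER TABLE old ADD INDEX old_timestamp_idx (old_timestamp);
```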
Finally, I got some errors on the pages for old versions, which occurred
because the timestamp field of the page object wasn't set. (It is used in
the footer.) I could only remedy this by adding an extra line in wikiPage.php
that initializes it, but I still don't fully understand why this didn't lead
to errors before.
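A sketch of the kind of one-line fix I mean (the class, property, and date
format below are made up for illustration; the actual change is in the CVS):

```php
<?php
// Hypothetical sketch only: default the timestamp when the page object
// is created, so the footer code always has a value to format.
// Names and format are illustrative, not the real wikiPage.php code.
class WikiPage {
    var $timestamp;

    function WikiPage() {
        // The extra line: initialize instead of leaving the field unset.
        $this->timestamp = date("YmdHis");
    }
}
?>
```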
Anyway, I hope you guys have some time to test it, because I couldn't test
it as thoroughly as I wanted. But this is my first big patch, so please be
gentle. :-)
Kind regards,
-- Jan Hidders
PS. My next task will be improving the search pages, as Axel already
suggested.
From: "Axel Boldt" <axel(a)uni-paderborn.de>
>
> I think in your recent updSchema.sql, you want the second cur_timestamp
> to be old_timestamp.
I've corrected it. Thanks for noticing.
-- Jan Hidders
There are some problems with the login routine that I wasn't able to fix yet
(obviously, otherwise I wouldn't waste your time;)
Symptoms are as follows:
1. User "Some One" can/has to log in as "Some_One"
2. Sometimes, directly after login, the first page shows the IP instead of
the user name, even though "Some One" is logged in. It will work on further
pages.
3. Sometimes, the user name changes to "Some_One1"
4. There seem to be some problems with user_rights associated with the
above.
Personally, I have experienced only #2 on my local copy. I thought this
happens when the page loads faster than the cookie handling completes.
I will make this my priority for today, and if you're not too busy fixing
parser bugs;) please have a look at it as well.
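Symptom #1 looks like the space/underscore conversion we use for article
titles being applied inconsistently to user names. A hypothetical sketch of
the kind of matched helper pair that would rule it out (function names are
made up, not from the current code):

```php
<?php
// Hypothetical helpers: if login, cookies, and display all go through
// this one pair of conversions, "Some One" and "Some_One" can no
// longer diverge. Names are illustrative only.
function userNameToKey($name) {
    // Canonical form stored in the database and the cookie.
    return str_replace(" ", "_", trim($name));
}

function userKeyToName($key) {
    // Form shown to the user.
    return str_replace("_", " ", $key);
}
?>
```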
Magnus
I think we should get a decent stress testing setup going, so that we
don't have to try out all improvements on the live site.
I assume we are all running Unix/apache/php/mysql. Siege at
http://joedog.org/siege/ looks like a good choice of a free stress
testing utility. It can simulate n users hammering your site
simultaneously, even if you aren't connected to any network, and
reports response times and other stats.
We need a realistically sized database dump to start playing though.
We could then come up with a nice little Siege script that loads
RecentChanges, searches and downloads a couple of articles, edits a
couple others (maybe modeled on real stats from Jimbo), share the
script and use it as benchmark to try out patches to the PHP script
and database schema.
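For example, Siege can take a file of URLs and hammer them with a given
number of concurrent simulated users; the invocation would look roughly like
this (flags as in the Siege documentation; the URLs are placeholders, not our
real page addresses):

```shell
# urls.txt -- one URL per line, mixing reads, searches, and RecentChanges,
# ideally weighted by real stats from Jimbo. Placeholder examples:
#   http://localhost/wiki/Main_Page
#   http://localhost/wiki/Special:RecentChanges

# 20 simulated users, drawing URLs from the file, for 5 minutes:
siege -c 20 -f urls.txt -t 5M
```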
Axel
For proof of principle (and just for fun, of course!) I took a few hours to
write a simple single-pass parser for wikipedia pages in C++, using sql++
(the "official" MySQL API).
It supports external and internal links (with "does topic exist" check),
special line beginnings (:, *, #, and leading space), wiki-style bold and
italics, ---- things etc.
No == headings == (beheadings neither;), no namespaces, but these are simple
to implement.
Called from the shell (as I haven't connected it to Apache yet), it
renders the Main Page in 0:00.04 (without caching ;)
Should I continue to work on this, and eventually add it to the CVS, or is
it just a waste of time?
I will, of course, continue to work on the PHP script, no doubt about that!
Magnus
I think we should add the GPL license to the CVS tree and add a
copyright notice to the top of the scripts as soon as possible (just
take any GNU program as a template). Right now, the number of
contributors is still small and we can still make this change. Later
it may get ugly.
Axel
> SELECT * FROM cur WHERE (cur_title NOT LIKE "S") AND (cur_text LIKE "S" )
> ORDER BY cur_title
>seems to be a big offender. That's the special_dosearch query, I think.
Yes. That scans linearly through the whole table to find the
string. That's a killer. It's like our old UseMod search before Jimbo wrote
an indexed one. A FULLTEXT index, together with MATCH instead
of LIKE, should give a major improvement.
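Roughly, the change would look like this (assuming MySQL's FULLTEXT support,
which requires a MyISAM table; column names as in the query quoted above, the
search term is a placeholder):

```sql
-- One-time: build a fulltext index over the article text.
ALTER TABLE cur ADD FULLTEXT (cur_text);

-- The search can then use the index instead of a linear scan:
SELECT cur_title FROM cur
WHERE MATCH (cur_text) AGAINST ('some search term')
ORDER BY cur_title;
```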
Axel
When I check everything out of the CVS, I do it into a directory that has
nothing to do with the real site.
Then, I copy the files over to the proper location.
It seems that wikiText.php AND wikiTextEn.php are always different, and I have
to edit them... so really, I shouldn't be copying them unless there's a good reason,
right?
Here's my exact question:
wikiTextEn.php warns me:
# ATTENTION:
# To fit your local settings, PLEASE edit wikiText.php ONLY!
# Change settings here ONLY if they're to become global in all wikipedias!
But that seems a bit "opposite" to me... doesn't wikiTextEn mean "wikiText English"?
If so, then changes here should ONLY affect the English wikipedia, not "global in
all wikipedias"?
Also, whichever way it is supposed to be, I'm sure I should only have to edit one file.
But I have to edit two.
First, $wikiCurrentServer returns http://wikipedia.com in the default configuration, but
we prefer http://www.wikipedia.com/ (see line 12 of wikiTextEn.php, I always edit to
hardcode this.)
And on the next line, $wikiSQLServer is different locally: the database is named
"wiki" instead of "wikipedia".
So, should I just add those two things to wikiText.php? And that will override the
stuff in wikiTextEn.php?
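If the ATTENTION comment is right and wikiText.php is indeed the place for
local settings, the two overrides would presumably look something like this
(assuming assignments in wikiText.php take precedence; values taken from
above, and I'm reading $wikiSQLServer as holding the database name):

```php
<?php
// Local overrides in wikiText.php (assuming these win over the
// defaults in wikiTextEn.php, as the ATTENTION comment suggests):
$wikiCurrentServer = "http://www.wikipedia.com";  // we prefer the www form
$wikiSQLServer     = "wiki";  // local database name, not "wikipedia"
?>
```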