Wikitech-l February 2002

wikitech-l@lists.wikimedia.org

10 participants
70 discussions

License
by Axel Boldt 09 Feb '02

09 Feb '02

I think we should add the GPL license to the cvs tree and add a copyright notice to the top of the scripts as soon as possible (just take any GNU program as template). Right now, the number of contributors is still small and we can still make this change. Later it may get ugly. Axel

2 1

Slow search queries
by Axel Boldt 09 Feb '02

09 Feb '02

> SELECT * FROM cur WHERE (cur_title NOT LIKE "S") AND (cur_text LIKE "S" ) > ORDER BY cur_title >seems to be a big offender. That's the special_dosearch query, I think. Yes. That searches linearly through the whole database to search for a string. That's a killer. It's like our old UseMod search before Jimbo wrote an indexed one. A FULLTEXT index, together with MATCH instead of LIKE, should give a major improvement. Axel

1 0

Questions about installing
by Jimmy Wales 09 Feb '02

09 Feb '02

When I check everything out of the CVS, I do it into a directory that has nothing to do with the real site. Then, I copy the files over to the proper location. It seems that wikiText.php AND wikiTextEn.php are always different, and I have to edit them... so really, I shouldn't be copying them unless there's a good reason, right? Here's my exact question: wikiTextEn.php warns me: # ATTENTION: # To fit your local settings, PLEASE edit wikiText.php ONLY! # Change settings here ONLY if they're to become global in all wikipedias! But that seems a bit "opposite" to me... doesn't wikiTextEn mean "wikiText English"? If so, then changes here should ONLY affect the English wikipedia, not "global in all wikipedias"? Also, whichever way it is supposed to be, I'm sure I should only have to edit one file. But I have to edit two. First, $wikiCurrentServer returns http://wikipedia.com in the default configuration, but we prefer http://www.wikipedia.com/ (see line 12 of wikiTextEn.php, I always edit to hardcode this.) And on the next line, $wikiSQLServer is different locallly: the database is named "wiki" instead of "wikipedia". So, should I just add those two things to wikiText.php? And that will override the stuff in wikiTextEn.php?

2 4

slow query log
by Jimmy Wales 09 Feb '02

09 Feb '02

O.k., I've enabled this and glanced through it a bit. It's pretty interesting, even though I only have a small handful of values in it. I think what I'll do is leave it on overnight, and then gzip it for everyone to download tomorrow. (Or perhaps I'll just gzip the output of mysqldumpslow, which summarizes it a bit. Or perhaps I'll slap together a quick perl script to boil it down to the essentials for our context.) SELECT * FROM cur WHERE (cur_title NOT LIKE "S") AND (cur_text LIKE "S" ) ORDER BY cur_title seems to be a big offender. That's the special_dosearch query, I think. Also, here's a fun looking query: SELECT * FROM cur WHERE (cur_title NOT LIKE "S") AND ( (cur_title LIKE "S" OR cur_title LIKE "S") AND (cur_title LIKE "S" OR cur_title LIKE "S") AND (cur_titl e LIKE "S" OR cur_title LIKE "S") AND (cur_title LIKE "S" OR cur_title LIKE "S") AND (cur_title LIKE "S" OR cur_title LIKE "S") ) ORDER BY cur_title I'm no database expert, but that doesn't look so sensible to me. :-)

1 0

Old cache on Main Page
by Magnus Manske 08 Feb '02

08 Feb '02

I'm not sure if this will be fixed with the next bugfix, but I edited the Main Page (twice) to remove the counter, but I still get the old cached page ("23.338 articles").

2 1

Speedup suggestions
by Axel Boldt 08 Feb '02

08 Feb '02

Browsing around on http://www.mysql.com/documentation/ and comparing with the wikipedia.sql in cvs, I have the following suggestions: 1) To speed up searches, we should use a FULLTEXT index on title and text, and then use the match operator. That should also yield more relevant results. (http://www.mysql.com/doc/F/u/Fulltext_Search.html) 2) In special_recentchanges.php, we select with "WHERE cur_timestamp>$mindate", but cur_timestamp is not indexed. This means that mysql linearly searches through the whole cur database, everytime somebody views RecentChanges. 3) Assuming that php runs as an apache module, we should use persistent database connections. That way, we won't repeatedly send over the username and password; one connection is reused by apache even after the php script dies. (http://www.phpbuilder.com/manual/features.persistent-connections.php and http://www.phpbuilder.com/manual/function.mysql-pconnect.php) Axel

4 3

Re: [Wikitech-l] Re: [Wikipedia-l] Plans for RecentChanges page; programmers only!
by Brion L. VIBBER 08 Feb '02

08 Feb '02

Jan Hidders wrote: >From: "Brion L. VIBBER" <brion(a)pobox.com> > >>Right. Hmm, can you use TO_DAYS(cur_timestamp) or some such? Or is that >>just going to cause problems? >> > >Unfortunately MySQL only allows column names in the GROUP BY clause. > D'oh! Well then, two columns it is. -- brion vibber (brion @ pobox.com)

1 0

Recent changes; flush cache
by Magnus Manske 08 Feb '02

08 Feb '02

Jan's change to the database has my blessing, as he is a (the!) database expert. Anyway, it isn't *my* baby anymore, it is *ours*;) Yeah, the blessings of modern cloning technology... As for the cache flushing mechanism Brion proposed in a private mail, the easiest way would be to link it to the cur_counter field; if "cur_counter MOD 20 == 0", the cache could be flushed, best by setting the local $cache variable to "" at the end of the load routine. 20 is just a wild guess... Finally, thanks to Jimbo for setting up this mailing list. Gee, I never caused a mailing list before ;) Magnus

1 0

Re: [Wikipedia-l] Plans for RecentChanges page; programmers only!
by Brion L. VIBBER 08 Feb '02

08 Feb '02

(moving thread to wikitech-l) Jan Hidders wrote: >From: "Brion L. VIBBER" <brion(a)pobox.com> > >>Jan Hidders wrote: >> >>>Dear fellow programmers, >>> >>>I saw that the SQL code for the Recent Changes page is rather inefficient >>>and causes a lot of database access, so I decided to improve this. >>> >However, > >>>I can only do this properly if the timestamp field in the tables is split >>> >in > >>>a day and a time field. >>> >>Can I ask how exactly that would help? I'm not much of a database guru, >>so the answer isn't obvious to me and I'm a bit curious. >> > >It allows me to do a GROUP BY on the day. That way I can take a left outer >join between the cur table and the old table and group on the combination of >cur_day and cur_title. This allows me to get all the information I need for >the page in one SQL statement. > Right. Hmm, can you use TO_DAYS(cur_timestamp) or some such? Or is that just going to cause problems? If it's more efficient to use a split timestamp, I can't come up with any objection. But, it's Magnus' baby. :) -- brion vibber (brion @ pobox.com)

1 0

New list
by Jimmy Wales 08 Feb '02

08 Feb '02

This is the new list for tech discussions for wikipedia: wikitech-l(a)nupedia.com

1 0

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

Wikitech-l February 2002