I think we should get a decent stress testing setup going, so that we
don't have to try out all improvements on the live site.
I assume we are all running Unix/Apache/PHP/MySQL. Siege at
http://joedog.org/siege/ looks like a good choice for a free stress
testing utility. It can simulate n users hammering your site
simultaneously, even if you aren't connected to any network, and
reports response times and other stats.
We need a realistically sized database dump to start playing though.
We could then come up with a nice little Siege script that loads
RecentChanges, searches for and downloads a couple of articles, and
edits a couple of others (maybe modeled on real stats from Jimbo),
then share the script and use it as a benchmark to try out patches to
the PHP scripts and the database schema.
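For concreteness, a first cut at such a script might be a urls.txt along
these lines (every URL here, and the numbers below, are placeholder
assumptions to illustrate the idea, not the real site layout):

```
# urls.txt -- hypothetical page mix for Siege; all URLs are assumptions
http://localhost/wiki/Special:RecentChanges
http://localhost/wiki/Main_Page
http://localhost/wiki/Special:Search?search=stress+testing
```

We would then run something like "siege -c 25 -t 2M -f urls.txt"
(-c concurrent simulated users, -t run time, -f the URL file); at the end
Siege prints availability, average response time, and transaction rate.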
Axel
For proof of principle (and just for fun, of course!) I took a few hours to
write a simple single-pass parser for wikipedia pages in C++, using sql++
(the "official" MySQL API).
It supports external and internal links (with "does topic exist" check),
special line beginnings (:, *, #, and leading space), wiki-style bold and
italics, ---- things etc.
No == headings == (no beheadings either ;) and no namespaces, but those
are simple to implement.
Called from the shell (as I haven't connected it to Apache yet), it
renders the Main Page in 0:00.04 (without caching ;)
Should I continue to work on this, and eventually add it to the CVS, or is
it just a waste of time?
I will, of course, continue to work on the PHP script, no doubt about that!
Magnus
I think we should add the GPL license to the CVS tree and add a
copyright notice to the top of the scripts as soon as possible (just
take any GNU program as a template). Right now, the number of
contributors is still small and we can still make this change. Later
it may get ugly.
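Following the usual GNU template, the header for each PHP script would
look roughly like this (year and author line to be filled in per file):

```php
<?php
# Copyright (C) <year> <name of author>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
```

The standard notice also has a closing "You should have received a copy
of the GNU General Public License..." paragraph, copied from whichever
GNU program we use as the template.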
Axel
> SELECT * FROM cur WHERE (cur_title NOT LIKE "S") AND (cur_text LIKE "S" )
> ORDER BY cur_title
>seems to be a big offender. That's the special_dosearch query, I think.
Yes. That scans linearly through the whole database looking for the
string. That's a killer. It's like our old UseMod search before Jimbo
wrote an indexed one. A FULLTEXT index, together with MATCH instead
of LIKE, should give a major improvement.
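A sketch of what that would look like, assuming the cur_title/cur_text
column names from wikipedia.sql (FULLTEXT needs a MyISAM table and a
reasonably recent MySQL 3.23; the index name and search words below are
made up):

```sql
-- one-time schema change; building the index takes a while on a big table
ALTER TABLE cur ADD FULLTEXT ft_cur (cur_title, cur_text);

-- the search then becomes an index lookup with relevance ranking:
SELECT cur_title
FROM cur
WHERE MATCH (cur_title, cur_text) AGAINST ('search words here');
```

Note that the column list in MATCH() has to match the column list of a
FULLTEXT index exactly, and MySQL's built-in stopword handling silently
drops very short and very common words from the query.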
Axel
When I check everything out of the CVS, I do it into a directory that has
nothing to do with the real site.
Then, I copy the files over to the proper location.
It seems that wikiText.php AND wikiTextEn.php are always different, and I have
to edit them... so really, I shouldn't be copying them unless there's a good reason,
right?
Here's my exact question:
wikiTextEn.php warns me:
# ATTENTION:
# To fit your local settings, PLEASE edit wikiText.php ONLY!
# Change settings here ONLY if they're to become global in all wikipedias!
But that seems a bit "opposite" to me... doesn't wikiTextEn mean "wikiText English"?
If so, then changes here should ONLY affect the English wikipedia, not "global in
all wikipedias"?
Also, whichever way it is supposed to be, I'm sure I should only have to edit one file.
But I have to edit two.
First, $wikiCurrentServer returns http://wikipedia.com in the default configuration, but
we prefer http://www.wikipedia.com/ (see line 12 of wikiTextEn.php, I always edit to
hardcode this.)
And on the next line, $wikiSQLServer is different locally: the database is named
"wiki" instead of "wikipedia".
So, should I just add those two things to wikiText.php? And that will override the
stuff in wikiTextEn.php?
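If the warning means what it says (wikiText.php is the per-site settings
file and its values win over the wikiTextEn.php defaults, presumably
because it is read last), then the local tweaks would just be two lines
there. This is a sketch based only on the two variables quoted above:

```php
<?php
# in wikiText.php -- local overrides for this installation
$wikiCurrentServer = "http://www.wikipedia.com";
$wikiSQLServer     = "wiki";   # local database is "wiki", not "wikipedia"
```

Whether this actually overrides the English defaults depends on the
include order of the two files, which the warning doesn't spell out.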
O.k., I've enabled this and glanced through it a bit. It's pretty
interesting, even though I only have a small handful of values in it.
I think what I'll do is leave it on overnight, and then gzip it for
everyone to download tomorrow. (Or perhaps I'll just gzip the output
of mysqldumpslow, which summarizes it a bit. Or perhaps I'll slap together
a quick perl script to boil it down to the essentials for our context.)
SELECT * FROM cur WHERE (cur_title NOT LIKE "S") AND (cur_text LIKE "S" )
ORDER BY cur_title
seems to be a big offender. That's the special_dosearch query, I think.
Also, here's a fun looking query:
SELECT * FROM cur WHERE (cur_title NOT LIKE "S") AND ( (cur_title LIKE "S" OR
cur_title LIKE "S") AND (cur_title LIKE "S" OR cur_title LIKE "S") AND
(cur_title LIKE "S" OR cur_title LIKE "S") AND (cur_title LIKE "S" OR
cur_title LIKE "S") AND (cur_title LIKE "S" OR cur_title LIKE "S") )
ORDER BY cur_title
I'm no database expert, but that doesn't look so sensible to me. :-)
I'm not sure if this will be fixed with the next bugfix, but I edited the
Main Page (twice) to remove the counter, but I still get the old cached page
("23.338 articles").
Browsing around on http://www.mysql.com/documentation/ and comparing
with the wikipedia.sql in cvs, I have the following suggestions:
1) To speed up searches, we should use a FULLTEXT index on title and
text, and then use the match operator. That should also yield more
relevant results.
(http://www.mysql.com/doc/F/u/Fulltext_Search.html)
2) In special_recentchanges.php, we select with "WHERE
cur_timestamp>$mindate", but cur_timestamp is not indexed. This
means that MySQL linearly scans the whole cur table every time
somebody views RecentChanges.
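The fix would be a one-line schema change (the index name is my own
invention, and the timestamp literal below is made up):

```sql
ALTER TABLE cur ADD INDEX idx_cur_timestamp (cur_timestamp);

-- sanity check: EXPLAIN should now report a "range" scan using the
-- index rather than a full table scan:
EXPLAIN SELECT * FROM cur WHERE cur_timestamp > '20011201000000';
```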
3) Assuming that PHP runs as an Apache module, we should use
persistent database connections. That way, we won't repeatedly send
over the username and password; one connection is reused by Apache
even after the PHP script finishes.
(http://www.phpbuilder.com/manual/features.persistent-connections.php
and http://www.phpbuilder.com/manual/function.mysql-pconnect.php)
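The code change is minimal; a sketch with made-up credential variables
($wikiSQLServer is from wikiTextEn.php, the other names are assumptions):

```php
<?php
# mysql_pconnect() looks for an already-open connection with the same
# host/user/password in this Apache child and reuses it; otherwise it
# opens a new one that stays alive after the script ends.
$conn = mysql_pconnect( $wikiSQLServer, $wikiUser, $wikiPassword )
    or die( "Could not connect to database" );
mysql_select_db( "wikipedia", $conn );
```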
Axel
Jan Hidders wrote:
>From: "Brion L. VIBBER" <brion@pobox.com>
>
>>Right. Hmm, can you use TO_DAYS(cur_timestamp) or some such? Or is that
>>just going to cause problems?
>>
>
>Unfortunately MySQL only allows column names in the GROUP BY clause.
>
D'oh! Well then, two columns it is.
-- brion vibber (brion @ pobox.com)