On Mon, Nov 11, 2002 at 06:38:41PM -0500, The Cunctator wrote:
> I must say, my suspicion is that it's less a problem of MySQL than it is of Wikipedia's implementation.
> (I'm saying this as someone who's done a lot of MySQL/Access/Oracle work, but not any Postgres work.)
> I'm always suspicious when someone makes the assertion that Language or Database X is vastly inferior to Language or Database Y, especially when all of the languages and databases have been around for a lot longer than the code that uses them.
> There have been some efforts to optimize the database, but not many. And LDC's code is certainly better than the original mess, but it's hardly gotten the million-eyeball treatment.
> That said, I know we've got limited resources, so if you're a Postgres expert and switching to Postgres is what will motivate you to do a code revamp, then tally ho! (Which is why I'm just writing this to you, because in the end I'm not saying anything important.)
I'm not saying Postgres would result in dramatic speedups; it's about equal to MySQL in speed. But I believe the slowdown caused by locks WOULD go away. I agree with you that better indexing could speed things up, as could better code design. I have been looking at the PHP, and this is a big job. I need to finish implementing cookies in my C cgi library, then I can consider doing it.
I've been prejudiced against MySQL since the day I bought the "Practical SQL Handbook", sat down to type in the examples, and none of them worked on MySQL, and they all worked on Postgres.
The current PHP code mixes SQL in with regular code (ugh!) and doesn't have separate HTML templates either; the HTML is also intermingled with the code. (double ugh!)
If the codebase could be cleaned up, the database optimizations would be much easier to do, and we might not even need to switch from MySQL.
Thanks for your mail; I'm a slow worker but I tend to get there in the end. And you are right, switching to Postgres would be a big motivator :)
Jonathan
Jonathan Walther wrote:
> On Mon, Nov 11, 2002 at 06:38:41PM -0500, The Cunctator wrote:
> > I'm always suspicious when someone makes the assertion that Language or Database X is vastly inferior to Language or Database Y, especially when
>
> I'm not saying Postgres would result in dramatic speedups; it's about equal to MySQL in speed. But I believe the slowdown caused by locks WOULD go away. I agree with you that better indexing could speed things
This is just like an edit war, so let's apply the NPOV. That is, let's agree that we are working towards a faster, smoother operation of the system, regardless of which combination of tools eventually achieves that goal. Before we have tried Postgres, let's refrain from claiming that it has "obvious" advantages or drawbacks. Otherwise we risk getting entrenched in prestige-driven preferences, and few things can be more destructive.
To assess whether an alternative solution really is better or worse, I think we should begin by measuring the current performance, then change, then measure again. Luckily, I've been running response-time measurements continuously since we were running the phase II software. And yes, performance has gotten worse in the last 3-4 weeks.
Week       Percentage of samples when [[Chemistry]]
           took more than 5 seconds to retrieve
---------  ----------------------------------------
13 May 02   9%
20 May 02   8%
27 May 02   8%
 3 Jun 02  11%
10 Jun 02   8%
17 Jun 02  n/a
24 Jun 02  n/a
 1 Jul 02  n/a
 8 Jul 02  13%
15 Jul 02   8%
22 Jul 02   4%  <-- move to phase III software, all gets faster
29 Jul 02   5%
 5 Aug 02   0%
12 Aug 02  n/a
19 Aug 02  n/a
26 Aug 02   1%
 2 Sep 02  n/a
 9 Sep 02  n/a
16 Sep 02  n/a
23 Sep 02  n/a  <-- n/a means my measurement script was broken
30 Sep 02   1%
 7 Oct 02   3%  <-- still pretty good
14 Oct 02   6%  <-- worse
21 Oct 02  17%  <-- bad
28 Oct 02  12%  <-- bad
 4 Nov 02   8%  <-- bad
11 Nov 02  11%  (Mon-Wed)
(Yes, in Sweden weeks start on Monday)
Let's get those numbers below 5% again.
My script only tries to retrieve (read) pages. It doesn't try to submit updates. At regular intervals, it accesses a URL for a page, and outputs a log of the date and time, the URL, the HTTP status code, the number of bytes retrieved, and the amount of elapsed time. If you have a better idea for how to extract useful conclusions from such data, please let me know.
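The probe described above can be sketched roughly as follows. The actual script is not shown in the thread, so the function name, URL, and log format here are illustrative assumptions; the idea is just: fetch one page, then log date/time, URL, HTTP status, bytes retrieved, and elapsed time.

```python
# A minimal sketch of the response-time probe described above.
# (The real script is not shown in the thread; names, the URL,
# and the log format are illustrative assumptions.)
import time
import urllib.error
import urllib.request

def probe(url):
    """Fetch one page and log date/time, URL, status, bytes, seconds."""
    start = time.time()
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            status = resp.status
            nbytes = len(resp.read())
    except urllib.error.HTTPError as err:
        status, nbytes = err.code, 0
    elapsed = time.time() - start
    print("%s %s %d %d %.2f" % (
        time.strftime("%Y-%m-%d %H:%M:%S"), url, status, nbytes, elapsed))
    return elapsed

# Run at regular intervals, e.g. against the [[Chemistry]] page;
# a sample counts as "slow" when elapsed exceeds 5 seconds.
```

A cron job calling this once a minute and a one-line grep over the log would reproduce the weekly percentages in the table above.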
On Wed, Nov 13, 2002 at 08:25:22PM +0100, Lars Aronsson wrote:
> Jonathan Walther wrote:
> > On Mon, Nov 11, 2002 at 06:38:41PM -0500, The Cunctator wrote:
> > > I'm always suspicious when someone makes the assertion that Language or Database X is vastly inferior to Language or Database Y, especially when
> >
> > I'm not saying Postgres would result in dramatic speedups; it's about equal to MySQL in speed. But I believe the slowdown caused by locks WOULD go away. I agree with you that better indexing could speed things
>
> This is just like an edit war, so let's apply the NPOV. That is, let's agree that we are working towards a faster, smoother operation of the system, regardless of which combination of tools eventually achieves that goal. Before we have tried Postgres, let's refrain from claiming that it has "obvious" advantages or drawbacks. Otherwise we risk getting entrenched in prestige-driven preferences, and few things can be more destructive.
What about trying other MySQL backends? It has at least two besides the default one - AFAIR, BerkeleyDB and InnoDB.
It shouldn't be much work, since it's still MySQL, and we could get some data on which of the three is best for Wikipedia.
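For reference, switching a table's backend in the MySQL of that era (3.23) is a single statement per table. A sketch, where `cur` stands in for whichever Wikipedia table is being converted (the table name here is an assumption, not confirmed by the thread):

```sql
-- Hypothetical example: try each backend on one table.
-- MySQL 3.23 syntax; "cur" is a placeholder table name.
ALTER TABLE cur TYPE=InnoDB;   -- row-level locking, transactions
ALTER TABLE cur TYPE=BDB;      -- BerkeleyDB backend, page-level locks
ALTER TABLE cur TYPE=MyISAM;   -- back to the default backend
SHOW TABLE STATUS LIKE 'cur';  -- verify which backend is in use
```

Since the conversion rewrites the whole table, it would want to be done during a quiet period (or on a copy), but no application code has to change.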
wikipedia-l@lists.wikimedia.org