Jonathan Walther wrote:
On Mon, Nov 11, 2002 at 06:38:41PM -0500, The Cunctator wrote:
I'm always suspicious when someone makes the assertion that Language or Database X is vastly inferior to Language or Database Y, especially when
I'm not saying Postgres would result in dramatic speedups; it's about equal to MySQL in speed. But I believe the slowdown caused by locks WOULD go away. I agree with you that better indexing could speed things
This is just like an edit war, so let's apply the NPOV. That is, let's agree that we are working towards a faster, smoother operation of the system, regardless of which combination of tools eventually achieves that goal. Before we have tried Postgres, let's refrain from claiming that it has "obvious" advantages or drawbacks. Otherwise we risk getting entrenched in prestigious preferences, and few things can be more destructive.
To assess whether an alternative solution really is better or worse, I think we should begin by measuring the current performance, then make the change, then measure again. Luckily, I've been running response time measurements continuously since we were running the phase II software. And yes, response times have gotten worse in the last 3-4 weeks.
Week       Percentage of samples when [[Chemistry]]
starting   took more than 5 seconds to retrieve
---------  ----------------------------------------
13 May 02   9%
20 May 02   8%
27 May 02   8%
 3 Jun 02  11%
10 Jun 02   8%
17 Jun 02  n/a
24 Jun 02  n/a
 1 Jul 02  n/a
 8 Jul 02  13%
15 Jul 02   8%
22 Jul 02   4%  <-- move to phase III software, all gets faster
29 Jul 02   5%
 5 Aug 02   0%
12 Aug 02  n/a
19 Aug 02  n/a
26 Aug 02   1%
 2 Sep 02  n/a
 9 Sep 02  n/a
16 Sep 02  n/a
23 Sep 02  n/a  <-- n/a means my measurement script was broken
30 Sep 02   1%
 7 Oct 02   3%  <-- still pretty good
14 Oct 02   6%  <-- worse
21 Oct 02  17%  <-- bad
28 Oct 02  12%  <-- bad
 4 Nov 02   8%  <-- bad
11 Nov 02  11%  (Mon-Wed)
(Yes, in Sweden weeks start on Monday)
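For readers who want to reproduce this kind of summary from their own measurements, here is a minimal sketch of how weekly figures like the ones above could be derived from raw samples. It is not the script used for the table; the function names and the in-memory sample format (a timestamp paired with an elapsed time in seconds) are assumptions made for illustration. Weeks are bucketed by their Monday, matching the table.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SLOW_SECONDS = 5.0  # threshold taken from the table heading


def week_start(ts):
    """Return the Monday (as a date) of the week containing ts."""
    return (ts - timedelta(days=ts.weekday())).date()


def weekly_slow_percentage(samples, threshold=SLOW_SECONDS):
    """Summarise samples per week.

    samples: iterable of (datetime, elapsed_seconds) pairs (assumed format).
    Returns {monday_date: percentage of samples slower than threshold},
    rounded to a whole percent. Weeks without samples are simply absent,
    which corresponds to "n/a" in the table.
    """
    total = defaultdict(int)
    slow = defaultdict(int)
    for ts, elapsed in samples:
        week = week_start(ts)
        total[week] += 1
        if elapsed > threshold:
            slow[week] += 1
    return {week: round(100.0 * slow[week] / total[week])
            for week in sorted(total)}
```

A week with few samples gives a noisy percentage, so it may be worth reporting the sample count alongside each figure.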
Let's get those numbers below 5% again.
My script only tries to retrieve (read) pages. It doesn't try to submit updates. At regular intervals, it accesses a URL for a page, and outputs a log of the date and time, the URL, the HTTP status code, the number of bytes retrieved, and the amount of elapsed time. If you have a better idea for how to extract useful conclusions from such data, please let me know.
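As an illustration of the kind of read-only probe described above, here is a minimal Python sketch. It is an assumption-laden reconstruction, not the actual script: the function names, the log line layout, and the injectable `opener` and `clock` parameters are all hypothetical, and a status of 0 is used here (by assumption) to mark requests that failed without an HTTP response.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime


def probe(url, opener=urllib.request.urlopen, clock=time.monotonic, timeout=60):
    """Fetch url once (read only, no updates submitted).

    Returns (timestamp, url, http_status, bytes_retrieved, elapsed_seconds).
    """
    started_at = datetime.now()
    t0 = clock()
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read()
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        status, body = err.code, b""
    except OSError:
        # Timeouts, refused connections, DNS failures: no HTTP status at all.
        status, body = 0, b""
    elapsed = clock() - t0
    return started_at, url, status, len(body), elapsed


def format_log_line(sample):
    """Render one sample as: date time, URL, status, bytes, elapsed seconds."""
    ts, url, status, nbytes, elapsed = sample
    return f"{ts:%Y-%m-%d %H:%M:%S} {url} {status} {nbytes} {elapsed:.2f}"
```

To sample at regular intervals, one could call `print(format_log_line(probe(url)))` in a loop with `time.sleep(...)` between iterations, or run the script from cron and append its output to a log file.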
On Wed, Nov 13, 2002 at 08:25:22PM +0100, Lars Aronsson wrote:
This is just like an edit war, so let's apply the NPOV. That is, let's agree that we are working towards a faster, smoother operation of the system, regardless of which combination of tools eventually achieves that goal. Before we have tried Postgres, let's refrain from claiming that it has "obvious" advantages or drawbacks. Otherwise we risk getting entrenched in prestigious preferences, and few things can be more destructive.
What about trying other MySQL backends? It has at least two besides the default one: AFAIR, BerkeleyDB and InnoDB.
It shouldn't be much work, as it's still MySQL, and we could get some data on which of the three is best for Wikipedia.
On Thu, Nov 14, 2002 at 12:19:27AM +0100, Tomasz Wegrzanowski wrote:
What about trying other MySQL backends? It has at least two besides the default one: AFAIR, BerkeleyDB and InnoDB.
Experience with InnoDB on Kuro5hin.org shows it to be problematic. And it still doesn't give you subselects. Subselects can mean some big savings in terms of query times.
Jonathan
On Wed, Nov 13, 2002 at 05:46:25PM -0800, Jonathan Walther wrote:
Experience with InnoDB on Kuro5hin.org shows it to be problematic. And it still doesn't give you subselects. Subselects can mean some big savings in terms of query times.
Well, I don't remember Kuro5hin ever having problems as serious as Wikipedia's.
I mean, if there are things that we can check out right now, with very little work, then let's do it. In the worst case, if they don't improve anything or even turn out worse than the current MySQL setup, we will just be back where we started, with little effort wasted.
wikitech-l@lists.wikimedia.org