Hi.
I'd like to know whether it is possible to run Wikipedia with MySQL 4.1.
Before installing the whole environment with MySQL 4.0.20a, I tried to set
it up with MySQL 4.1, and I remember running into problems installing
MediaWiki, which did not accept that version of MySQL.
However, my research project requires a lot of queries that contain
subqueries, and it is almost impossible to rewrite them all as JOINs (one
reason is that I use not only SELECT queries but also UPDATE and DELETE).
Using MySQL 4.1 would help me considerably.
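To make this concrete, a statement of roughly the following shape runs on
MySQL 4.1 but is rejected with a syntax error by 4.0.x (the table and column
names below are invented for illustration, they are not the MediaWiki schema):

    <?php
    // Illustrative only: my_links / my_pages are made-up tables, not
    // MediaWiki's. MySQL 4.1 executes the IN (...) subquery as written;
    // MySQL 4.0.x refuses it, forcing a rewrite as a multi-table join.
    $db = mysql_connect('localhost', 'wikiuser', 'secret');
    mysql_select_db('wikidb', $db);

    $sql = "DELETE FROM my_links
             WHERE link_from IN (SELECT page_id
                                   FROM my_pages
                                  WHERE page_namespace = 2)";
    mysql_query($sql, $db) or die(mysql_error($db));
    ?>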
In case it is possible to use MySQL 4.1, is there any way to upgrade the
version of MySQL while keeping my current Wikipedia database?
Thank you.
Kevin Carillo
_____
From: wikitech-l-bounces@wikimedia.org
[mailto:wikitech-l-bounces@wikimedia.org] On Behalf Of
wikitech-l@wikimedia.org
Sent: May 10, 2005 4:34 AM
To: wikitech-l@wikimedia.org
Subject: Wikitech-l Digest, Vol 22, Issue 26
Importance: Low
Send Wikitech-l mailing list submissions to
wikitech-l@wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit
http://mail.wikipedia.org/mailman/listinfo/wikitech-l
or, via email, send a message with subject or body 'help' to
wikitech-l-request@wikimedia.org
You can reach the person managing the list at
wikitech-l-owner@wikimedia.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Wikitech-l digest..."
Today's Topics:
1. Re: Parser (was Re: Longterm hosting strategy) (Tim Starling)
2. Re: Parser (was Re: Longterm hosting strategy)
(Lee Daniel Crocker)
3. Re: Longterm software strategy (Tim Starling)
4. Re: Parser (was Re: Longterm hosting strategy)
(David A. Desrosiers)
5. Re: Parser (was Re: Longterm hosting strategy)
(Ævar Arnfjörð Bjarmason)
6. New machines installed, killed in record 9.5 hours (Brion Vibber)
7. Re: Longterm software strategy (Brion Vibber)
8. Link table updates (was Re: Longterm software strategy)
(Tim Starling)
----------------------------------------------------------------------
Message: 1
Date: Tue, 10 May 2005 13:55:48 +1000
From: Tim Starling <t.starling@physics.unimelb.edu.au>
Subject: [Wikitech-l] Re: Parser (was Re: Longterm hosting strategy)
To: wikitech-l@wikimedia.org
Message-ID: <d5pau9$7q4$1@sea.gmane.org>
Content-Type: text/plain; charset=ISO-8859-1
Lee Daniel Crocker wrote:
> I agree, I don't think the parser's a big issue, although it would be
> nice for a bit snappier response. In hindsight, storing the wikitext in
> a database was a mistake. There's already a wonderful piece of software
> highly optimized and scalable for storing randomly accessed variable-
> sized chunks of text with lots of tools for backup, replication, and so
> on; it's called a file system. Storing the wikitext itself in something
> like Reiserfs would probably speed it up, and also speed up access to
> the rest of the metadata in the database which would become much
> smaller.
That's what ExternalStore is for. Moving the bulk out of the database,
or at least to a different database, is a pressing need. We need to
separate bulk, rarely accessed data from hot data, so that we can save
the highly redundant storage on the DB master for hot data. Domas has
been working on it. We're running out of disk space on Ariel again, and
another compression round is obviously only a stopgap solution.
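The shape of the split is simple enough; here is a toy sketch of the idea
(paths and the pointer format are invented for this example, this is not the
actual ExternalStore code):

    <?php
    // Toy sketch of the bulk/hot split, not the real ExternalStore code:
    // the master DB keeps only a short pointer string per revision, while
    // the compressed wikitext lives on a separate, cheaper blob store
    // (a mounted filesystem in this example).
    function saveRevisionText($storeRoot, $revId, $wikitext) {
        $blob = gzdeflate($wikitext);                 // bulk, rarely-read data
        $dir  = $storeRoot . '/' . ($revId % 1000);
        if (!is_dir($dir)) {
            mkdir($dir, 0755, true);
        }
        file_put_contents("$dir/$revId", $blob);
        return "blobstore://" . ($revId % 1000) . "/$revId";  // pointer kept in the DB
    }

    function loadRevisionText($storeRoot, $pointer) {
        $path = substr($pointer, strlen("blobstore://"));
        return gzinflate(file_get_contents("$storeRoot/$path"));
    }
    ?>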
-- Tim Starling
------------------------------
Message: 2
Date: Mon, 09 May 2005 21:04:55 -0700
From: Lee Daniel Crocker <lee@piclab.com>
Subject: Re: [Wikitech-l] Parser (was Re: Longterm hosting strategy)
To: Ævar Arnfjörð Bjarmason <avarab@gmail.com>, Wikimedia developers
<wikitech-l@wikimedia.org>
Message-ID: <1115697895.5779.25.camel@shuttle.piclab.com>
Content-Type: text/plain; charset=utf-8
On Tue, 2005-05-10 at 03:44 +0000, Ævar Arnfjörð Bjarmason wrote:
> > like Reiserfs would probably speed it up, and also speed up access to
> > the rest of the metadata in the database which would become much
> > smaller.
>
> How about something like a version control system, subversion for
> example? I don't know how it would do speed-wise for something like
> this, but with that you'd get
Waaaay too slow (have you ever used Subversion?) But it might not be
a bad idea to put a WebDAV/DeltaV front end on whatever we create to
make it possible for third-party tools to access it.
--
Lee Daniel Crocker <lee@piclab.com>
<http://creativecommons.org/licenses/publicdomain/>
------------------------------
Message: 3
Date: Tue, 10 May 2005 14:35:03 +1000
From: Tim Starling <t.starling@physics.unimelb.edu.au>
Subject: [Wikitech-l] Re: Longterm software strategy
To: wikitech-l@wikimedia.org
Message-ID: <d5pd7q$cie$1@sea.gmane.org>
Content-Type: text/plain; charset=ISO-8859-1
Lee Daniel Crocker wrote:
> Yes! There's only one tricky part for which we may have to consider
> creative implementations: I tried as much as possible to take style
> markup (especially skin-specific) out of the rendered wikitext to
> allow it to be cached, but there's one case that's still a problem:
> red links (i.e., links to non-existent pages). Users shouted at me
> that this was a sine-qua-non feature, and so I had to leave it in.
> But it makes caching rendered wikitext hard, and slows down rendering.
> One alternative is to simply tolerate them being out of date for the
> life of the cache. Another is to possibly update the cache in some
> cheaper way. Yet another is to optimize the hell out of discovering
> the simple existence of a page, so that it's not a bottleneck in
> rendering (say, by having a daemon that keeps a one-bit field for
> every page using a spell-checker data structure).
We already optimised it, didn't we? In the last public profiling run:
http://meta.wikimedia.org/wiki/Profiling/20050328
....it came in at 2.6% for the non-stub bundled query and 0.4% for the
stub query. I'd hardly call that a bottleneck. Individual link existence
tests came in at 5.9%, mostly due to special pages, but I've largely
fixed that in 1.5 by bundling the existence tests for commonly requested
special pages. It wasn't so long ago it was taking 15% for individual
queries and 15% for LinkCache::preFill():
http://meta.wikimedia.org/wiki/Profiling/Live_aggregate_20040604
....so we've come a long way.
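For what it's worth, the "one-bit field with a spell-checker data structure"
idea is essentially a Bloom-filter-style membership test in front of the link
table; a minimal sketch, purely illustrative and not anything in the tree:

    <?php
    // Minimal sketch: a shared bit array answering "might this page exist?",
    // so link colouring can skip the database for titles that are definitely
    // missing. Two cheap hashes per title; false positives get confirmed by
    // the usual bundled existence query, false negatives cannot occur.
    function bitPositions($title, $sizeInBits) {
        $h1 = crc32($title) & 0x7fffffff;          // mask keeps crc32 non-negative
        $h2 = crc32(strrev($title)) & 0x7fffffff;
        return array($h1 % $sizeInBits, $h2 % $sizeInBits);
    }

    function filterAdd(&$bits, $title, $sizeInBits) {
        foreach (bitPositions($title, $sizeInBits) as $p) {
            $bits[$p >> 3] = chr(ord($bits[$p >> 3]) | (1 << ($p & 7)));
        }
    }

    function filterMightHave($bits, $title, $sizeInBits) {
        foreach (bitPositions($title, $sizeInBits) as $p) {
            if (!(ord($bits[$p >> 3]) & (1 << ($p & 7)))) {
                return false;                      // definitely missing: render red
            }
        }
        return true;                               // probably exists
    }

    $size = 1 << 23;                               // ~8 million bits = 1 MB
    $bits = str_repeat("\0", $size >> 3);
    filterAdd($bits, 'Main Page', $size);
    var_dump(filterMightHave($bits, 'Main Page', $size));    // bool(true)
    var_dump(filterMightHave($bits, 'No such page', $size)); // almost certainly false
    ?>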
> I'm all for your method, and I agree it's not an urgent need. But I
> think we can slip the timeline even more. The existing codebase will
> eventually be a liability, but I think we can throw hardware at it for
> a year or two. Also, if we go the route of making independent daemons
> linked into the existing UI code, we don't have to deploy all at once.
> We could, for example, make and deploy the math daemon as a proof-of-
> concept, work out bugs with that, then do the others afterward.
We've already got two proof-of-concept daemons: the Chinese word
segmenter and Lucene.
There is a technical problem with Lucene at the moment: it uses file()
to fetch the result over HTTP, but that has an unconfigurable 3 minute
timeout. If the search daemon goes down, we hit apache connection limits
within a minute and the site stops working. We can either patch PHP to
use default_socket_timeout in this case, or switch to another method
like DIY pfsockopen or curl.
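With curl the timeout is at least under our control; something along these
lines (a sketch only, the surrounding error handling is up for discussion):

    <?php
    // Sketch: fetch the search daemon's response with an explicit timeout
    // instead of file(), so a dead daemon fails fast instead of tying up
    // apache children for three minutes.
    function fetchSearchResult($url, $timeoutSeconds = 5) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);       // return body as string
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeoutSeconds);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        $body = curl_exec($ch);
        if ($body === false) {
            curl_close($ch);
            return false;    // caller can show "search temporarily unavailable"
        }
        curl_close($ch);
        return $body;
    }
    ?>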
> Another thing to consider: at least some of the wikipedia-driven
> development will be totally unnecessary for mediawiki as a general-
> purpose open source project. We may want to decouple those projects
> at some point.
Brion doesn't want to.
-- Tim Starling
------------------------------
Message: 4
Date: Tue, 10 May 2005 00:50:18 -0400 (EDT)
From: "David A. Desrosiers" <desrod(a)gnu-designs.com <mailto:> >
Subject: Re: [Wikitech-l] Parser (was Re: Longterm hosting strategy)
To: Wikimedia developers <wikitech-l(a)wikimedia.org <mailto:> >
Message-ID: <Pine.LNX.4.62.0505100049060.8023(a)angst.gnu-designs.com
<mailto:> >
Content-Type: TEXT/PLAIN; charset=US-ASCII
> Waaaay too slow (have you ever used Subversion?) But it might not
> be a bad idea to put a WebDAV/DeltaV front end on whatever we create
> to make it possible for third-party tools to access it.
Only if you're using it with the default (and horribly slow)
bdb backend. If you use fsfs, you'll see performance several orders of
magnitude faster (and it also doesn't wedge or break like bdb does
all the time).
http://svn.collab.net/repos/svn/trunk/notes/fsfs
David A. Desrosiers
desrod@gnu-designs.com
http://gnu-designs.com
------------------------------
Message: 5
Date: Tue, 10 May 2005 07:22:10 +0000
From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Subject: Re: [Wikitech-l] Parser (was Re: Longterm hosting strategy)
To: Lee Daniel Crocker <lee@piclab.com>
Cc: Wikimedia developers <wikitech-l@wikimedia.org>
Message-ID: <51dd1af805051000224ffb81c7@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
> Waaaay too slow (have you ever used Subversion?) But it might not be
> a bad idea to put a WebDAV/DeltaV front end on whatever we create to
> make it possible for third-party tools to access it.
I've used it since the early betas, though not in a production
environment, just for my personal repository, so I haven't felt much
need for speed. However, as David pointed out, you might get more speed
out of fsfs than bdb, and I only mentioned svn as an example; there are
other version control systems in the world.
Regardless, using a VCS would bring diff-based storage, and when all is
said and done, a custom implementation of VCS-like features might end up
not much faster, or even slower, than a "real" version control system.
------------------------------
Message: 6
Date: Tue, 10 May 2005 00:54:47 -0700
From: Brion Vibber <brion@pobox.com>
Subject: [Wikitech-l] New machines installed, killed in record 9.5
hours
To: Wikimedia developers <wikitech-l@wikimedia.org>
Message-ID: <428068C7.30203@pobox.com>
Content-Type: text/plain; charset="iso-8859-1"
All the new boxen (srv11-srv30) died mysteriously while Domas and I were
trying to restart Apache after installing PHP's CURL library extension
so a proper timeout could be used on the Lucene search.
By dead, I mean "Destination Host Unreachable". They're off the network,
kaput. That _shouldn't_ happen. :) All the other machines seem just
fine; only the spanking new ones exploded, and the reason for it is not
too clear. (Freak library incompatibility -> killing machines? That
_shouldn't_ happen.)
It may be necessary for somebody to flip the switches and reboot.
-- brion vibber (brion @ pobox.com)