Ashar Voultoiz wrote:
Hello!
Some time ago there was a lot of discussion about a database schema
change. I am wondering if that's a dead horse or if we can start the
discussion again and eventually get the changes in for version 1.4?
The main holdup so far has been the knowledge that we'll need to convert
the current Wikipedias, so any change needs to be constrained to reduce
downtime.
The biggest changes that we'd like to do are:
* Put current and old revisions together, to avoid having to juggle data
between cur and old all the time and to allow a consistent revision
numbering scheme (permalinks for the current revision as well as old ones)
Personally I don't like the idea of putting old and cur together; it will
be more difficult to set up a dump job :)
* Separate page information like the title from the individual revision
data, so page renames of longstanding pages don't bog down the server
touching thousands of rows
Not really needed if 1 row = 1 page (old+cur)?
* Separate the actual *text* out of the revision information table so
that contribs, history, etc don't pull so much junk into memory.
Yes, separate it; that will improve the speed of special pages :)
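To make the three changes above concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table and column names (page, revision, textstore, page_latest, rev_page and so on) are assumptions made up for illustration, not an agreed schema, and SQLite only stands in for the real database server.

```python
import sqlite3

def make_schema(conn):
    # Hypothetical layout: one row per page, one row per revision,
    # and the bulky text kept in its own table.
    conn.executescript("""
        CREATE TABLE page (
            page_id     INTEGER PRIMARY KEY,
            page_title  TEXT NOT NULL UNIQUE,
            page_latest INTEGER              -- rev_id of the current revision
        );
        CREATE TABLE textstore (
            text_id   INTEGER PRIMARY KEY,
            text_body TEXT NOT NULL
        );
        CREATE TABLE revision (
            rev_id      INTEGER PRIMARY KEY,
            rev_page    INTEGER NOT NULL REFERENCES page(page_id),
            rev_text    INTEGER NOT NULL REFERENCES textstore(text_id),
            rev_user    TEXT NOT NULL,
            rev_comment TEXT NOT NULL
        );
    """)

def add_revision(conn, title, body, user, comment):
    """Store a new revision and mark it as the page's current one."""
    row = conn.execute(
        "SELECT page_id FROM page WHERE page_title = ?", (title,)).fetchone()
    if row is None:
        page_id = conn.execute(
            "INSERT INTO page (page_title) VALUES (?)", (title,)).lastrowid
    else:
        page_id = row[0]
    text_id = conn.execute(
        "INSERT INTO textstore (text_body) VALUES (?)", (body,)).lastrowid
    rev_id = conn.execute(
        "INSERT INTO revision (rev_page, rev_text, rev_user, rev_comment) "
        "VALUES (?, ?, ?, ?)", (page_id, text_id, user, comment)).lastrowid
    conn.execute(
        "UPDATE page SET page_latest = ? WHERE page_id = ?", (rev_id, page_id))
    return rev_id

def rename_page(conn, old_title, new_title):
    """Rename a page; returns the number of rows touched."""
    cur = conn.execute(
        "UPDATE page SET page_title = ? WHERE page_title = ?",
        (new_title, old_title))
    return cur.rowcount

def history(conn, title):
    """List revisions, newest first, without reading any text bodies."""
    return conn.execute(
        "SELECT rev_id, rev_user, rev_comment FROM revision "
        "JOIN page ON rev_page = page_id "
        "WHERE page_title = ? ORDER BY rev_id DESC", (title,)).fetchall()
```

In this sketch a rename updates a single page row however many revisions exist, every revision (current or old) has the same kind of rev_id, and a history listing never reads textstore.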
And possibly:
* Add language-specific title sort key field
Is it possible?
* Add case-insensitive 'canonical title match' field
For disambiguation?
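As a rough illustration of what those two extra fields could hold, here is a small Python sketch. The function names and the tiny Swedish alphabet table are assumptions for the example only; a real sort key would need proper collation rules for each language.

```python
import unicodedata

# Illustrative alphabet only: in Swedish, å, ä and ö sort after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

def canonical_title(title):
    """Case-insensitive match key: Unicode-normalize, then casefold."""
    return unicodedata.normalize("NFKC", title).casefold()

def sort_key(title, alphabet=SWEDISH_ALPHABET):
    """Rank each character by its position in the given alphabet.

    Characters outside the alphabet sort after it, by code point.
    """
    ranks = {ch: i for i, ch in
             enumerate(unicodedata.normalize("NFKC", alphabet))}
    return tuple(
        (0, ranks[ch]) if ch in ranks else (1, ord(ch))
        for ch in canonical_title(title)
    )
```

Both values can be computed once when the title is saved and stored next to it, so lookups and sorted listings would not need to recompute them per query.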
Now as you can imagine changing the cur and old tables is about the most
traumatizing thing we could do to the servers. :)
Well, doing it on ariel should be fast; the main problem is space :) We
don't currently have enough to create a big temporary table.
A 'minimal effort' change could be to move all cur revisions into old,
then rebuild a 'page' table from the cur data and let 'old' be a
revisions table. I believe Jamesday has been experimenting with seeing
how slow/fast this kind of conversion might be...
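The 'minimal effort' conversion described above might look roughly like this on toy data, again in Python with sqlite3. The cur and old tables here are cut down to a few columns, and the page table layout (page_id, page_title, page_latest) is an assumption; this is not the conversion Jamesday has been testing, and it says nothing about timing on the real servers.

```python
import sqlite3

def make_legacy_tables(conn):
    # Heavily simplified stand-ins for the existing cur and old tables.
    conn.executescript("""
        CREATE TABLE cur (cur_id INTEGER PRIMARY KEY, cur_title TEXT,
                          cur_text TEXT, cur_timestamp TEXT);
        CREATE TABLE old (old_id INTEGER PRIMARY KEY, old_title TEXT,
                          old_text TEXT, old_timestamp TEXT);
    """)

def convert(conn):
    """Move cur rows into old, then rebuild a page table from cur."""
    with conn:
        # 1. Append every current revision to old, which then acts as
        #    the revisions table.
        conn.execute(
            "INSERT INTO old (old_title, old_text, old_timestamp) "
            "SELECT cur_title, cur_text, cur_timestamp FROM cur")
        # 2. Build page from cur. The rows just copied received the
        #    highest old_id per title, so MAX(old_id) finds the current one.
        conn.execute("""
            CREATE TABLE page AS
            SELECT cur_id    AS page_id,
                   cur_title AS page_title,
                   (SELECT MAX(old_id) FROM old
                     WHERE old_title = cur_title) AS page_latest
              FROM cur""")
        # 3. cur is no longer needed once page exists.
        conn.execute("DROP TABLE cur")
```

Existing old rows are left where they are and only the cur rows get copied, so at least in this toy version no full temporary copy of the revision data is built.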
I think all changes should be tested first on suda, and all the queries
we use should be benchmarked.
shaihulud