I'm about to commit a major change to the 'old' table in HEAD. There should ideally be no problems, but it's quite possible it will end up corrupting your database, deleting articles or their histories, or doing other bad things. There will almost certainly be minor bugs.
If you have anything important in a 1.4 version database, you MUST backup your database before updating or you WILL lose it!
You have been warned.
Kate.
Is there somewhere that design decisions for the new database schema are being discussed? I asked about sub-wikis back in mid-Sept, but none of the responses really match the feature I'm looking for, as far as I can tell. I think it's worth architecting something into the new schema to accommodate this feature, even if Wikipedia only uses one main wiki.
Basically, what I'd like is a way to classify pages into the main wiki and sub-wikis, and then be able to limit search results based on sub-wiki, and maybe eventually limit access controls based on sub-wikis (e.g. [[User:Foo]] is a sysop for sub-wiki 'Science', [[User:Bar]] is a sysop for sub-wiki 'History', etc.)
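A minimal sketch of how such a classification could look at the schema level. The `subwiki` table, `page_subwiki` column, and all names here are hypothetical illustrations, not part of any existing MediaWiki schema, and SQLite stands in for MySQL:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical schema: each page belongs to at most one sub-wiki;
# NULL means the page lives in the main wiki.
cur.executescript("""
CREATE TABLE subwiki (
    sw_id   INTEGER PRIMARY KEY,
    sw_name TEXT NOT NULL UNIQUE
);
CREATE TABLE page (
    page_id      INTEGER PRIMARY KEY,
    page_title   TEXT NOT NULL,
    page_subwiki INTEGER REFERENCES subwiki(sw_id)
);
""")
cur.executemany("INSERT INTO subwiki (sw_name) VALUES (?)",
                [("Science",), ("History",)])
cur.executemany("INSERT INTO page (page_title, page_subwiki) VALUES (?, ?)",
                [("Physics", 1), ("Rome", 2), ("Main Page", None)])

# Search results can then be limited to one sub-wiki with a simple filter:
rows = cur.execute(
    "SELECT page_title FROM page JOIN subwiki ON page_subwiki = sw_id "
    "WHERE sw_name = ?", ("Science",)
).fetchall()
print(rows)  # [('Physics',)]
```

Per-sub-wiki access control could hang off the same table, e.g. a join table mapping users to the sub-wikis they administer.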
I'm happy to do some of the coding toward this effort, but I'm new enough to this wiki community that I don't know where these kinds of issues are discussed.
thanks! -Nick
Kate Turner wrote:
I'm about to commit a major change to the 'old' table in HEAD. There should ideally be no problems, but it's quite possible it will end up corrupting your database, deleting articles or their histories, or doing other bad things. There will almost certainly be minor bugs.
If you have anything important in a 1.4 version database, you MUST backup your database before updating or you WILL lose it!
You have been warned.
Kate.
_______________________________________________
Wikitech-l mailing list
Wikitech-l@wikimedia.org
http://mail.wikipedia.org/mailman/listinfo/wikitech-l
On Sun, 26 Sep 2004 12:48:11 -0400, Evan Prodromou evan@wikitravel.org wrote:
On Sun, 2004-09-26 at 10:42, Kate Turner wrote:
I'm about to commit a major change to the 'old' table in HEAD.
What change?
http://mail.wikimedia.org/pipermail/mediawiki-cvs/2004-September/004069.html old_namespace and old_title are no longer in the old table.
Why aren't these things discussed on-list or on meta:?
Is there a problem with the change that needs to be discussed?
~ESP
Kate.
Kate Turner wrote:
Why aren't these things discussed on-list or on meta:?
Is there a problem with the change that needs to be discussed?
I think people are trying to say that you should ask that question before you actually make the change.
Myself, I have no idea what change you are making, but a change to the 'old' table is going to incur downtime on wikis as large as the English-language Wikipedia. Seeing as we were going to redesign the entire database for 1.5, I don't see the point in doing that.
Timwi
On Sep 26, 2004, at 11:13 AM, Erik Moeller wrote:
Kate-
I'm about to commit a major change to the 'old' table in HEAD.
Shouldn't we hold off on major schema changes until REL1_4 is branched?
I've backed out the change until we have a better idea of the upgrade cost. Unless it's surprisingly small I'd much prefer to push it back to post-1.4 and get 1.4 wrapped up soon.
IMHO major schema changes justify a major version number change, i.e. 1.4->2.0. Brion?
(shrug)
-- brion vibber (brion @ pobox.com)
On Sun, Sep 26, 2004 at 08:13:00PM +0200, Erik Moeller wrote:
Kate-
I'm about to commit a major change to the 'old' table in HEAD.
Shouldn't we hold off on major schema changes until REL1_4 is branched? IMHO major schema changes justify a major version number change, i.e. 1.4->2.0.
We shouldn't wait. Waiting will only increase the cost of the change. The change is needed, as the previous code made page moves very expensive; page moves have been used several times to take the wiki down.
Regards,
JeLuF
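To illustrate the cost JeLuF describes — a sketch with simplified, illustrative table shapes, not the exact 1.3 schema: because old_namespace and old_title sat on every revision row, a single page move had to rewrite one row per revision.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Simplified old-style history table: the page title is duplicated
# onto every revision row.
cur.execute("""CREATE TABLE old (
    old_id        INTEGER PRIMARY KEY,
    old_namespace INTEGER,
    old_title     TEXT,
    old_text      TEXT
)""")
cur.executemany(
    "INSERT INTO old (old_namespace, old_title, old_text) VALUES (0, 'Foo', ?)",
    [(f"revision {i}",) for i in range(1000)])

# Moving [[Foo]] to [[Bar]] must rewrite every one of its revision rows:
cur.execute("UPDATE old SET old_title = 'Bar' "
            "WHERE old_namespace = 0 AND old_title = 'Foo'")
moved_rows = cur.rowcount
print(moved_rows)  # 1000 -- one row rewritten per revision for a single move
```

With the title stored once per page instead of once per revision, the same move would be a single-row update regardless of history length.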
On Sep 26, 2004, at 1:13 PM, Jens Frank wrote:
On Sun, Sep 26, 2004 at 08:13:00PM +0200, Erik Moeller wrote:
Kate-
I'm about to commit a major change to the 'old' table in HEAD.
Shouldn't we hold off on major schema changes until REL1_4 is branched? IMHO major schema changes justify a major version number change, i.e. 1.4->2.0.
We shouldn't wait. Waiting will only increase the cost of the change. The change is needed, as the previous code made page moves very expensive; page moves have been used several times to take the wiki down.
When you come up with a better transition, we'll see.
-- brion vibber (brion @ pobox.com)
On Sun, 26 Sep 2004 13:20:16 -0700, Brion Vibber brion@pobox.com wrote:
On Sep 26, 2004, at 1:13 PM, Jens Frank wrote:
We shouldn't wait. Waiting will only increase the cost of the change. The change is needed, as the previous code made page moves very expensive; page moves have been used several times to take the wiki down.
When you come up with a better transition, we'll see.
It isn't a question of a better transition. If you can find a better transition, excellent. If not, it _still_ has to be done. Not doing it is not an option.
-- brion vibber (brion @ pobox.com)
Kate.
On Sep 26, 2004, at 1:23 PM, Kate Turner wrote:
On Sun, 26 Sep 2004 13:20:16 -0700, Brion Vibber brion@pobox.com wrote:
On Sep 26, 2004, at 1:13 PM, Jens Frank wrote:
We shouldn't wait. Waiting will only increase the cost of the change. The change is needed, as the previous code made page moves very expensive; page moves have been used several times to take the wiki down.
When you come up with a better transition, we'll see.
It isn't a question of a better transition. If you can find a better transition, excellent. If not, it _still_ has to be done. Not doing it is not an option.
I suppose you mean to say it's not a _good_ option. It's not a good option, but neither is spending much longer in downtime than we need to. We have an obligation to *keep things running smoothly*; this is why we've volunteered and this is why Jimmy & the foundation allow us to play on their expensive equipment which serves an increasingly high profile FOSS & open content project.
What concerns me is that this was checked in with no discussion and with no estimates done of how disruptive the upgrade would be, despite this issue having been discussed before.
A possible alternative which has been brought up before is to avoid changing the structure of the old table, instead pulling its non-textual data out to a separate table and continuing to use the old table (unaltered) as a store for old_text. This avoids copying around most of the data and should in theory be faster.
-- brion vibber (brion @ pobox.com)
Brion-
A possible alternative which has been brought up before is to avoid changing the structure of the old table, instead pulling its non-textual data out to a separate table and continuing to use the old table (unaltered) as a store for old_text. This avoids copying around most of the data and should in theory be faster.
How about doing a quick hack now instead, such as adding a "sysop_move" flag to cur_restrictions which would make certain pages movable only by sysops? This would hold us over until we do a proper database redesign, at which point we would convert the existing DB into the new schema in one well-coordinated step, with associated downtime.
Regards,
Erik
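A sketch of how Erik's proposed check might work. "sysop_move" is his suggested flag, not an existing cur_restrictions value, and the helper below is purely illustrative, assuming cur_restrictions holds a comma-separated list of restriction flags:

```python
# Sketch of the proposed permission check. "sysop_move" is the flag Erik
# suggests adding; cur_restrictions is assumed to hold a comma-separated
# list of restriction flags for the page.
def user_can_move(cur_restrictions: str, user_is_sysop: bool) -> bool:
    restrictions = [r.strip() for r in cur_restrictions.split(",") if r.strip()]
    if "sysop_move" in restrictions:
        return user_is_sysop
    return True  # no move restriction: the usual move rules apply

print(user_can_move("sysop_move", user_is_sysop=False))  # False
print(user_can_move("sysop_move", user_is_sysop=True))   # True
print(user_can_move("", user_is_sysop=False))            # True
```

The appeal of the hack is that it needs no schema change at all — only a new recognized value in an existing column plus one check in the move code path.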
Erik Moeller wrote:
Brion-
A possible alternative which has been brought up before is to avoid changing the structure of the old table, instead pulling its non-textual data out to a separate table and continuing to use the old table (unaltered) as a store for old_text. This avoids copying around most of the data and should in theory be faster.
How about doing a quick hack now instead, such as adding a "sysop_move" flag to cur_restrictions which would make certain pages movable only by sysops? This would hold us over until we do a proper database redesign, at which point we would convert the existing DB into the new schema in one well-coordinated step, with associated downtime.
Personally, I'd prefer a transition without downtime such as the one I proposed before, but nobody seemed interested enough to discuss it.
On Fri, 01 Oct 2004 14:29:03 +0200, Timwi timwi@gmx.net wrote:
Personally, I'd prefer a transition without downtime such as the one I proposed before, but nobody seemed interested enough to discuss it.
To the contrary, I think this is very interesting, the right way to do things, and worth serious discussion. It sounded as though Ivan might have been interested in a no-downtime solution as well.
On Sun, Sep 26, 2004 at 01:35:25PM -0700, Brion Vibber wrote:
A possible alternative which has been brought up before is to avoid changing the structure of the old table, instead pulling its non-textual data out to a separate table and continuing to use the old table (unaltered) as a store for old_text. This avoids copying around most of the data and should in theory be faster.
Quick benchmark on Diderot, using dewiki (one third the size of enwiki):
INSERT INTO new_old
    (old_id, old_namespace, old_title, old_comment, old_user,
     old_user_text, old_timestamp, old_minor_edit, old_flags)
SELECT old_id, old_namespace, old_title, old_comment, old_user,
       old_user_text, old_timestamp, old_minor_edit, old_flags
FROM old;
Query OK, 2498816 rows affected (24 min 52.30 sec)
The table was created by:
create table new_old (
    old_id         int(8) unsigned not null primary key,
    old_namespace  tinyint(2) unsigned,
    old_title      varchar(255) binary,
    old_comment    tinyblob,
    old_user       int(5) unsigned,
    old_user_text  varchar(255) binary,
    old_timestamp  varchar(14) binary,
    old_minor_edit tinyint(1),
    old_flags      tinyblob
);
The query can probably be run while the wiki is online; the transition would then require only a short downtime to copy over the latest changes and migrate to the new software release.
Regards,
JeLuF
PS: Lies, damn lies, and benchmarks. The test setup was not representative: Ariel performs better than Diderot, but also carries additional workload at the same time.
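The two-phase transition JeLuF outlines can be sketched as follows (illustrative only, using SQLite in place of MySQL, and assuming old_id increases monotonically so that rows written after the online bulk copy can be identified during the short downtime window):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE old (old_id INTEGER PRIMARY KEY, old_title TEXT, old_text TEXT)")
cur.execute("CREATE TABLE new_old (old_id INTEGER PRIMARY KEY, old_title TEXT)")
cur.executemany("INSERT INTO old (old_title, old_text) VALUES (?, ?)",
                [("Page", f"rev {i}") for i in range(100)])

# Phase 1 (wiki still online): bulk-copy the revision metadata.
cur.execute("INSERT INTO new_old (old_id, old_title) SELECT old_id, old_title FROM old")

# Edits keep arriving while the bulk copy runs...
cur.executemany("INSERT INTO old (old_title, old_text) VALUES (?, ?)",
                [("Page", f"late rev {i}") for i in range(5)])

# Phase 2 (short downtime): copy only the rows added since phase 1.
cur.execute("""INSERT INTO new_old (old_id, old_title)
               SELECT old_id, old_title FROM old
               WHERE old_id > (SELECT MAX(old_id) FROM new_old)""")

count = cur.execute("SELECT COUNT(*) FROM new_old").fetchone()[0]
print(count)  # 105 -- all rows present after the short catch-up copy
```

Only phase 2 needs the wiki read-only, so the downtime is proportional to the edits made during the bulk copy, not to the size of the whole table.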
On Mon, Sep 27, 2004 at 12:24:44AM +0200, Jens Frank wrote:
On Sun, Sep 26, 2004 at 01:35:25PM -0700, Brion Vibber wrote:
A possible alternative which has been brought up before is to avoid changing the structure of the old table, instead pulling its non-textual data out to a separate table and continuing to use the old table (unaltered) as a store for old_text. This avoids copying around most of the data and should in theory be faster.
Quick benchmark on Diderot, using dewiki (one third the size of enwiki):
INSERT INTO new_old
    (old_id, old_namespace, old_title, old_comment, old_user,
     old_user_text, old_timestamp, old_minor_edit, old_flags)
SELECT old_id, old_namespace, old_title, old_comment, old_user,
       old_user_text, old_timestamp, old_minor_edit, old_flags
FROM old;
Query OK, 2498816 rows affected (24 min 52.30 sec)
Jamesday played around with some MySQL settings and improved performance of this query by a factor of 3.
I think this is the way to go: keep old as it is, add a new table for the revision metadata, and ignore the metadata columns in old.
Regards,
JeLuF
Brion Vibber wrote:
A possible alternative which has been brought up before is to avoid changing the structure of the old table, instead pulling its non-textual data out to a separate table and continuing to use the old table (unaltered) as a store for old_text. This avoids copying around most of the data and should in theory be faster.
I had volunteered to do this, and have had a nearly-done patch on hand for a while now, but my ability to find time to code disappeared when classes started. Is there still interest in me finishing this?
Ivan
On Sep 26, 2004, at 10:57 PM, Ivan Krstic wrote:
Brion Vibber wrote:
A possible alternative which has been brought up before is to avoid changing the structure of the old table, instead pulling its non-textual data out to a separate table and continuing to use the old table (unaltered) as a store for old_text. This avoids copying around most of the data and should in theory be faster.
I had volunteered to do this, and have had a nearly-done patch on hand for a while now, but my ability to find time to code disappeared when classes started. Is there still interest in me finishing this?
Please do.
-- brion vibber (brion @ pobox.com)