I'm checking in some code to deal with compressing data in the old
table. The primary motivation here is to decrease the amount of disk
and cache space necessary for storing page revision data, without the
complications and fragility of differential compression.[1]
Compression is done with gzdeflate() / gzinflate(), which requires zlib
support compiled into PHP. This is the same compression that would be
used in a gzip file, but without the header bytes.
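For illustration, the round trip with those two functions looks roughly
like this (just the idea, not the actual patch code):

  <?php
  # Raw DEFLATE -- the same stream a gzip file uses, minus the header bytes.
  $text = "some revision text";
  $compressed = gzdeflate( $text );

  # Later, on the way back out:
  assert( gzinflate( $compressed ) === $text );
  ?>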
Compressed revisions are marked with old_flags="gzip". The old_flags
column has existed unused for quite some time, so no schema change is
necessary. The compressed data goes back into old_text; I don't think
there is a problem with storing binary data in a TEXT field, as
supposedly TEXT and BLOB differ only in matching and sorting
characteristics.
Article::getRevisionText() accepts a row object (as from wfFetchObject)
containing both old_text and old_flags fields and returns the text,
decompressing it if necessary.
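In rough outline it behaves like the sketch below; this is a simplified
illustration of the idea, not the real MediaWiki code, and the function
name is made up:

  <?php
  # Simplified sketch -- not the actual Article::getRevisionText().
  function getRevisionTextSketch( $row ) {
      $text = $row->old_text;
      if ( false !== strpos( $row->old_flags, 'gzip' ) ) {
          # Needs zlib support in PHP; otherwise gzinflate() doesn't exist.
          $text = gzinflate( $text );
      }
      return $text;
  }
  ?>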
This scheme could also work in the archive table, but there are
probably problems with undeletion that need to be checked.
So far there's no on-the-fly compression; a maintenance script
compressOld.php is provided to batch-compress old revisions. It can be
given an arbitrary starting old_id, and will run until it reaches the
end of the table or you kill it. It should be safe to run in the
background while the wiki is live; it makes single-row UPDATEs keyed by
old_id. On my 2 GHz Athlon XP, otherwise unloaded, this runs at about
10,000 rows per minute.
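Schematically the loop looks something like this. The sketch uses plain
mysql_* calls and made-up connection details rather than MediaWiki's own
database wrappers, and the real script works in batches rather than one
giant result set:

  <?php
  # Rough sketch of the batch loop -- not the actual compressOld.php.
  $conn = mysql_connect( 'localhost', 'wikiuser', 'secret' );  # made-up credentials
  mysql_select_db( 'wikidb', $conn );

  $start = 0;  # or an arbitrary old_id to resume from
  $res = mysql_query(
      "SELECT old_id, old_text, old_flags FROM old
        WHERE old_id >= $start ORDER BY old_id", $conn );
  while ( $row = mysql_fetch_object( $res ) ) {
      if ( false !== strpos( $row->old_flags, 'gzip' ) ) {
          continue;  # already compressed, skip it
      }
      $gz = mysql_real_escape_string( gzdeflate( $row->old_text ), $conn );
      # Single-row UPDATE keyed by old_id, safe while the wiki is live.
      mysql_query( "UPDATE old SET old_text='$gz', old_flags='gzip'
                     WHERE old_id={$row->old_id}", $conn );
  }
  ?>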
I haven't done any comparative testing of load times, but the effect
should be dwarfed by parse/render times and will only come up on old
and diff views and a few other rare places.
I tested with the New Year's dump of the French Wikipedia (about 200k
rows in old).
Raw dump size:
old_table.sql 1,210,368,249
old_compressed.sql 485,536,046
Space saved: ~60%
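(That figure is just the ratio of the two file sizes:

  <?php
  # 1 - 485536046/1210368249 = 0.599, i.e. roughly 60% saved.
  printf( "%.1f%%\n", 100 * ( 1 - 485536046 / 1210368249 ) );
  ?>
)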
If these ratios hold, I estimate the total savings at about 14
gigabytes, bringing our total db usage to something more like 20 GB.
This is a reasonably big improvement for very small changes in code.
(Note that the InnoDB data storage space never shrinks; to reclaim disk
space for purposes other than storing the next couple million edits
would require dumping everything and reimporting it fresh.)
There are a couple of downsides. The SQL dumps become less legible, and
old revisions won't be loadable on a MediaWiki installation whose PHP
build lacks zlib support (the default configure options don't include
it). Also, recompressing the resultant dump doesn't work out so well:
old_table.sql.bz2 199,394,376
old_compressed.sql.bz2 416,208,437
This more than doubles the size of the compressed dumps. Ouch! Well,
we should be looking at a more usable dump format anyway.
-- brion vibber (brion @ pobox.com)
[1] Ultimately we'd probably save a lot of disk space by storing diffs
between revisions, but loading an individual revision then requires
sifting through multiple revisions from the last checkpoint, and
requires extra work to ensure that intermediate revisions are not
corrupted, reordered, removed, etc. By compressing each revision
separately, we still maintain the integrity of the rest of the history
if any one revision is corrupted, if histories are reordered or
recombined, if individual revisions are plucked out or blanked for
legal reasons, etc.
Hello,
I would like to invite people to join the Arabic wikipedia, but at the
moment editing is almost impossible there (at least for me). Try it
for yourself (for example, try to create a link):
http://ar.wikipedia.org/wiki/How_to_edit_Arabic_pages
http://ar.wikipedia.org/wiki/Wikipedia:Sandbox
With the help of Ibn Alnatheer I localized some parts of the GUI, but
after half an hour's work there I need an aspirin...
Can something be done to make editing there easier? How does Hebrew
wikipedia handle the RTL stuff?
If there are unsolved problems, maybe the people from arabeyes.org can help.
greetings,
elian
PS: For recentchanges, the order (same as Hebrew):
(comment) (talk) user time article hist diff
would look much better than the current mess. Could someone please change this?
So, this is kind of a sideways suggestion, but... we just moved
Wikitravel to a Web hosting service (xlinternet.com). It's working
pretty much great, right out of the box.
Considering that MediaWiki runs on some pretty standard software (PHP,
MySQL), I wonder if it wouldn't be a good idea to leave most of the
yucky sysadmining problems up to folks who make it their business. I'm
sure that a project as big as Wikimedia could get some special
treatment.
I don't know what kind of bandwidth and storage requirements Wikimedia
has, but I doubt that they'd be insurmountable with any given Web
hosting service.
Just a suggestion to consider.
~ESP
--
Evan Prodromou <evan(a)wikitravel.org>
Wikitravel - http://www.wikitravel.org/
The free, complete, up-to-date and reliable world-wide travel guide
I did some googling tonight and found some more information about ways to
do some magic distributed caching.
An example of a proprietary product that does this is Cisco
DistributedDirector ($19,000). Anybody?
Ok, so here is the open source alternative:
* Super Sparrow http://www.supersparrow.org/
Open source, runs on Linux, and tested. It runs vergenet, for example;
you can see it in action at http://www.vergenet.net/vergenet/.
In combination with Linux Virtual Server & Heartbeat plus distributed
squid 'mirrors', this looks like a nice way to handle future growth.
IMO this is not something for the immediate future, but it's good to
keep in mind and start playing with. Maybe it would also be possible to
ask Horms (Simon Horman, http://www.vergenet.net/~horms/) for advice;
he definitely is an expert in this field.
Have a nice new year!
Gabriel Wicke
I think it was my fault that Ursula went down... The time she went
down seems to correspond roughly with the time I arrived at the colo.
The last time I was in the colo, I had my laptop configured to use what
is now Ursula's IP address. So, when I plugged my laptop in, it must
have freaked Ursula out even after I changed the IP address on the
laptop. Sorry for the trouble.
Jason
Erik Moeller wrote:
> Brion-
> > Jason's got Ursula back up, and our new machine is also installed. I'm
> > copying files over so it can take over pliny's web work and let Ursula
> > do just the db.
>
> Out of curiosity, why did Ursula go down? If the cause is unknown, could
> there be an issue with our database that might cause such crashes?
>
> Regards,
>
> Erik
--
"Jason C. Richey" <jasonr(a)bomis.com>
Does anyone know offhand how easy/difficult it would be to import stuff
sent through the backup MX into the mailing list archives on the main
server?
-- brion vibber (brion @ pobox.com)
Enjoy! http://download.wikimedia.org/
Since Geoffrin is still out of service, Ursula is serving this up from
.204.
The December update to the Tomeraider archives isn't online yet. I'll
see if I kept a local copy; if not I'll either have to get them from
Erik again or wait until Geoffrin is back up.
Happy new year, everybody... and let's not forget that Wikipedia turns 3
on January 15! The terrible twos are coming to an end. :)
-- brion vibber (brion @ pobox.com)