[Wikipedia-l] Switching everything to UTF-8
Tomasz Wegrzanowski
taw at users.sf.net
Mon Nov 17 23:02:20 UTC 2003
Staying so long with ISO 8859 was a mistake.
So I propose converting all Wikipedias that aren't using UTF-8 yet to UTF-8.
The procedure should be as follows:
1. prepare a new LanguageXX.php and put it under a different name
2. make backups
3. create tables curutf8 and oldutf8
4. disable write access
5. convert all data - numeric HTML codes are going to be replaced by UTF-8 characters too.
6. rename tables cur and old to cur88591 and old88591
7. rename tables curutf8 and oldutf8 to cur and old
8. replace old LanguageXX.php with utf8-enabled version
9. reenable write access
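To make step 5 concrete, here is a rough sketch of what converting one stored text field could look like. It is written in Python only for illustration; it is not the actual conversion script, and the names convert_row, _replace_ref and _KEEP_ESCAPED are made up for this example. It assumes the stored bytes really are ISO 8859-1, and it leaves references to markup characters (<, >, &, ") and control characters escaped, which is a choice of this sketch rather than something the procedure above specifies.

```python
import re

# Matches decimal (&#321;) and hexadecimal (&#x141;) numeric character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")

# Assumption of this sketch: these characters stay escaped so that
# existing markup keeps its meaning after conversion.
_KEEP_ESCAPED = {"<", ">", "&", '"'}


def _replace_ref(match):
    """Return the character for a numeric reference, or the reference unchanged."""
    hex_digits, dec_digits = match.group(1), match.group(2)
    code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
    # Out-of-range values and surrogates are not valid characters: leave them alone.
    if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        return match.group(0)
    # Control characters (including NUL) stay escaped.
    if code_point < 0x20:
        return match.group(0)
    char = chr(code_point)
    if char in _KEEP_ESCAPED:
        return match.group(0)
    return char


def convert_row(raw: bytes) -> bytes:
    """Convert one ISO 8859-1 text field to UTF-8, expanding numeric references."""
    text = raw.decode("iso-8859-1")          # every byte maps to one code point
    text = _NUMERIC_REF.sub(_replace_ref, text)
    return text.encode("utf-8")
```

For example, convert_row(b"caf\xe9 &#321;&#243;d&#378;") yields the UTF-8 bytes for "café Łódź". A real script would also have to handle article titles and link tables, which this sketch ignores.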
The conversion script should be tested on test.* Wikipedia first.
During step 5 Wikipedia is going to be read only. It may take some time,
especially with English Wikipedia, so it's better to do conversion of each Wikipedia
separately. During steps 6-8 Wikipedia may not work at all, but it's going to
take less than a minute.
Does anybody have any really good reason why I shouldn't proceed?
These reasons aren't good enough:
* broken URLs - all old URLs are going to work after upgrade
* size increase - size is going to stay about the same
* broken browsers - they should be upgraded. If someone has a browser so old
that it doesn't grok UTF-8, it's not going to grok CSS,
PNGs, and other things we're using either.
Unless we want to remove all CSS and PNGs, there's
no point in not using UTF-8.
* ISO 8859-N is good enough - no, it's not. Not if someone wants to write about
people and places from countries where non-8859-1 Latin
characters are used, or about linguistics, or math, etc.
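On the broken URLs point, one way old links could keep working is to try UTF-8 first and fall back to ISO 8859-1 when the bytes are not valid UTF-8. The sketch below (Python, for illustration) shows that idea. It is an assumption about how compatibility might be done, not a description of the actual code, and decode_title is a hypothetical name.

```python
from urllib.parse import unquote_to_bytes


def decode_title(url_path_segment: str) -> str:
    """Decode a percent-encoded title, accepting both new and old style URLs."""
    raw = unquote_to_bytes(url_path_segment)
    try:
        # New-style URL: the escaped bytes are UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Old-style URL: fall back to ISO 8859-1, which accepts any byte.
        return raw.decode("iso-8859-1")
```

With this, both decode_title("Caf%E9") and decode_title("Caf%C3%A9") give "Café". The fallback is a heuristic: an old title whose ISO 8859-1 bytes happen to form valid UTF-8 would be misread, though such sequences should be rare in real titles.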