Staying so long with ISO 8859 was a mistake.
So I propose converting all Wikipedias that aren't using UTF-8 yet to UTF-8. The procedure would be like this:
1. prepare the new LanguageXX.php and put it under some name
2. make backups
3. create tables curutf8 and oldutf8
4. disable write access
5. convert all data - numeric HTML codes are going to be replaced by UTF-8 characters too
6. rename tables cur and old to cur88591 and old88591
7. rename tables curutf8 and oldutf8 to cur and old
8. replace the old LanguageXX.php with the UTF-8-enabled version
9. re-enable write access
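The core of step 5 is just re-encoding every text field. A minimal sketch in Python (the function name is hypothetical, not the actual conversion script):

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode one cur/old text field from ISO 8859-1 to UTF-8.

    ISO 8859-1 maps every byte 0x00-0xFF directly to U+0000-U+00FF,
    so the decode step can never fail on arbitrary input.
    """
    return raw.decode('iso-8859-1').encode('utf-8')
```

For example, the byte 0xEA (ê in ISO 8859-1) becomes the two-byte UTF-8 sequence 0xC3 0xAA.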
The conversion script should be tested on test.* Wikipedia first.
During step 5, Wikipedia is going to be read-only. It may take some time, especially with the English Wikipedia, so it's better to convert each Wikipedia separately. During steps 6-8, Wikipedia may not work at all, but that's going to take less than a minute.
Does anybody have any really good reason why I shouldn't proceed? These reasons aren't good enough:
* broken URLs - all old URLs are going to work after the upgrade
* size increase - size is going to stay about the same
* broken browsers - they should be upgraded; if someone has a browser so old that it doesn't grok UTF-8, it's not going to grok CSS, PNGs, and other things we're using either. Unless we want to remove all CSS and PNGs, there's no point in not using UTF-8.
* ISO 8859-N is good enough - no, it's not. Not if someone wants to write about people and places from countries where non-8859-1 Latin characters are used, or about linguistics, or math, etc.
On Nov 17, 2003, at 15:02, Tomasz Wegrzanowski wrote:
Staying so long with ISO 8859 was a mistake.
So I propose converting all Wikipedias that aren't using UTF-8 yet to UTF-8. Procedure should be like that:
[...]
How about we do the conversion when installing the new big database server? (Tentatively next week if they actually ship the machine on time.)
We'll have to go down to read-only mode while copying stuff over anyway, so this consolidates downtime. The conversion itself can be done by simply piping the database dump through iconv as it's being copied into the new db.
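That pipe stage is simple enough to sketch; a hypothetical equivalent of the iconv filter in Python (assuming the dump is pure ISO 8859-1):

```python
def recode_stream(src, dst, bufsize=1 << 16):
    """Copy src to dst, converting ISO 8859-1 bytes to UTF-8.

    Equivalent in spirit to:  mysqldump ... | iconv -f ISO-8859-1 -t UTF-8 | mysql ...
    ISO 8859-1 is a single-byte encoding, so fixed-size chunks can
    never split a character; no buffering across chunks is needed.
    """
    while True:
        chunk = src.read(bufsize)
        if not chunk:
            break
        dst.write(chunk.decode('iso-8859-1').encode('utf-8'))
```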
Easy as pie!
-- brion vibber (brion @ pobox.com)
On Mon, Nov 17, 2003 at 03:28:46PM -0800, Brion Vibber wrote:
On Nov 17, 2003, at 15:02, Tomasz Wegrzanowski wrote:
Staying so long with ISO 8859 was a mistake.
So I propose converting all Wikipedias that aren't using UTF-8 yet to UTF-8. Procedure should be like that:
[...]
How about we do the conversion when installing the new big database server? (Tentatively next week if they actually ship the machine on time.)
We'll have to go down to read-only mode while copying stuff over anyway, so this consolidates downtime. The conversion itself can be done by simply piping the database dump through iconv as it's being copied into the new db.
Easy as pie!
It would be better if numeric entities were converted too. The code is somewhere in the Phase1->Phase2 conversion script, and in the konwert program too.
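The entity pass itself is mechanical; a hedged sketch (covering decimal and hex numeric character references only, not named entities like &eacute;):

```python
import re

# Matches &#NNN; (decimal) and &#xHHH; (hexadecimal) references.
_NUMERIC_REF = re.compile(r'&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));')

def entities_to_utf8(text: str) -> str:
    """Replace numeric character references with the literal characters."""
    def repl(m):
        dec, hexa = m.groups()
        return chr(int(dec) if dec else int(hexa, 16))
    return _NUMERIC_REF.sub(repl, text)
```

For example, both &#322; and &#x142; become the literal character ł.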
Tomasz Wegrzanowski wrote:
It would be better if numeric entities were converted too. The code is somewhere in Phase1->Phase2 conversion script, and in konwert program too.
I'm not sure what other people's feelings on this are, but I personally would prefer using numeric entities on the English wikipedia for characters that aren't standard in English. Many of us don't have keyboard configurations that allow the literal characters to be typed, so keeping them as numeric entities makes editing them easier.
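The escaping direction is equally mechanical; a hypothetical sketch of turning non-ASCII characters back into decimal references for easier editing:

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with its decimal reference."""
    return ''.join(c if ord(c) < 128 else '&#%d;' % ord(c) for c in text)
```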
As for the UTF-8 switch itself, I can see some advantages, but I think you gloss over the disadvantages too quickly. On Mozilla Firebird on Linux, at least, the UTF-8 wikis look _much_ worse than the ISO-8859-x wikis, because they're forced to use a Unicode font, of which only fairly ugly ones with strange proportioning are Freely available.
-Mark
Tomasz-
- broken browsers - they should be upgraded, if someone has browser so old that it doesn't grok UTF-8, it's not going to grok CSS, PNGs, and other things we're using either. Unless we want to remove all CSS and PNGs, there's no point in not using UTF-8.
Is this true? All I know is that we had a *lot* of problems with broken special chars on the Meta-Wiki during the logo contest. I have no idea which browser broke them, but it seems to be a not totally uncommon one, perhaps in the 5% range. Given that a single edit by such a person will break an entire page, it might not be so wise to switch (but perhaps I'm missing something -- is Meta running UTF-8?).
Regards,
Erik
On Tue, Nov 18, 2003 at 01:24:00AM +0100, Erik Moeller wrote:
Tomasz-
- broken browsers - they should be upgraded, if someone has browser so old that it doesn't grok UTF-8, it's not going to grok CSS, PNGs, and other things we're using either. Unless we want to remove all CSS and PNGs, there's no point in not using UTF-8.
Is this true? All I know is that we had a *lot* of problems with broken special chars on the Meta-Wiki during the logo contest. I have no idea which browser broke them, but it seems to be a not totally uncommon one, perhaps in the 5% range. Given that a single edit by such a person will break an entire page, it might not be so wise to switch (but perhaps I'm missing something -- is Meta running UTF-8?).
It's nothing like 5% - the number is at least an order of magnitude smaller. On the Polish Wikipedia there weren't any serious problems with UTF-8-incompatible browsers.
Hello,
On 18-11-2003 you (Erik Moeller) wrote:
Tomasz-
- broken browsers - they should be upgraded, if someone has browser so
old that it doesn't grok UTF-8, it's not going to grok CSS, PNGs, and other things we're using either. Unless we want to remove all CSS and PNGs, there's no point in not using UTF-8.
Is this true?
No, it isn't. Though in some browsers these developments were in the same timeframe, in others they weren't. It also ignores two other obvious points:
- Without CSS a page should still be readable. Without PNG a page should still be readable. Mess up the encoding and the page becomes a rebus, at best.
- The gift of looking through the Ethernet wires and telephone cables into all the computer rooms in the world is sufficiently rare that one should never demand that a user upgrade. Feel free to tell the user you're too lazy to support their software, but blaming the break on them is quite insulting.
So, what are our plans of doing this in a way that will allow the W to adapt, so as to not drive away anyone?
EM> All I know is that we had a *lot* of problems with broken
EM> special chars on the Meta-Wiki during the logo contest. I have no idea
EM> which browser broke them, but it seems to be a not totally uncommon one,
EM> perhaps in the 5% range. Given that a single edit by such a person will
EM> break an entire page, it might not be so wise to switch (but perhaps I'm
EM> missing something -- is Meta running UTF-8?).
We, Fy:, are having this problem now, but here it appears to have something to do with our language files. One language file is OK (though its localisation is less than perfect); the next, improved version is not.
(There are also still some English-language strings that I can't seem to find in the language-file, but that's probably another matter.)
Sincerely,
wikipedia-l@lists.wikimedia.org