Lars Aronsson <lars@aronsson.se> writes:
It sounds so easy. But would you accept this procedure if it required Wikipedia to be unavailable or read-only for one hour? For one day? For one week?
It could be done on the fly, even if it takes several weeks: "simply" start storing the converted articles in a second table (or database system) as well...
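A minimal, single-threaded sketch of that idea, assuming an in-memory dictionary stands in for each table and an arbitrary `convert` function stands in for the markup-to-XML converter (the class and method names are hypothetical): every save writes to both tables, while a background job backfills the untouched articles in small batches, so the site never has to go read-only.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for a table: title -> article text.
Table = Dict[str, str]


class DualWriteMigration:
    """Keep serving the old table while a converted copy is built alongside it."""

    def __init__(self, old: Table, convert: Callable[[str], str]) -> None:
        self.old = old
        self.new: Table = {}  # the "second table" with converted articles
        self.convert = convert
        self._pending: List[str] = list(old.keys())  # titles not yet backfilled

    def save(self, title: str, text: str) -> None:
        # Every edit goes to the old table (still authoritative) and is
        # also converted and stored in the new one.
        self.old[title] = text
        self.new[title] = self.convert(text)

    def backfill(self, batch_size: int) -> int:
        """Process up to batch_size pending titles; return how many were processed."""
        done = 0
        while self._pending and done < batch_size:
            title = self._pending.pop()
            # Skip titles that save() has already converted since the migration began.
            if title not in self.new:
                self.new[title] = self.convert(self.old[title])
            done += 1
        return done

    def finished(self) -> bool:
        return not self._pending


# Usage: str.upper is only a placeholder for a real converter.
m = DualWriteMigration({"A": "a", "B": "b"}, str.upper)
m.save("A", "edited")  # a live edit during the migration
while not m.finished():
    m.backfill(1)
print(m.new)
```

Because `backfill` works in batches, it could run at low priority for weeks; a real deployment would also need locking or transactions around `save` and `backfill`, which this sketch ignores.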
Assuming these sizes would be the same for an XML dump (ROTFL) and that export/import could run at 1 MB/second (optimistic), that is 3,500 seconds, or about one hour, for the "cur" table and 83,000 seconds, or close to 24 hours, for the "old" table. And this is for the sizes of February 2005, not for May 2005 or July 2008. You do the math.
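The arithmetic behind that estimate, as a small sketch. The table sizes in MB are back-computed from the quoted 3,500 s and 83,000 s at 1 MB/s; the actual February 2005 figures are not shown in this excerpt. Changing `rate_mb_per_s` shows how sensitive the downtime is to the assumed throughput.

```python
RATE_MB_PER_S = 1.0  # the "optimistic" export/import rate assumed above


def transfer_seconds(size_mb: float, rate_mb_per_s: float = RATE_MB_PER_S) -> float:
    """Seconds needed to move size_mb megabytes at a constant rate."""
    return size_mb / rate_mb_per_s


# Sizes implied by the quoted timings at 1 MB/s.
TABLES_MB = {"cur": 3_500, "old": 83_000}

for name, size_mb in TABLES_MB.items():
    secs = transfer_seconds(size_mb)
    print(f"{name}: {secs:,.0f} s = {secs / 3600:.1f} h")
# cur: 3,500 s = 1.0 h
# old: 83,000 s = 23.1 h
```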
Of course, you must use a database system designed for holding XML data ;) If we start using XML properly, we can give up a lot of hacks. We will also save resources, because the set of allowed tags is limited and their usage is well-defined :)
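One practical consequence of a small, well-defined tag set is that articles can be checked mechanically. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming a purely hypothetical whitelist (no actual set of allowed tags is specified here):

```python
import xml.etree.ElementTree as ET
from typing import Set

# Hypothetical whitelist; the real set of allowed tags is not specified.
ALLOWED_TAGS = {"article", "title", "p", "link", "em", "strong"}


def disallowed_tags(xml_text: str) -> Set[str]:
    """Return tags in xml_text that are not whitelisted (empty set if all are allowed).

    ET.fromstring raises ET.ParseError if the text is not well-formed XML.
    """
    root = ET.fromstring(xml_text)
    # root.iter() yields the root element itself plus every descendant.
    return {el.tag for el in root.iter() if el.tag not in ALLOWED_TAGS}


print(disallowed_tags("<article><p>Hi <font>big</font></p></article>"))
# {'font'}
```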