If you're reading this, I hit "send" by mistake.
On Tue, 2002-03-12 at 03:29, Tomasz Wegrzanowski wrote:
Generally it worked, but there were some problems.
Conversion script tried to convert .db.orig files:
Now there are articles like "Propozycje Tematow.db.o" with contents "There was an error converting this file : file not found."
Should be harmless...
Article q{"Casablanca"} (WITH quotes; I don't know how to write it down, so let's say that q{} are stronger quotes ;)), currently a redirect to q{Casablanca (film)}, was converted to q{"Casablanca"}. I wouldn't mind if it was just removed from the database.
!! Okay, that's just wrong. Hmm.
Actually, on my machine I don't even see this article converted. I'm not sure what's going on there, I seem to have a lot of articles missing. Possibly some weird PHP problem, as near as I can tell it's simply not reading all the files in every directory.
Some articles are weird, like Polimery:
12c12
< [?rednia masa cz?steczkowa polimer?w]?
---
> Masa cz?steczkowa polimer?w
That looks like a diff which was imported as an article. I have no idea what could cause it.
The usemod db format doesn't seem to be quite as consistent as the conversion script wants... I'll look it over again in the morning.
Major problem: the z-with-dot letter seems to be screwed up in links. Links from http://local_copy_of_wikipedia/wiki.phtml?title=Wy%C5%BCsze_uczelnie_w_Polsc... to subpages are broken. They work on the Polish UseMod Wikipedia, so it must be a problem with the conversion script. Maybe there are some mistakes in the latin2 -> utf-8 table?
Ah, looks like parts of the links are accidentally converted from latin2 to utf-8 twice. (Thanks for catching that! It doesn't show up on the Esperanto database I usually test with, which is already mostly UTF-8 and doesn't change on a second conversion.) I've moved the conversion to before the link extraction, it seems better now.
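For anyone curious what a double conversion does to a title, here is a rough sketch in Python. It is not the actual conversion script; the helper name latin2_to_utf8 and the sample title are made up for illustration, and I'm assuming latin2 means ISO-8859-2.

```python
from urllib.parse import quote


def latin2_to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: treat the bytes as ISO-8859-2 and re-encode as UTF-8."""
    return raw.decode("iso-8859-2").encode("utf-8")


# Sample title as it might be stored in a latin2 database (assumption).
title = "Wyższe_uczelnie"
raw = title.encode("iso-8859-2")

once = latin2_to_utf8(raw)    # correct: a single conversion
twice = latin2_to_utf8(once)  # the bug: UTF-8 bytes misread as latin2 again

print(quote(once))   # percent-encoded form of the correctly converted title
print(quote(twice))  # percent-encoded form after the accidental second pass

# Pure ASCII text survives any number of passes unchanged.
ascii_title = b"Casablanca"
print(latin2_to_utf8(latin2_to_utf8(ascii_title)) == ascii_title)
```

The page title gets converted once and the link text twice, so the two no longer match byte for byte and the link points at a page that doesn't exist.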
-- brion vibber (brion @ pobox.com)