OK, first what you want to do - convert it to VARBINARY.
Then what you want to do - convert it to VARCHAR utf8 with your
selected collation.
Unfortunately, I'm not well-versed in charsets. Isn't there a danger
in converting to UTF8? We're running MW 1.9.x
Thanks,
-- Jim
On Feb 18, 2008 1:50 PM, Domas Mituzas <midom.lists(a)gmail.com> wrote:
Jim,
Haha - yeah, I think we'll have to do that.
In our case, when there are duplicates, the one we'll want to keep is
the one which is not a redirect, and which has a certain threshold of
content (measured in characters most likely).
Makes sense, you may want to keep a log of nuked content (or actually,
rename them to some random title and log that).
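A minimal sketch of the selection rule discussed above, assuming the colliding pages have already been grouped by title. The `Page` class, the `MIN_CHARS` threshold, and the `_dup_` title suffix are hypothetical names for illustration only, not MediaWiki APIs. When zero or several pages qualify, the group is left for a human to pick from, which matches the idea of an interactive extension.

```python
import uuid
from dataclasses import dataclass

# Hypothetical record for one page in a group of colliding titles.
@dataclass
class Page:
    page_id: int
    title: str
    is_redirect: bool
    text_len: int  # content length in characters

MIN_CHARS = 200  # assumed threshold, tune for your wiki

def pick_keeper(pages, min_chars=MIN_CHARS):
    """Return the single non-redirect page with enough content, or None."""
    candidates = [p for p in pages
                  if not p.is_redirect and p.text_len >= min_chars]
    # Exactly one good candidate: safe to decide automatically.
    # Otherwise the choice is ambiguous and should go to a human.
    return candidates[0] if len(candidates) == 1 else None

def plan_renames(pages, keeper):
    """Build a log of (page_id, old_title, new_title) for the pages not kept."""
    log = []
    for p in pages:
        if p is keeper:
            continue
        new_title = f"{p.title}_dup_{uuid.uuid4().hex[:8]}"
        log.append((p.page_id, p.title, new_title))
    return log
```

Nothing here touches the database; the returned log is the list of renames you would apply and keep as a record.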
If we get something that works reliably, there's a good chance I could
release it into the wikimedia svn as either an extension or
maintenance script. It may have to be an extension to allow a user to
interactively select among good candidates.
You are awesome :)
FYI, currently our page_title field is declared as follows:
`page_title` varchar(255) character set latin1 collate latin1_bin
NOT NULL default '',
OK, first what you want to do - convert it to VARBINARY.
Then what you want to do - convert it to VARCHAR utf8 with your
selected collation.
Converting to VARBINARY will force MySQL to forget the latin1 crap (in
this case the latin1 label is a huge lie, which becomes a PITA eventually)
Not sure what effect that has on your advice
since we're not using utf8 :/
You are using utf8, it is just tagged as latin1, which generally is
a bad idea, but we somehow manage to tolerate that without committing
seppuku.
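To illustrate why the VARBINARY step matters, here is a small byte-level sketch. It assumes the stored bytes are already UTF-8, as described above, and uses Python's `latin-1` codec as a stand-in for MySQL's latin1 (which is closer to cp1252, so this is an approximation). The function names and the sample title are made up for the example.

```python
def convert_directly(stored_bytes: bytes) -> bytes:
    """Mimic a direct latin1 -> utf8 column conversion.

    Each stored byte is treated as one latin1 character and re-encoded,
    so multi-byte UTF-8 sequences get encoded a second time.
    """
    return stored_bytes.decode("latin-1").encode("utf-8")

def convert_via_binary(stored_bytes: bytes) -> bytes:
    """Mimic going through VARBINARY first.

    The bytes are left untouched; only the label on the column changes.
    """
    return stored_bytes

# What the wiki actually wrote into the "latin1" column: UTF-8 bytes.
stored = "Café".encode("utf-8")

mangled = convert_directly(stored).decode("utf-8")    # double-encoded text
intact = convert_via_binary(stored).decode("utf-8")   # original title
print(repr(mangled), repr(intact))
```

The direct route reads the two bytes of "é" as two separate characters, which is the danger in converting straight to utf8; the binary route hands the same bytes to the new charset unchanged.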
Domas
_______________________________________________
MediaWiki-l mailing list
MediaWiki-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l