> OK, first what you want to do - convert it to VARBINARY.
> Then what you want to do - convert it to VARCHAR utf8 with your
> selected collation.
Unfortunately, I'm not well-versed in charsets. Isn't there a danger
in converting to UTF8? We're running MW 1.9.x
Thanks,
-- Jim
On Feb 18, 2008 1:50 PM, Domas Mituzas <midom.lists(a)gmail.com> wrote:
> Jim,
>
> > Haha - yeah, I think we'll have to do that. In our case, when there
> > are duplicates, the one we'll want to keep is the one which is not a
> > redirect, and which has a certain threshold of content (measured in
> > characters most likely).
>
> Makes sense, you may want to keep a log of nuked content (or actually,
> rename them to some random title and log that).
>
> > If we get something that works reliably, there's a good chance I could
> > release it into the wikimedia svn as either an extension or
> > maintenance script. It may have to be an extension to allow a user to
> > interactively select among good candidates.
>
> You are awesome :)
>
> >
> > FYI, currently our page_title field is declared as follows:
> > `page_title` varchar(255) character set latin1 collate latin1_bin
> > NOT NULL default '',
>
> OK, first what you want to do - convert it to VARBINARY.
> Then what you want to do - convert it to VARCHAR utf8 with your
> selected collation.
>
> Converting to VARBINARY will force MySQL to forget the latin1 crap (in
> this case it is a huge lie, that is PITA eventually)
>
> > Not sure what effect that has on your advice since we're not using utf8 :/
>
> You are using utf8, it is just tagged as latin1, which generally is
> a bad idea, but we somehow manage to tolerate that without committing
> seppuku.
>
>
> Domas
>
> _______________________________________________
> MediaWiki-l mailing list
> MediaWiki-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
>
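
On the question of whether converting to utf8 is dangerous, the reason for the VARBINARY step can be shown at the byte level. The sketch below is only a model, not MySQL itself: the title "Café" and both function names are made up for illustration, and Python's latin-1 codec stands in for MySQL's latin1 (the two agree on the bytes used here). It assumes, as Domas says, that the column already holds UTF-8 bytes that are merely tagged latin1.

```python
# Model of a page_title value: UTF-8 bytes sitting in a column tagged latin1.
title = "Café"                   # hypothetical example title
stored = title.encode("utf-8")   # bytes actually in the column: b'Caf\xc3\xa9'


def convert_directly(raw: bytes) -> bytes:
    """Model latin1 -> utf8 in one step.

    The server believes the latin1 tag, so every stored byte is treated
    as one latin1 character and re-encoded as UTF-8.
    """
    return raw.decode("latin-1").encode("utf-8")


def convert_via_binary(raw: bytes) -> bytes:
    """Model latin1 -> VARBINARY -> utf8.

    Going through a binary type drops the latin1 tag, so the bytes are
    relabelled as utf8 without being transcoded.
    """
    return bytes(raw)


direct = convert_directly(stored)
via_binary = convert_via_binary(stored)

print(direct)                      # b'Caf\xc3\x83\xc2\xa9' (double-encoded)
print(direct.decode("utf-8"))      # CafÃ©
print(via_binary)                  # b'Caf\xc3\xa9' (unchanged)
print(via_binary.decode("utf-8"))  # Café
```

So the danger is in the one-step conversion, which double-encodes every non-ASCII title; the two-step route leaves the stored bytes alone. It is still worth testing on a copy of the table first, since any titles whose bytes are not valid UTF-8 would need separate handling.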