Wrong database, maybe

List overview All Threads
Download

newer

older

Capital Letters in navigation etc.

Breaking of URL with ' and ()

Javier Bezos

9 Mar 2010 9 Mar '10

4:59 p.m.

Hi all,

This is the background. A year ago we hired a person to update and add some improvements we needed, with an estimated time of 8 weeks. After months of delays and a lot of problems, we've fired him. Unfortunately, I'm afraid the database has been damaged or not properly set up.

Now, two questions--

1) In the short term, I'd like to fix an issue with searches. The site is in Spanish but it's unable to find words with diacriticals in the article body (in the title they are found). I've rebuilt the search index, with no luck. Fortunately, I had access to old dumps and I've noticed a difference -- formerly tables were like

CREATE TABLE `wl_archive` ( `ar_namespace` int(11) NOT NULL default '0', `ar_title` varchar(255) character set latin1 collate latin1_bin NOT NULL default '', [...] ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

and now they are like

SET @saved_cs_client = @@character_set_client; SET character_set_client = utf8; CREATE TABLE `wl_archive` ( `ar_namespace` int(11) NOT NULL default '0', `ar_title` varchar(255) character set latin1 collate latin1_bin NOT NULL default '', [...] ) ENGINE=InnoDB DEFAULT CHARSET=utf8; SET character_set_client = @saved_cs_client;

2) In the medium term, and with the background above, which is the best approach to have a "correct" database and system again.

Thanx Javier -------------------------------- www.wikilengua.org www.tex-tipografia.com

Show replies by date

Raymond Wan

9 Mar 9 Mar

5:20 p.m.

New subject: [Mediawiki-l] Wrong database, maybe

Hi Javier,

The short answer is that I can't help you...I can only tell you what I did and maybe someone else will give a better reply so that I can learn something, too.

Javier Bezos wrote:

...

CREATE TABLE `wl_archive` ( `ar_namespace` int(11) NOT NULL default '0', `ar_title` varchar(255) character set latin1 collate latin1_bin NOT NULL default '', [...] ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

and now they are like

SET @saved_cs_client = @@character_set_client; SET character_set_client = utf8; CREATE TABLE `wl_archive` ( `ar_namespace` int(11) NOT NULL default '0', `ar_title` varchar(255) character set latin1 collate latin1_bin NOT NULL default '', [...] ) ENGINE=InnoDB DEFAULT CHARSET=utf8; SET character_set_client = @saved_cs_client;

This is the exact problem that I had. I had a database with a mix of English and Japanese and could see everything fine. I then checked both the database and the system and found out they were both set to "latin1". If so, I don't know why it had worked for so long...

So, what I did was change the database and the MySQL system to utf8. But then the database also has to be converted and this failed miserably for me. I followed these steps:

http://en.gentoo-wiki.com/wiki/Convert_latin1_to_UTF-8_in_MySQL

which did not work (more specifically, the "Convert dump" step). In the end, I re-typed the Japanese (which wasn't a lot). So maybe you might have better luck or someone else can help you. Or maybe you can make some sense of the above link and succeed where I failed. :-)

Good luck...

Ray

Platonides

10 Mar 10 Mar

9:32 p.m.

New subject: [Mediawiki-l] Wrong database, maybe

El 09/03/10 10:20, Raymond Wan wrote:

...

Javier Bezos wrote:

...
CREATE TABLE `wl_archive` ( `ar_namespace` int(11) NOT NULL default '0', `ar_title` varchar(255) character set latin1 collate latin1_bin NOT NULL default '', [...] ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

and now they are like

SET @saved_cs_client = @@character_set_client; SET character_set_client = utf8; CREATE TABLE `wl_archive` ( `ar_namespace` int(11) NOT NULL default '0', `ar_title` varchar(255) character set latin1 collate latin1_bin NOT NULL default '', [...] ) ENGINE=InnoDB DEFAULT CHARSET=utf8; SET character_set_client = @saved_cs_client;

This is the exact problem that I had. I had a database with a mix of English and Japanese and could see everything fine. I then checked both the database and the system and found out they were both set to "latin1". If so, I don't know why it had worked for so long...

So, what I did was change the database and the MySQL system to utf8. But then the database also has to be converted and this failed miserably for me. I followed these steps:

http://en.gentoo-wiki.com/wiki/Convert_latin1_to_UTF-8_in_MySQL

which did not work (more specifically, the "Convert dump" step).

That's wrong. Even when the database shows charset=latin1, mediawiki stores utf8 inside. So you shouldn't have tried to convert the content, that would have broken it.

...

In the end, I re-typed the Japanese (which wasn't a lot). So maybe you might have better luck or someone else can help you. Or maybe you can make some sense of the above link and succeed where I failed. :-)

Good luck...

Ray

Javier, you need to what's the data representation and set to one (either latin1 or utf8), I think you had some tables latin1 and others utf8 now. With $wgDBmysql5 set accordingly.

Without knowing the exact way your db is broken, it'd be hard to fix it. What search are you using? The internal one? Maybe you can share a one-revision sql dump?

PS: Update to 1.15.2, you are vulnerable to CSS image inclusion.

Javier Bezos

10:57 p.m.

New subject: [Mediawiki-l] Wrong database, maybe

Thank yoy for your interest and answers.

Platonides:

...

So you shouldn't have tried to convert the content, that would have broken it.

Very likely this is what they did.

...

Without knowing the exact way your db is broken, it'd be hard to fix it. What search are you using? The internal one? Maybe you can share a one-revision sql dump?

And your crystall ball? ;-) Well, as to the problem with the search I've discovered the key for the searchindex table is generated with U8<hex> while the table has u8<hex>. So, if I replace in SeachMySQL.php

$searchon = $this->db->strencode( $searchon );

$searchon = $this->db->strencode( $wgContLang->lc ($searchon) );

words are found. I'm using this as a workaround, but I'm investigating the origin of the problem (any hints?).

...

PS: Update to 1.15.2, you are vulnerable to CSS image inclusion.

Thank you. Please, could you give a pointer explaining it?

Cheers Javier

Chad

11:45 p.m.

New subject: [Mediawiki-l] Wrong database, maybe

On Wed, Mar 10, 2010 at 9:57 AM, Javier Bezos jbezos@fundeu.es wrote:

...

...
PS: Update to 1.15.2, you are vulnerable to CSS image inclusion.

Thank you. Please, could you give a pointer explaining it?

Cheers Javier

MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-l

http://lists.wikimedia.org/pipermail/mediawiki-announce/2010-March/000088.ht...

Which was also cross-posted to this list and wikitech-l.

-Chad

Javier Bezos

11 Mar 11 Mar

12:46 a.m.

New subject: [Mediawiki-l] Wrong database, maybe

...

http://lists.wikimedia.org/pipermail/mediawiki-announce/2010-March/000088.ht...

Which was also cross-posted to this list and wikitech-l.

Thank you, Chad, but what I actually wanted was an explanation of the bug. If the problem are external images, I presume a link has to be given somehow, but in our wiki only wiki-like links (ie, [http://...]) are allowed, so I was interested in the details. I presume the fix is this one:

http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=63424

Javier

Platonides

5:40 a.m.

New subject: [Mediawiki-l] Wrong database, maybe

Javier Bezos wrote:

...

...
http://lists.wikimedia.org/pipermail/mediawiki-announce/2010-March/000088.ht...

Which was also cross-posted to this list and wikitech-l.

Thank you, Chad, but what I actually wanted was an explanation of the bug. If the problem are external images, I presume a link has to be given somehow, but in our wiki only wiki-like links (ie, [http://...]) are allowed, so I was interested in the details. I presume the fix is this one:

http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=63424

Javier

Yes. The problem is not the external images feature, but that you could embed an image inside CSS with mediawiki not noticing that it was an external link. Per the advisory, all installs since 1.5 are affected.

Javier Bezos

4:14 p.m.

New subject: [Mediawiki-l] Wrong database, maybe

I wrote:

...

$searchon = $this->db->strencode( $searchon ); by $searchon = $this->db->strencode( $wgContLang->lc ($searchon) );

words are found. I'm using this as a workaround, but I'm investigating the origin of the problem (any hints?).

Found. The problem is in the database. For some reason they have decided the table searchindex should have a collation using latin1 :-(. So I repeat my second question:

...

In the medium term, and with the background above, which

is the best approach to have a "correct" database and system again.

Cheers Javier

Dave butlerdi

5:07 p.m.

New subject: [Mediawiki-l] Wrong database, maybe

When we have had problems it was easier to dump database to sql statements, drop database, recreate with media wiki setup with all correct settings and restore the data without drop table instructions.

This worked for us in the past anyway.

On Thu, Mar 11, 2010 at 9:14 AM, Javier Bezos jbezos@fundeu.es wrote:

...

I wrote:

...
$searchon = $this->db->strencode( $searchon ); by $searchon = $this->db->strencode( $wgContLang->lc ($searchon) );

words are found. I'm using this as a workaround, but I'm investigating the origin of the problem (any hints?).

Found. The problem is in the database. For some reason they have decided the table searchindex should have a collation using latin1 :-(. So I repeat my second question:

...

In the medium term, and with the background above, which

is the best approach to have a "correct" database and system again.

Cheers Javier

MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-l

-- Regards Dave Butler butlerdi-at-gmail-dot-com Also on Skype as butlerdi Get Skype here http://www.skype.com/download.html ********************************************************************** This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the system manager. This footnote also confirms that this email message has been swept by MIMEsweeper for the presence of computer viruses. www.mimesweeper.com **********************************************************************

Raymond Wan

9:56 a.m.

New subject: [Mediawiki-l] Wrong database, maybe

Hi,

Platonides wrote:

...

...
So, what I did was change the database and the MySQL system to utf8. But then the database also has to be converted and this failed miserably for me. I followed these steps:

http://en.gentoo-wiki.com/wiki/Convert_latin1_to_UTF-8_in_MySQL

which did not work (more specifically, the "Convert dump" step).

That's wrong. Even when the database shows charset=latin1, mediawiki stores utf8 inside. So you shouldn't have tried to convert the content, that would have broken it.

Oh...I see -- thanks for this!

Then I guess there are two combinations: Mediawiki with latin1 MySQL ; Mediawiki with UTF MySQL. What are the advantages / disadvantages of either choice?

I *guess* that if someone were to login to mysql directly, and did a SELECT, then the UTF would look like gibberish. Likewise when a dump is done of the data. Of course, neither "problem" affects Mediawiki's functionality...

Any other pros/cons?

Thanks!

Ray

Platonides

14 Mar 14 Mar

8:54 a.m.

New subject: [Mediawiki-l] Wrong database, maybe

Raymond Wan wrote:

...

Oh...I see -- thanks for this!

Then I guess there are two combinations: Mediawiki with latin1 MySQL ; Mediawiki with UTF MySQL. What are the advantages / disadvantages of either choice?

I *guess* that if someone were to login to mysql directly, and did a SELECT, then the UTF would look like gibberish. Likewise when a dump is done of the data. Of course, neither "problem" affects Mediawiki's functionality...

Any other pros/cons?

Thanks!

Ray

MediaWiki offers you three character sets for MySQL: * MySQL 4.1/5.0 binary * MySQL 4.1/5.0 UTF-8 * MySQL 4.0 backwards-compatible UTF-8

In the three modes MediaWiki is storing utf-8 characters. It all depends on how MySQL treats them.

In "backwards-compatible UTF-8" mysql thinks it's latin1. The data will "look wrong" and if you don't provide --default-character-set for mysqldump 4.1 and newer, it will corrupt the text (it will "helpfully" transform it to utf-8). This is the only one which works with mysql 4.0, and it supports the full unicode.

UTF-8 uses MySQL support for UTF-8, which currentyl limits you to the Basic Multilingual Plane. The data will "look right". The indexes will be larger.

With binary, it works almost like backwards utf-8, but mysql will treat it as opaque data and won't mess with it. Representation will be messy. You have the full unicode.

Raymond Wan

3:29 p.m.

New subject: [Mediawiki-l] UTF support (was: Wrong database, maybe)

Hi,

Platonides wrote:

...

MediaWiki offers you three character sets for MySQL:

MySQL 4.1/5.0 binary

MySQL 4.1/5.0 UTF-8

MySQL 4.0 backwards-compatible UTF-8

...

Thank you for your explanation! I think I understand the differences between the 3 options. Also, looking at the installation instructions, it seems I was asked that before and probably just clicked "OK". I installed Mediawiki 3-4 years ago and have been upgrading since...so maybe back then I wasn't asked? I honestly don't remember...

Is there a way to find out what it is set at? Since I've forced mysql to use UTF-8, I'm not sure if I've made a mistake...but since it's retrieving Japanese characters fine, it must be ok... I'm just concerned that I've broken something that will show up later on...

Thanks a lot!

Ray

Platonides

10:49 p.m.

New subject: [Mediawiki-l] UTF support

Raymond Wan wrote:

...

Platonides wrote:

...
MediaWiki offers you three character sets for MySQL:

MySQL 4.1/5.0 binary

MySQL 4.1/5.0 UTF-8

MySQL 4.0 backwards-compatible UTF-8

...

Thank you for your explanation! I think I understand the differences between the 3 options. Also, looking at the installation instructions, it seems I was asked that before and probably just clicked "OK". I installed Mediawiki 3-4 years ago and have been upgrading since...so maybe back then I wasn't asked? I honestly don't remember...

Is there a way to find out what it is set at? Since I've forced mysql to use UTF-8, I'm not sure if I've made a mistake...but since it's retrieving Japanese characters fine, it must be ok... I'm just concerned that I've broken something that will show up later on...

Thanks a lot!

Ray

Look at your table definitions. If the CREATE TABLE contains DEFAULT CHARSET=binary or DEFAULT CHARSET=utf8, you are using the first two, and $wgDBmysql5 should be set. Else you are using the third one and $wgDBmysql5 should be false.

The three options are supported, if it works, don't fix it :)

Raymond Wan

15 Mar 15 Mar

2:18 p.m.

New subject: [Mediawiki-l] UTF support

Hi,

Platonides wrote:

...

Look at your table definitions. If the CREATE TABLE contains DEFAULT CHARSET=binary or DEFAULT CHARSET=utf8, you are using the first two, and $wgDBmysql5 should be set. Else you are using the third one and $wgDBmysql5 should be false.

Ah, thanks -- I'll take a look for it -- thanks!

...

The three options are supported, if it works, don't fix it :)

Yes...good point -- I should have asked on this list before attempting to "fix" something that wasn't broken!

Thanks for the advice!

Ray

5401

Age (days ago)

5407

Last active (days ago)

mediawiki-l@lists.wikimedia.org

13 comments

5 participants

tags (0)

participants (5)

Chad
Dave butlerdi
Javier Bezos
Platonides
Raymond Wan