Continuing to dig into this...
Below are page table descriptions from mysql:
-----PAGE TABLE STATUS BEFORE UPGRADE-----
mysql> SHOW TABLE STATUS WHERE `NAME` LIKE '%page%';
+-------------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows |
Avg_row_length | Data_length | Max_data_length | Index_length | Data_free |
Auto_increment | Create_time | Update_time | Check_time |
Collation | Checksum | Create_options | Comment |
+-------------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
| page | InnoDB | 10 | Compact | 12461 |
127 | 1589248 | 0 | 2162688 | 11534336 |
14232 | 2015-11-01 10:40:06 | NULL | NULL | utf8_general_ci
| NULL | | |
| page_props | InnoDB | 10 | Compact | 0 |
0 | 16384 | 0 | 0 | 11534336 |
NULL | 2015-11-01 10:40:07 | NULL | NULL | utf8_general_ci
| NULL | | |
| page_restrictions | InnoDB | 10 | Compact | 2 |
8192 | 16384 | 0 | 65536 | 11534336
| 3 | 2015-11-01 10:40:07 | NULL | NULL |
utf8_general_ci | NULL | | |
| pagelinks | InnoDB | 10 | Compact | 28377 |
56 | 1589248 | 0 | 1589248 | 11534336 |
NULL | 2015-11-01 10:40:07 | NULL | NULL | utf8_general_ci
| NULL | | |
+-------------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
mysql> SHOW COLUMNS FROM page;
+-------------------+---------------------+------+-----+----------------+----------------+
| Field | Type | Null | Key | Default |
Extra |
+-------------------+---------------------+------+-----+----------------+----------------+
| page_id | int(10) unsigned | NO | PRI | NULL |
auto_increment |
| page_namespace | int(11) | NO | MUL | NULL
| |
| page_title | varchar(255) | NO | | NULL
| |
| page_restrictions | tinyblob | NO | | NULL
| |
| page_counter | bigint(20) unsigned | NO | | 0
| |
| page_is_redirect | tinyint(3) unsigned | NO | | 0
| |
| page_is_new | tinyint(3) unsigned | NO | | 0
| |
| page_random | double unsigned | NO | MUL | NULL
| |
| page_touched | binary(14) | NO | |
| |
| page_latest | int(10) unsigned | NO | | NULL
| |
| page_len | int(10) unsigned | NO | MUL | NULL
| |
+-------------------+---------------------+------+-----+----------------+----------------+
mysql> select page_id,page_title from page where page_title like 'Adhik%';
+---------+----------------------------+
| page_id | page_title |
+---------+----------------------------+
| 11974 | Adhikamasa |
| 11975 | Adhikamasa,_adhimasa |
| 7128 | AdhikamÄ sa |
| 6658 | AdhikamÄ sa,_adhimÄ sa |
| 11977 | Adhikara |
| 11978 | Adhikara_Nandi |
| 11976 | Adhikarana |
| 6660 | Adhikaraṇa |
| 6659 | AdhikÄ ra |
| 6661 | AdhikÄ ra_Nandi |
| 7126 | AdhikamÄ sa,_adhimÄ sa |
| 7117 | Adhikaraṇa |
| 7125 | AdhikÄ ra |
| 7118 | AdhikÄ ra_Nandi |
+---------+----------------------------+
-----PAGE TABLE STATUS AFTER UPGRADE-----
+-------------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows |
Avg_row_length | Data_length | Max_data_length | Index_length | Data_free |
Auto_increment | Create_time | Update_time | Check_time |
Collation | Checksum | Create_options | Comment |
+-------------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
| page | InnoDB | 10 | Compact | 11883 |
133 | 1589248 | 0 | 2162688 | 111149056 |
14232 | 2015-11-01 10:40:06 | NULL | NULL | utf8_general_ci
| NULL | | |
| page_props | InnoDB | 10 | Compact | 0 |
0 | 16384 | 0 | 0 | 111149056 |
NULL | 2015-11-01 10:40:07 | NULL | NULL | utf8_general_ci
| NULL | | |
| page_restrictions | InnoDB | 10 | Compact | 2 |
8192 | 16384 | 0 | 65536 | 111149056
| 3 | 2015-11-01 10:40:07 | NULL | NULL |
utf8_general_ci | NULL | | |
| pagelinks | InnoDB | 10 | Compact | 28765 |
55 | 1589248 | 0 | 1589248 | 111149056 |
NULL | 2015-11-01 10:40:07 | NULL | NULL | utf8_general_ci
| NULL | | |
+-------------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
mysql> select page_id,page_title from page where page_title like 'Adhik%';
+---------+----------------------------+
| page_id | page_title |
+---------+----------------------------+
| 11974 | Adhikamasa |
| 11975 | Adhikamasa,_adhimasa |
| 7128 | AdhikamÄ sa |
| 6658 | AdhikamÄ sa,_adhimÄ sa |
| 11977 | Adhikara |
| 11978 | Adhikara_Nandi |
| 11976 | Adhikarana |
| 6660 | Adhikaraṇa |
| 6659 | AdhikÄ ra |
| 6661 | AdhikÄ ra_Nandi |
| 7126 | AdhikamÄ sa,_adhimÄ sa |
| 7117 | Adhikaraṇa |
| 7125 | AdhikÄ ra |
| 7118 | AdhikÄ ra_Nandi |
+---------+----------------------------+
Not sure if this is correct, but I came across
http://product.hubspot.com/bid/7049/MySQL-and-Unicode-Three-Gotchas
which suggested that the page_title field could be improperly encoded (as
varchar 255 instead of utf8).
I tried to do
ALTER TABLE page MODIFY page_title VARCHAR(255) CHARACTER SET utf8;
But there are way too many conflicts/duplicates in the key page_name which
prevents me from doing it w/o dropping the index. BTW, if I do it, I can't
recreate it with the "UNIQUE" property which it has right now due to the
duplicates.
https://www.mediawiki.org/wiki/Manual:MWDumper
Suggests that the character type should be set to Binary to prevent these
types of issues...but not sure what I need to do to get there from where I
am right now.
Any ideas?
Thanks,
Krishna
--------------------------------------------------------------------------------
Krishna Maheshwari
Hindupedia, the Hindu Encyclopedia (
www.hindupedia.com)
--------------------------------------------------------------------------------