Hello,
I've installed MediaWiki 1.13, all required extensions, and MySQL 5.0.77, configured MediaWiki, and created a wikidb. After that I imported the three tables 'page', 'revision' and 'text' from a snapshot of the Spanish Wikipedia: eswiki-20090421-pages-articles.xml.bz2
All details are in the attached installation guide, which also worked fine on another laptop running FreeBSD 7.0. (The attached guide names the 1.11 extensions; for 1.13 I have of course installed the 1.13 extensions.)
No problems were reported by 'mysqlimport' (by the way, it ran for 16 hours on the small EeePC 900). The three tables are there and filled, as one can verify, for example, with:
$ mysql -A -uroot -pxxxxxxxxx wikidb
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.0.77-log FreeBSD port: mysql-server-5.0.77_1

mysql> select count(*) from text;
+----------+
| count(*) |
+----------+
|  1114075 |
+----------+
but if I launch a browser against the Wiki, for example:
http://localhost/mediawiki/index.php/Roma
I get a mix of information/pages which reads for this example:
Roma Para otros usos de este término, véase Roma (desambiguación). Imagen:BH escudo.jpg Detalle del escudo de la fachada de la antigua fábrica de Eibar. Bicicletas BH, siglas de Beistegui Hermanos, es la denominación histórica y una de las marcas comerciales de la empresa Bicicletas de Álava, S.A., también denominada BH Bikes. La empresa, fabricante de bicicletas, tiene su sede en la localidad vasca de Vitoria, en Álava (España), aunque es originaria de Éibar (Guipúzcoa). Es uno de los más importantes fabricantes de España y tiene representación internacional. ...
What could be causing that? Thanks in advance
matthias
On 5/8/09 7:34 PM, Matthias Apitz wrote:
> I've installed MediaWiki 1.13, all required extensions, and MySQL 5.0.77, configured MediaWiki, and created a wikidb. After that I imported the three tables 'page', 'revision' and 'text' from a snapshot of the Spanish Wikipedia: eswiki-20090421-pages-articles.xml.bz2
[snip]
> I get a mix of information/pages which reads for this example:
> Roma Para otros usos de este término, véase Roma (desambiguación). Imagen:BH escudo.jpg Detalle del escudo de la fachada de la antigua fábrica de Eibar. Bicicletas BH, siglas de Beistegui Hermanos, es la denominación
[snip]
I believe there was some buggage in that particular dump which Tomasz is looking into; text got out of sync and at least some entries are listing out the wrong text.
Tomasz, is the 20090504 eswiki dump similarly corrupted or was it a one-off problem?
-- brion
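One way to spot the kind of text mismatch described above is to follow a single page through the tables by hand. The following is a minimal sketch, assuming the stock MediaWiki schema in which page.page_latest points at revision.rev_id and revision.rev_text_id points at text.old_id. The helper name sync_check_sql and the 200-character preview are illustrative only, and the title is pasted into the query without escaping, so it is only suitable for simple, trusted titles.

```shell
# Print a query that shows which text row a page title resolves to.
# Assumes the stock MediaWiki link chain: page -> revision -> text.
# sync_check_sql is a hypothetical helper name, not part of MediaWiki.
sync_check_sql() {
    title=$1   # title in DB form (underscores for spaces), main namespace
    cat <<EOF
SELECT p.page_id, p.page_title, r.rev_id, r.rev_text_id,
       LEFT(t.old_text, 200) AS text_start
FROM page p
JOIN revision r ON r.rev_id = p.page_latest
JOIN text t ON t.old_id = r.rev_text_id
WHERE p.page_namespace = 0 AND p.page_title = '$title';
EOF
}
```

Usage would look like `sync_check_sql Roma | mysql -A -uroot -p wikidb`. If text_start for 'Roma' begins with an unrelated article, the text row is the wrong one, which points at the dump rather than at the MediaWiki setup. The preview is only readable if the text rows are stored uncompressed.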
On Friday, May 08, 2009 at 10:01:03AM -0700, Brion Vibber wrote:
> On 5/8/09 7:34 PM, Matthias Apitz wrote:
> > I've installed MediaWiki 1.13, all required extensions, and MySQL 5.0.77, configured MediaWiki, and created a wikidb. After that I imported the three tables 'page', 'revision' and 'text' from a snapshot of the Spanish Wikipedia: eswiki-20090421-pages-articles.xml.bz2
...
> [snip] I believe there was some buggage in that particular dump which Tomasz is looking into; text got out of sync and at least some entries are listing out the wrong text.
...
OK, I've now imported an older export, eswiki-20080304-pages-articles.xml.bz2, without any problem.
The pages are fine (at least I haven't seen any garbage so far). It's nice to have a full Spanish MediaWiki on my small EeePC 900. The DB needs around 5 GByte.
I have another (maybe stupid) question: what do I have to do to use fulltext search in the imported tables 'text' and 'page' with the 'Search' button on the left side? I've read the manual and FAQ at http://www.mediawiki.org/wiki/Manual:Contents but the pointers about fulltext search talk about MyISAM databases, while mine is InnoDB... Am I missing something?
Thx
matthias
On Sun, May 10, 2009 at 9:58 AM, Matthias Apitz guru@unixarea.de wrote:
> I have another (maybe stupid) question: what do I have to do to use fulltext search in the imported tables 'text' and 'page' with the 'Search' button on the left side? I've read the manual and FAQ at http://www.mediawiki.org/wiki/Manual:Contents but the pointers about fulltext search talk about MyISAM databases, while mine is InnoDB... Am I missing something?
InnoDB doesn't support fulltext search, which is why the searchindex table should use MyISAM (the other tables can use InnoDB). Run the updateSearchIndex.php maintenance script to populate the index; I don't know how long it will take, though.
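The two steps in this reply could be scripted roughly as follows. This is only a sketch under assumptions: the MW_DIR path is a guess at where the FreeBSD port puts MediaWiki, the function names are made up for illustration, and the root account is used only because the thread uses it. A stock install normally creates searchindex as MyISAM already, so the ALTER may turn out to be a no-op.

```shell
# Assumed locations; only the database name comes from the thread.
MW_DIR=${MW_DIR:-/usr/local/www/mediawiki}   # assumption: adjust to your install
DB=${DB:-wikidb}

# SQL to inspect the engine of searchindex and switch it to MyISAM.
# (Hypothetical helper name.)
searchindex_engine_sql() {
    echo "SHOW TABLE STATUS LIKE 'searchindex';"
    echo "ALTER TABLE searchindex ENGINE=MyISAM;"
}

# Run the SQL, then the maintenance script named in $1.
# Defaults to updateSearchIndex.php, the script named in the reply above.
build_search_index() {
    script=${1:-updateSearchIndex.php}
    searchindex_engine_sql | mysql -uroot -p "$DB" &&
        (cd "$MW_DIR/maintenance" && php "$script")
}
```

Nothing runs until build_search_index is called. As far as I know, updateSearchIndex.php works from recent changes within a date range, so for a freshly imported dump it may be worth checking whether rebuildtextindex.php from the same maintenance directory is the better fit, for example `build_search_index rebuildtextindex.php`.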
On Monday, May 11, 2009 at 08:26:51AM -0400, Benjamin Lees wrote:
> InnoDB doesn't support fulltext search, which is why the searchindex table should use MyISAM (the other tables can use InnoDB). Run the updateSearchIndex.php maintenance script to populate the index; I don't know how long it will take, though.
Thanks for your explanation. Do you have an idea how much disk space the searchindex table will need, compared with the database size or the size of the imported file 'text.txt'? The current size of the DB is:
tiny# du -sh /usr/local/db/mysql
4,8G    /usr/local/db/mysql
Thx
matthias
On Mon, May 11, 2009 at 9:20 AM, Matthias Apitz guru@unixarea.de wrote:
> Thanks for your explanation. Do you have an idea how much disk space the searchindex table will need, compared with the database size or the size of the imported file 'text.txt'? The current size of the DB is:
If you got only the latest revision of each article, the search index should be comparable in size to the text table. Otherwise, it should be vastly smaller. For comparison, here are my sizes from http://www.twcenter.net/wiki/, which has homegrown content:
mysql> SELECT TABLE_NAME, DATA_LENGTH, INDEX_LENGTH, DATA_LENGTH+INDEX_LENGTH FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='wikidb' ORDER BY DATA_LENGTH+INDEX_LENGTH DESC LIMIT 5;
+-------------+-------------+--------------+--------------------------+
| TABLE_NAME  | DATA_LENGTH | INDEX_LENGTH | DATA_LENGTH+INDEX_LENGTH |
+-------------+-------------+--------------+--------------------------+
| text        |   134807552 |            0 |                134807552 |
| searchindex |     7662368 |      7068672 |                 14731040 |
| revision    |     4505600 |      6045696 |                 10551296 |
| logging     |     2637824 |      6324224 |                  8962048 |
| pagelinks   |     1327104 |      1523712 |                  2850816 |
+-------------+-------------+--------------+--------------------------+
5 rows in set (0.91 sec)
Note how the text table is an order of magnitude larger than searchindex, because only the most recent revision is indexed. For a full eswiki dump I'd assume it would be a much larger difference, because Spanish Wikipedia pages tend to have many more revisions than my wiki.
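To put a number on "an order of magnitude", the ratio can be computed from the byte counts in the table above. A small sketch; size_ratio is just an illustrative helper name, and LC_ALL=C is set so that awk prints a decimal point regardless of locale.

```shell
# Ratio of two table sizes in bytes (data + index), to two decimals.
# size_ratio is a hypothetical helper name.
size_ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# text vs. searchindex on the sample wiki above
size_ratio 134807552 14731040   # prints 9.15
```

The same query and ratio can be run against an imported dump once its search index has been built, to see how it compares on a wiki that holds only the latest revision of each page.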
wikitech-l@lists.wikimedia.org