On Mon, May 11, 2009 at 9:20 AM, Matthias Apitz guru@unixarea.de wrote:
> Thanks for your explanation. Do you have an idea how much diskspace the searchindex will have, compared with the database size or the text size of the imported file 'text.txt'? The actual size of the DB is:
If you imported only the latest revision of each article, the search index should be comparable in size to the text table. Otherwise, it should be vastly smaller than the text table. For comparison, here are the sizes from my own wiki, http://www.twcenter.net/wiki/, which has homegrown content:
mysql> SELECT TABLE_NAME, DATA_LENGTH, INDEX_LENGTH, DATA_LENGTH+INDEX_LENGTH FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='wikidb' ORDER BY DATA_LENGTH+INDEX_LENGTH DESC LIMIT 5;
+-------------+-------------+--------------+--------------------------+
| TABLE_NAME  | DATA_LENGTH | INDEX_LENGTH | DATA_LENGTH+INDEX_LENGTH |
+-------------+-------------+--------------+--------------------------+
| text        |   134807552 |            0 |                134807552 |
| searchindex |     7662368 |      7068672 |                 14731040 |
| revision    |     4505600 |      6045696 |                 10551296 |
| logging     |     2637824 |      6324224 |                  8962048 |
| pagelinks   |     1327104 |      1523712 |                  2850816 |
+-------------+-------------+--------------+--------------------------+
5 rows in set (0.91 sec)
Note how the text table is about an order of magnitude larger than searchindex, because only the most recent revision of each page is indexed. For a full eswiki dump I'd expect the difference to be much larger, because Spanish Wikipedia pages tend to have many more revisions than the pages on my wiki.
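
As a rough check on the "order of magnitude" figure, here is a minimal Python sketch that recomputes the totals and the text-to-searchindex ratio from the byte counts in the output above. The names `sizes`, `total` and `ratio` are made up for illustration; only the numbers come from the query result.

```python
# Sizes in bytes, copied from the INFORMATION_SCHEMA output above.
# Each entry is (DATA_LENGTH, INDEX_LENGTH).
sizes = {
    "text": (134807552, 0),
    "searchindex": (7662368, 7068672),
    "revision": (4505600, 6045696),
    "logging": (2637824, 6324224),
    "pagelinks": (1327104, 1523712),
}


def total(table):
    """Return DATA_LENGTH + INDEX_LENGTH for one table, in bytes."""
    data_length, index_length = sizes[table]
    return data_length + index_length


def ratio(larger, smaller):
    """Return how many times bigger one table's total is than another's."""
    return total(larger) / total(smaller)


if __name__ == "__main__":
    mib = 1024 * 1024
    for name in sizes:
        print(f"{name:12s} {total(name) / mib:8.1f} MiB")
    print(f"text / searchindex = {ratio('text', 'searchindex'):.1f}x")
```

On this wiki the ratio comes out a little above 9. It says nothing definite about eswiki, where the result depends on how many revisions per page the imported dump contains.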