On Mon, May 11, 2009 at 9:20 AM, Matthias Apitz <guru(a)unixarea.de> wrote:
> Thanks for your explanation. Do you have an idea how much disk space the
> searchindex will take, compared with the database size or the size of
> the imported file 'text.txt'? The actual size of the DB is:
If you imported only the latest revision of each article, the search
index should be comparable in size to the text table. Otherwise, it
should be vastly smaller. For comparison, here are my table sizes (in
bytes) from http://www.twcenter.net/wiki/, which has homegrown content:
mysql> SELECT TABLE_NAME, DATA_LENGTH, INDEX_LENGTH, DATA_LENGTH+INDEX_LENGTH
    -> FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='wikidb'
    -> ORDER BY DATA_LENGTH+INDEX_LENGTH DESC LIMIT 5;
+-------------+-------------+--------------+--------------------------+
| TABLE_NAME | DATA_LENGTH | INDEX_LENGTH | DATA_LENGTH+INDEX_LENGTH |
+-------------+-------------+--------------+--------------------------+
| text | 134807552 | 0 | 134807552 |
| searchindex | 7662368 | 7068672 | 14731040 |
| revision | 4505600 | 6045696 | 10551296 |
| logging | 2637824 | 6324224 | 8962048 |
| pagelinks | 1327104 | 1523712 | 2850816 |
+-------------+-------------+--------------+--------------------------+
5 rows in set (0.91 sec)
Note how the text table is roughly an order of magnitude larger than
searchindex: text stores every revision, while only the most recent
revision of each page is indexed. For a full eswiki dump I'd expect a
much larger difference, because Spanish Wikipedia pages tend to have
many more revisions than my wiki.
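To make the "order of magnitude" concrete, here is a small arithmetic sketch using only the byte counts from the query output above (the DATA_LENGTH+INDEX_LENGTH column). It converts them to MiB and computes the text/searchindex ratio. Keep in mind that these are the sizes MySQL reports for the tables, which can include allocated but unused space, so the numbers are approximate.

```python
# Byte counts copied from the INFORMATION_SCHEMA query output above
# (DATA_LENGTH + INDEX_LENGTH for each table).
sizes = {
    "text": 134807552,
    "searchindex": 14731040,
    "revision": 10551296,
    "logging": 8962048,
    "pagelinks": 2850816,
}

MIB = 1024 * 1024  # bytes per mebibyte


def mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return n_bytes / MIB


# Print each table's total size in MiB.
for name, size in sizes.items():
    print(f"{name:<12} {mib(size):8.2f} MiB")

# How much larger the text table is than the search index.
ratio = sizes["text"] / sizes["searchindex"]
print(f"text / searchindex = {ratio:.2f}")
```

This gives about 128.56 MiB for text against 14.05 MiB for searchindex, a ratio of roughly 9.15. That fits the "roughly an order of magnitude" description.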