Hello to this list!
I downloaded the enwiki, dewiki, frwiki and itwiki dumps and imported them correctly into MediaWiki. The text table is the one that contains the actual Wikipedia pages (field old_text); it was created as MyISAM in order to add fulltext search capabilities. dewiki.text contains 1,248,933 rows and is over 2.8 GB in size.
I issued this command on mysql (v5.0.45) command line: ALTER TABLE dewiki.text ADD FULLTEXT (old_text);
This command has now been running for approx. 20 hours (!) without showing any errors. Apparently the MySQL server is up and running. (W2K SP4; MySQL v5.0.45; Intel Pentium M 1.7 GHz, 512 MB RAM; mysql-nt.exe is using 152,760 KB constantly.)
There is only 1 active thread: Command: Query; Time 74059; State: copy to tmp_table; Info: ALTER TABLE dewiki.text ADD FULLTEXT (old_text);
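In case it is useful, here is a rough way I estimate how far the copy has got (only a sketch: it assumes the server is otherwise idle and that Handler_write increments once per copied row):

SHOW FULL PROCESSLIST;                     -- the ALTER should still be in the copy-to-tmp-table state
SHOW GLOBAL STATUS LIKE 'Handler_write';   -- roughly the number of rows already written to the temporary copy
-- comparing that counter with the 1,248,933 rows in dewiki.text gives a crude progress estimate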
If interested, see MySQL variables set in this recent post to this list: http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/32872...
I know this is not a wikitech question in the strict sense, but I cannot find sufficient answers on optimizing MySQL for tables as large as Wikipedia's in the MySQL forums or elsewhere...
Any suggestions welcome!
Alex Hoelzel, http://www.meshine.info
Alexander Hölzel CEO EUTROPA AG
=========================== EUTROPA Aktiengesellschaft Oelmüllerstrasse 9, D-82166 Gräfelfing, Tel 089 87130900, Fax 089 87130902 ===========================
On 8/7/07, MESHine Team alerts@meshine.info wrote:
I issued this command on mysql (v5.0.45) command line: ALTER TABLE dewiki.text ADD FULLTEXT (old_text);
This command has now been running for approx. 20 hours (!) without showing any errors. Apparently the MySQL server is up and running. (W2K SP4; MySQL v5.0.45; Intel Pentium M 1.7 GHz, 512 MB RAM; mysql-nt.exe is using 152,760 KB constantly.)
There is only 1 active thread: Command: Query; Time 74059; State: copy to tmp_table; Info: ALTER TABLE dewiki.text ADD FULLTEXT (old_text);
Sounds about right. I believe it took a few hours on my similarly-sized forum post table a year or so back, but that was on a dual P4 Xeon 2.0 GHz with 4 GB RAM. Make sure there's more than enough room in your temp folder for it to copy the table a few times over, and it should finish eventually.
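For reference, this is roughly what I'd check before kicking it off (just a sketch; the buffer values are guesses for a 512 MB machine, not tested recommendations, and SET GLOBAL only affects connections opened afterwards, so it has to happen before the connection that runs the ALTER):

SHOW VARIABLES LIKE 'tmpdir';                      -- where sort/temp files may go
SHOW VARIABLES LIKE 'datadir';                     -- the rebuilt table copy is written under here
SHOW TABLE STATUS FROM dewiki LIKE 'text';         -- Data_length + Index_length ~ disk space the copy needs
SET GLOBAL key_buffer_size           = 128*1024*1024;    -- MyISAM index cache
SET GLOBAL myisam_sort_buffer_size   = 128*1024*1024;    -- buffer used when (re)building MyISAM indexes
SET GLOBAL myisam_max_sort_file_size = 4*1024*1024*1024; -- allow a large temp file for the index sort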
By comparison, I've heard that Sphinx takes a couple of minutes to index something that size or even larger. I assume Lucene (also being a dedicated search solution) would be similarly fast. There *is* a reason Wikipedia doesn't use MySQL fulltext search, and I think you're seeing that now.
On Tue, 2007-08-07 at 19:34 +0200, MESHine Team wrote:
I issued this command on mysql (v5.0.45) command line: ALTER TABLE dewiki.text ADD FULLTEXT (old_text);
Let me give this a try on my local Wikimedia mirror here... because (as many here already know) I've had some really great success with performance-tweaking the MediaWiki install and the backend SQL database.
I'm currently importing the 81 per-project dumps (486 separate dumps) and working on making that import fast and automated. When that's done, I'll try adding a fulltext index to dewiki and see how long it takes.
On Tue, 2007-08-07 at 19:34 +0200, MESHine Team wrote:
I issued this command on mysql (v5.0.45) command line: ALTER TABLE dewiki.text ADD FULLTEXT (old_text);
I have a question: how are you able to even execute this command without an error, since old_text is of type 'mediumblob'? You can't add a fulltext index on a mediumblob column.
# time mysql mediawiki -e "ALTER TABLE ep_de_text ADD FULLTEXT(old_text);"
ERROR 1283 (HY000) at line 1: Column 'old_text' cannot be part of FULLTEXT index
Did you alter the table to set that to 'longtext' or something first?
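For what it's worth, a FULLTEXT index only works on CHAR/VARCHAR/TEXT columns, so on a stock schema the column would presumably have to be converted first, something like the untested sketch below (it ignores NOT NULL and charset details, and each ALTER rewrites the whole table):

ALTER TABLE dewiki.text MODIFY old_text MEDIUMTEXT;   -- mediumblob -> mediumtext so a FULLTEXT index is allowed
ALTER TABLE dewiki.text ADD FULLTEXT (old_text);      -- then build the index itself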