Erik wrote:
Making a site like Wiktionary or textbook.wiki international is a lot of work with the current setup. A copy of the code and a new database has to be created for each language, and the relevant texts have to be adapted.
Brion's idea for a multilanguage Phase IV where all the now separate Wikipedia wikis would be in a single database under a single software installation would help a great deal (skins would take care of localization).
We could extend that idea by having separate databases/software installations only between different Wikimedia projects (which should all have their own domain names anyway). Subprojects/language versions within the same Wikimedia project would all be in the same database; each wiki page would need a language/subproject tag - or table thingy - and those wiki pages would be in their own directories (again the directories would be named after the appropriate language code).
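For illustration, a rough SQL sketch of what that might look like: one page table shared across the subprojects of a Wikimedia project, with the language/subproject tag as a column in the unique key (all table and column names here are invented for the example, not an actual schema proposal):

    -- hypothetical combined page table for one Wikimedia project
    CREATE TABLE cur (
      cur_id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
      cur_lang      VARCHAR(16) NOT NULL,    -- language/subproject code, e.g. 'en', 'de'
      cur_namespace TINYINT NOT NULL,
      cur_title     VARCHAR(255) NOT NULL,
      cur_text      MEDIUMTEXT NOT NULL,
      PRIMARY KEY (cur_id),
      UNIQUE KEY lang_name_title (cur_lang, cur_namespace, cur_title)
    );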
Side note: IMO, "MediaWiki" would be a good name for our officially nameless software. "PediaWiki" never worked for me since the software is used in at least several non-encyclopedia contexts and will likely be used by many more in the next several years.
But all that is less important than optimizing the current code.
It is better to wait a bit, but keep internationalization in mind from the start. The "textbook"-wiki will likely be relocated to wikibooks.org, where we can then use the scheme that Mav proposes.
Hm. The last thing we need is a porn site at wikibooks.org, so since everybody is referring to Wikibook as Wikibooks, I went ahead and bought wikibooks.org/.com too. And while I was at it I purchased WikimediaFoundation.org/.com as well.
All these domain names will be donated to Wikimedia as soon as it is able to accept a legal transfer of ownership. It will be up to the Foundation to decide what to do with all these domain names and whether or not it makes sense to renew them (if so then I'll probably help to finance that too but I'm sure I won't be the only one).
But for now they are safe from cyber squatters for at least the next year (I'm still mad at myself for not purchasing wikimedia.com before a squatter got it).
-- Daniel Mayer (aka mav)
Daniel-
Brion's idea for a multilanguage Phase IV where all the now separate Wikipedia wikis would be in a single database under a single software installation would help a great deal (skins would take care of localization).
Well, having a single codebase (as in, set of PHP files) for all Wikipedias is not that difficult even with the current code. Brion (or was it Lee?) just chose to set it up as completely separate installations. Having a single *database* is much more difficult because all the tables and queries need to be adjusted accordingly. Merging the tables is certainly undesirable as this would substantially slow things down (the OLD table of the English wiki is currently 8801714176 bytes and dog slow). However, the software could be rewritten to be always multilingual in all available translations, and to create the necessary tables on demand.
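A minimal sketch of the "create tables on demand" idea, assuming per-language table name prefixes (the prefix scheme and the slimmed-down column list are invented for the example):

    -- run the first time someone requests, say, the 'nl' wiki;
    -- one set of tables per language, one shared codebase
    CREATE TABLE IF NOT EXISTS nl_cur (
      cur_id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
      cur_namespace TINYINT NOT NULL,
      cur_title     VARCHAR(255) NOT NULL,
      cur_text      MEDIUMTEXT NOT NULL,
      PRIMARY KEY (cur_id),
      UNIQUE KEY name_title (cur_namespace, cur_title)
    );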
Side note: IMO, "MediaWiki" would be a good name for our officially nameless software.
Yup. Many potential wiki users have been scared away by the "Wikipedia" name. "Isn't this just for encyclopedias?" is a question I have heard several times. Then people end up with something useless and scary like TWiki. How are we ever supposed to get decent competition like that? :)
Hm. The last thing we need is a porn site at wikibooks.org
You're quite right. The erotic furry Star Trek slasher fan fiction should be at porn.wikibooks.org ;-)
Regards,
Erik
Erik Moeller wrote:
Merging the tables is certainly undesirable as this would substantially slow things down (the OLD table of the English wiki is currently 8801714176 bytes and dog slow).
Are you sure the size makes it slow? The way I understand it, it doesn't; if you split it up into several tables, it'll still have to organise those different tables. Either way, a row look-up will be O(n) without an index and O(log n) with an index.
What makes a database slow is the number of reads and writes it has to perform (mainly the writes because they block the reads).
So, if my perception of this all is correct, the best way to optimise things is to use efficient queries (trying to minimise writes, especially on the most heavily read-from tables) and things like MemCacheD (which heavily reduces the number of reads), and not to split things up into several tables.
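As a toy illustration of the index point (using the same kind of invented cur table as sketched earlier in the thread), EXPLAIN makes the two cases visible:

    EXPLAIN SELECT cur_id FROM cur WHERE cur_title = 'Apple';
    -- with an index on cur_title:    type=ref, only a handful of rows examined (the O(log n) case)
    -- without an index on cur_title: type=ALL, every row in the table is scanned (the O(n) case)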
Greetings, Timwi
Timwi-
Are you sure the size makes it slow? The way I understand it, it doesn't; if you split it up into several tables, it'll still have to organise those different tables.
It certainly will, but it won't have to perform, e.g., a fulltext search on the entire index, but only on the index for that table. Compare the fulltext search of the English wiki with the one on the German wiki -- that is certainly no O(log n) scalability. That's why we had to disable the English fulltext search. Now, if we merged the two into a single index, all the fulltext searches would be as slow as the English one is.
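For concreteness, a per-wiki fulltext setup in MySQL looks roughly like this (MyISAM, which is what supported FULLTEXT indexes; the table and column names are only illustrative):

    CREATE TABLE de_searchindex (
      si_page INT UNSIGNED NOT NULL,
      si_text MEDIUMTEXT NOT NULL,
      FULLTEXT KEY si_text (si_text)
    ) ENGINE=MyISAM;

    SELECT si_page FROM de_searchindex
     WHERE MATCH(si_text) AGAINST('Suchbegriff');

The FULLTEXT index only covers the German rows, so the German search stays fast no matter how large the English wiki gets.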
You're right, of course, that merging the tables would especially highlight unoptimized queries. But frankly, I'm not sure these need to be "highlighted" any more than they are already ;-)
Regards,
Erik
Erik Moeller wrote:
Timwi-
Are you sure the size makes it slow? The way I understand it, it doesn't; if you split it up into several tables, it'll still have to organise those different tables.
It certainly will, but it won't have to perform, e.g., a fulltext search on the entire index, but only on the index for that table.
First of all, I have to admit I was being short-sighted again by forgetting to consider fulltext searches. My apologies.
However, I'm not convinced that this really matters. It would only have to search on the entire fulltext index if you actually wanted to search the entire table (an inter-language search). If you had an integer column that specifies the language of an article, and an index on that, then searching only the English text would be equal in speed to having a separate table for the English text.
At least this is how I understand it. Of course I may well be wrong. I'll ask a MySQL expert later on, when he's online :-)
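For reference, the kind of query this scheme implies (table and column names invented; whether MySQL can actually combine the FULLTEXT index with the index on the language column is exactly the open question here):

    -- combined searchindex table with an integer si_lang column
    SELECT si_page FROM searchindex
     WHERE si_lang = 1                          -- say 1 = English
       AND MATCH(si_text) AGAINST('apple');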
Compare the fulltext search of the English wiki with the one on the German wiki -- that is certainly no O(log n) scalability.
No, certainly not. What I said about O(n) and O(log n) applies to selection of a single row from a simple key on a fixed-length column. What I was thinking of was combining the tables to form one whole Wikipedia database so we could have one single Watchlist and one inter-language Recent Changes page. Then generating an English-only Recent Changes page from the combined table should not be slower than it is now.
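A sketch of what an English-only Recent Changes listing against such a combined table could look like, assuming an integer rc_lang column and an index on (rc_lang, rc_timestamp) (all names invented):

    SELECT rc_timestamp, rc_title, rc_user_text
      FROM recentchanges_all
     WHERE rc_lang = 1                 -- 1 = English, say
     ORDER BY rc_timestamp DESC
     LIMIT 50;

With that composite index the query is an index range scan over the English rows only, which is the basis for the "should not be slower" claim.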
Timwi
Timwi wrote:
However, I'm not convinced that this really matters. It would only have to search on the entire fulltext index if you actually wanted to search the entire table (an inter-language search). If you had an integer column that specifies the language of an article, and an index on that, then searching only the English text would be equal in speed to having a separate table for the English text.
At least this is how I understand it. Of course I may well be wrong. I'll ask a MySQL expert later on, when he's online :-)
I have asked my friend and he confirmed my suspicion.
If you have several large tables A, B, C, ... etc. with identical columns and types (that could, for example, be the recentchanges tables for the different languages), you can always combine them into a big table with an extra (indexed) integer or other fixed-length column that specifies what original table the row came from (i.e. what language it's in), and it won't be any slower.
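As a concrete sketch of that merge (the recentchanges layout is heavily trimmed here and all names are invented):

    CREATE TABLE recentchanges_all (
      rc_lang      TINYINT UNSIGNED NOT NULL,   -- which original table / language the row came from
      rc_timestamp CHAR(14) NOT NULL,
      rc_title     VARCHAR(255) NOT NULL,
      rc_user_text VARCHAR(255) NOT NULL,
      KEY lang_time (rc_lang, rc_timestamp)
    );

    -- fold the per-language tables in, tagging each row with its source
    INSERT INTO recentchanges_all
      SELECT 1, rc_timestamp, rc_title, rc_user_text FROM en_recentchanges;
    INSERT INTO recentchanges_all
      SELECT 2, rc_timestamp, rc_title, rc_user_text FROM de_recentchanges;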
Greetings, Timwi
Timwi wrote:
I have asked my friend and he confirmed my suspicion.
If you have several large tables A, B, C, ... etc. with identical columns and types (that could, for example, be the recentchanges tables for the different languages), you can always combine them into a big table with an extra (indexed) integer or other fixed-length column that specifies what original table the row came from (i.e. what language it's in), and it won't be any slower.
This is true in theory, and I believe true in some "big iron" production database systems (Oracle, etc.), but I'm not sure it's true in practice in many of the free DB systems. In particular, big tables and big indexes can mess up caching algorithms. In fact it sometimes seems to be beneficial to split into entirely separate databases (not just separate tables in the same database). I don't know enough about the internals of the various DB systems to say why this is, but splitting seems to have resulted in a large speedup at kuro5hin.org, among other places that use MySQL.
-Mark
--- Timwi timwi@gmx.net wrote:
Timwi wrote:
However, I'm not convinced that this really matters. It would only have to search on the entire fulltext index if you actually wanted to search the entire table (an inter-language search). If you had an integer column that specifies the language of an article, and an index on that, then searching only the English text would be equal in speed to having a separate table for the English text.
At least this is how I understand it. Of course I may well be wrong. I'll ask a MySQL expert later on, when he's online :-)
I have asked my friend and he confirmed my suspicion.
If you have several large tables A, B, C, ... etc. with identical columns and types (that could, for example, be the recentchanges tables for the different languages), you can always combine them into a big table with an extra (indexed) integer or other fixed-length column that specifies what original table the row came from (i.e. what language it's in), and it won't be any slower.
Greetings, Timwi
Should not this conversation be on wikitech?
Daniel Mayer wrote:
Side note: IMO, "MediaWiki" would be a good name for our officially nameless software. "PediaWiki" never worked for me since the software is used in at least several non-encyclopedia contexts and will likely be used by many more in the next several years.
I still like "Phase IV Wiki" myself. Although I agree that "MediaWiki" is better than "PediaWiki".
Stephen G.