[Wikipedia-l] Language.php as a database

lcrocker at nupedia.com lcrocker at nupedia.com
Mon Aug 5 17:40:47 UTC 2002


>> Instead of including a file like [Language.php], you should
>> use some sort of database for the translation process...
>>
>> Suppose you want the user to press a button in a popup
>> window, addressing him in his own language, you would
>> write something like: "alert(x('Press the button'));"
>>
>> The x-function would first check if the desired language
>> is English and if so, it would just return its argument.
>> Otherwise it would consult the database and, using the
>> language that the user prefers, either find a translation
>> or, failing that, present the original English text. It
>> would then also put a new entry in the database noting
>> that this term needs to be translated into this
>> particular language. It could also put the term in
>> a general database of new terms that need to be
>> translated into all languages.

A few problems with the idea: first, there is not a one-to-one
correspondence between English phrases and their translations.
There may be two or three places in the code that use the same
English text, but which would be different in a foreign language.
But that doesn't argue against a database, just that the database
needs to be indexed by the /use case/, not by the phrase itself.
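As a sketch of what I mean (the keys and variable names here are
made up for illustration, not anything in the current code),
indexing by use case rather than by the English phrase looks like
this:

```php
<?php
// Hypothetical sketch: two UI spots that share the same English
// text get separate keys, so other languages can diverge.
$messagesEn = [
    'save-article'  => 'Save',  // button on the edit page
    'save-settings' => 'Save',  // button on the preferences page
];
$messagesDe = [
    'save-article'  => 'Seite speichern',
    'save-settings' => 'Einstellungen speichern',
];
// Lookups go by key, never by the English string itself:
echo $messagesDe['save-article'], "\n";  // Seite speichern
```

The English table happens to repeat itself; the German one does
not, which is exactly the case a phrase-keyed table cannot handle.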

>> Another point that concerns me is the size of the
>> current file. I guess that this file of around
>> 16 Kbyte is probably loaded and interpreted for
>> every page that is loaded from Wikipedia and may
>> perhaps account for how slow Wikipedia is at times?

No, no, it's precompiled and cached.  Converting to a
database for live access would be much slower, requiring
several API calls per page instead of what are now simple
memory lookups.

>> Other advantages of my system are that the task
>> of the PHP programmers would become much easier,

I'm all for that of course; and maybe a database system could
be set up that generates a LanguageXx.php file as output, and
which could be added to by non-programmers.
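To make that concrete (this is only a sketch with invented names,
not a proposal for the actual schema), such a generator might look
like:

```php
<?php
// Hypothetical sketch: turn a table of translations (edited by
// non-programmers, say via a web form backed by a database) into
// a LanguageXx.php include that the wiki loads as usual.
function generateLanguageFile(string $code, array $messages): string {
    $out = "<?php\n\$wgMessages" . ucfirst($code) . " = array(\n";
    foreach ($messages as $key => $text) {
        // var_export() quotes and escapes the strings safely.
        $out .= "    " . var_export($key, true)
              . " => " . var_export($text, true) . ",\n";
    }
    return $out . ");\n";
}

// e.g. file_put_contents('LanguageDe.php',
//          generateLanguageFile('de', $translationsFromDb));
```

The point is that the database only runs at generation time; the
live site keeps serving the precompiled file.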

>> Perhaps the translation tables could also be
>> used for other purposes or could be imported
>> from or be shared with other applications.

Again, translating isn't that simple.  Just because the English
phrase "X" translates to French "Y" in one context doesn't mean
it will translate to the same phrase in another context.  Each
translation really needs a human to judge the result /as used/
in the software.

> That's a little more like what the old Usemod wiki code does
> (though with a separate include file with a big hash array
> rather than a database per se, and a function that checked the
> string against the hash, falling back on the given English string
> if there was no translation); I do find moving the English strings
> to a separate file confusing when working with the code,
> particularly as Lee isn't heavy on the comment usage.

That's exactly what the code is doing now; it looks up the string
in the hash table, and if it doesn't find it, it falls back to
English.  Yes, I am a bit stingy with comments--I'm of the
school that says if your code /needs/ comments it isn't very good
code, but I suppose a note here or there wouldn't hurt.
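For anyone who hasn't read the file, the fallback works roughly
like this (a simplified sketch; the real function and variable
names differ):

```php
<?php
// Simplified sketch of the current behavior: look the string up
// in the user's language table, fall back to English if missing.
function lookupMessage(string $key, array $userLang, array $english): string {
    if (isset($userLang[$key])) {
        return $userLang[$key];
    }
    // No translation yet: serve the English text instead.
    return isset($english[$key]) ? $english[$key] : $key;
}
```

A call like lookupMessage('save', $messagesDe, $messagesEn)
returns the German string if one exists and the English one
otherwise; that is the cheap in-memory lookup the precompiled
file buys us, versus a database round trip per string.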


> We have a module installed (Alternative PHP Cache) that does this,
> or at least we did on the previous server. I _think_ it's still in
> place, but I could be wrong.

Yes, APC is running.  All the scripts are pre-compiled and cached
in shared memory.

> Group consensus; we can beat Lee into submission if need be, but
> I suspect he'd be amenable. I'm not sure if he reads this list
> regularly; try cross-posting to the tech list
> (wikitechl at nupedia.com) to make sure, and/or submit the idea to
> the feature request tracker:

I don't read intwiki regularly.  The thing that convinces me most
is code.  If you want the code to do something a certain way, you
either have to (a) convince me to do the work, or (b) do the work
yourself, and convince me to test and install it.  Guess which one
I prefer...

