Hello,
I have implemented Lucene search in my MW 1.9.1 installation, and
everything is working fine. However, I want to have title
suggestions, and it appears that the feature exists and was enabled by
default, but when I try to re-enable it (after disabling it) I get the
following error in an <a> tag right beneath my search text box:

Fatal error: Call to undefined method
LuceneSearch::doTitlePrefixSearch() in {MY LOCAL
FOLDER}/w/extensions/lucene/LuceneSearch_body.php on line 299
I do have sleepycat installed.
I am wondering whether this has been implemented yet, or whether it is
on somebody's TODO list.
Thanks,
Kasimir
--
Kasimir Gabert
The Monobook skin hardcodes first letter lower case for certain UI messages
(CSS: .portlet h5, .portlet h6). This is a bad idea for
languages that use upper case a lot in order to distinguish different words
and meanings. German is probably the most common example of such a language.
As long as you use a monolingual (German) wiki a hackish solution works; see
MediaWiki:Monobook.css/de in your wiki of choice for the CSS code which works
on wikis with German as default language only.
As CSS files can't be (and shouldn't need to be) localisable using the sub
page style, the solution doesn't work in multilingual wikis: you need to
overwrite MediaWiki:Monobook.css with said code, and then people of all other
languages complain that their interface strings are upper case. This is
because every interface string affected by the ".portlet
h5, .portlet h6" rule is written with upper case in the UI
message files of every language.
So the solution is:
Drop the forced lower case in Monobook, and if you don't like your UI strings
starting with upper case in your language of choice, make these strings lower
case right from the start in vanilla MediaWiki.
It is bad for any skin to rely on global lower/upper case behaviour, as
this is not language independent. And of course Monobook is the default for
the large majority, so it is a bit strange to force the majority of users into
non-standard language display because of that.
If someone thinks that some (older) non-default skins should use upper case
in any case, please add a force-upper-case CSS rule to those skins.
At least that would hurt far fewer people than the current scheme.
Arnomane
P.S.: See also this post in the initial thread covering this topic:
http://lists.wikimedia.org/pipermail/wikitech-l/2007-January/029275.html
Hi list,
We all know that multilingual program interfaces and multilingual document
presentation are a very difficult task, so please take the following points as
constructive criticism and not as a rant against your work (and yes, it is
worth reading this veeery long text ;-).
Current state:
So far MediaWiki has achieved quite a lot in supporting monolingual
wikis in any native language and any character set. As long as your wiki
revolves around content in one single language and all users use that
language as their interface language, MediaWiki does a great job. However,
when a wiki contains content in many languages and users are using many
different languages for their interface settings (like Wikimedia Commons,
Meta Wiki and some third party wikis, including one created by myself), things
get very problematic.
Missing tests:
In the past MediaWiki was changed several times without checking whether
the change affects language presentation in multilingual wikis:
* One sudden change, for example, was that "MediaWiki:Monobook.js" stopped
being localisable via MediaWiki:Monobook.js/$ISO-Language-code. This old
behaviour was used in every wiki for localising the tooltips. Additionally,
Meta and Wikimedia Commons used it to localise some integral
global JavaScript routines. After noticing this, at least the Wikimedia Commons
people fixed it themselves inside the wiki (see "Fix for i18n
localization not loading." at
http://commons.wikimedia.org/wiki/MediaWiki:Monobook.js). In the end
the tooltip issue was solved using localisable MediaWiki:Tooltip-$name pages,
and MediaWiki:Monobook.js is deprecated in favour of
MediaWiki:Common.js (which is a good thing). So this issue is finally past.
* A recent change was a MediaWiki namespace cleanup after running
maintenance/update.php. Every message previously copied automatically into the
MediaWiki namespace by that script was removed. This is in principle a
good thing, as it allows for some other fixes (for example in multilingual
wikis) and reduces problems with hidden message strings. However, one
important thing was forgotten. The interface points to several internal pages
(for example the pages linked by MediaWiki:Sidebar). By default these
interface links are not localisable; you have to explicitly whitelist link
target i18n for every affected page in LocalSettings.php
using "$wgForceUIMsgAsContentMsg = array();" (even this switch alone is a
maintenance problem for multilingual wikis aiming at true multilinguality).
The old maintenance/update.php did one good thing: for every whitelisted
link target it copied the default-language link target into the language
sub page for every language if one hadn't been created by hand before (for
example the content of MediaWiki:Mainpage was copied into
MediaWiki:Mainpage/de if that page was empty). So if the link target
existed in the default language, it was not possible to send people in other
languages to non-existent "localised" pages after running
maintenance/update.php. Now this workaround no longer works,
and people get linked into the void. How to solve that (I
don't want the old copy behaviour back, as it is a hackish solution)? Change,
at least for link target strings (all that now need whitelisting), the
interface string resolution order from MediaWiki:$Message/$language-code ->
UI-String-File-$Message to MediaWiki:$Message/$language-code ->
MediaWiki:$Message (default language). That way people don't get linked into
the void on whitelisted link targets - and of course you could also entirely
remove the need for the $wgForceUIMsgAsContentMsg switch in MediaWiki (and
get rid of one Wikimedia server maintenance issue that fills your
Bugzilla).
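The proposed lookup order can be sketched in a few lines of Python (the
function name and the dictionary-based page store are hypothetical, not
MediaWiki's actual API):

```python
def resolve_link_target(key, lang, wiki_pages, file_messages):
    """Proposed resolution order for whitelisted link-target messages:
    1. MediaWiki:$key/$lang  (per-language wiki page)
    2. MediaWiki:$key        (default-language wiki page)
    3. shipped message file  (last resort)
    wiki_pages maps page titles to their text; file_messages maps
    message keys to the defaults shipped with MediaWiki."""
    subpage = f"MediaWiki:{key}/{lang}"
    if subpage in wiki_pages:
        return wiki_pages[subpage]
    default_page = f"MediaWiki:{key}"
    if default_page in wiki_pages:
        return wiki_pages[default_page]
    return file_messages[key]
```

The point of step 2 is exactly the fix described above: a reader whose
language has no sub page lands on the default-language target instead of a
red link.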
So in future please also consider the implications of message handling changes
for multilingual wikis. At least two important Wikimedia wikis are
multilingual and deserve some thought.
Skin design error:
The Monobook skin hardcodes first letter lower case for certain UI messages
(CSS: .portlet h5, .portlet h6). This is a very bad idea for
languages that use upper case a lot in order to distinguish different words
and meanings. German is probably the most common example of such a language.
As long as you use a monolingual (German) wiki a hackish solution works; see
MediaWiki:Monobook.css/de in your wiki of choice for the CSS code. As CSS
files can't be (and shouldn't need to be) localisable using the sub page
style, the solution doesn't work in multilingual wikis: people of all
other languages then complain that their interface strings are upper case,
since it seems to be "good style" to start every interface string affected by
the ".portlet h5, .portlet h6" rule with upper case and later force it
back to lower case. So the solution is: drop the forced lower case in
Monobook, and if you don't like your UI strings starting with upper case in
your language of choice, make these strings lower case right from the start in
vanilla MediaWiki. There is no other good solution.
Interface string problems:
* Interface strings were already covered a bit above. There is a second
problem with link targets pointing into nowhere. Vanilla MediaWiki strings
contain hardcoded localised link targets. For example have a look at
MediaWiki:Blockedtext (if you want to find all (?) affected messages, grep for
{{ns:project}}). Again, in monolingual wikis this is no problem: these linked
pages are supposed to exist. But now consider a multilingual wiki. People
using $non-default-language get pointed into nowhere, and as we support quite a
lot of languages in MediaWiki this means either a lot of pages you'd need to
create in advance or, if you don't want that, touching a huge number of
interface strings. So in vanilla MediaWiki please do not hardcode any wiki
page in message strings. How to localise embedded link targets in the
future, then? Use a mechanism that points to link-target-defining MediaWiki
pages (like the MediaWiki namespace pages defined by MediaWiki:Sidebar). One
possible syntax would be "{{subst:Mediawiki:PAGENAME}}" templates (however
this syntax has some problems, as it would need to expand to a given sub
page, check whether that sub page exists, and fall back to the default
if not). Another possible syntax would be named variables such as
$MediaWiki:PAGENAME in interface strings.
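A rough sketch of how the second idea could be resolved; the
$MediaWiki:Name token syntax and the helper name are hypothetical, taken
only from the proposal above:

```python
import re

def expand_page_links(message, lang, wiki_pages):
    """Replace hypothetical $MediaWiki:Name tokens in an interface message
    with the link target defined on a wiki page, trying the language sub
    page first and falling back to the default-language page."""
    def lookup(match):
        name = match.group(1)
        for title in (f"MediaWiki:{name}/{lang}", f"MediaWiki:{name}"):
            if title in wiki_pages:
                return wiki_pages[title]
        return match.group(0)  # leave the token alone if nothing is defined
    return re.sub(r"\$MediaWiki:([A-Za-z0-9_-]+)", lookup, message)
```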
* Furthermore there is a big inherent communication problem with interface
related changes in multilingual wikis. If you change a, let us say, legal
message string in the default language, people using another language won't
notice it. Currently you'd need to overwrite every language-code sub page by
hand in order to make people aware of the change (more than 80 edits
for just one single message in order to cover every supported language). This
specific problem is also covered by
http://bugzilla.wikimedia.org/show_bug.cgi?id=8188 and is a severe
showstopper for the success of projects like Wikimedia Commons. Several
solution ideas and their side-effect problems have been discussed in this
bug (my proposed solution with the changed message string resolution order
would work much better now, as default messages are now deleted from the
MediaWiki namespace).
Templates/embedded text parts:
* Templates need to be called with their exact name. So if you are using
common templates in multilingual wikis, you either have them only in the
wiki's default language or you create a template that contains the text
repeated, let us say, 40 times in every language (you now know why
multilingual wikis look ugly). Template i18n doesn't work like i18n of message
strings with language code sub pages, as templates can contain variables;
people would localise the variable names in the translated templates as well
and you'd get a hell of inconsistencies. Currently people are using a very
fragile JavaScript hack to "hide" unwanted text parts. A better and very well
working (TM) solution is an extension using an XML-like tag called
<Multilang>, which embeds localised strings in the same template/page. See
http://www.mediawiki.org/wiki/Extension:Multilang for code details and
http://bugzilla.wikimedia.org/show_bug.cgi?id=8287 for the related bug entry
in Bugzilla. This solution reduces the ugliness of multilingual wikis a lot
and greatly increases the perceived true multilinguality.
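The idea can be illustrated with a toy selector. Note the block syntax
assumed here (one "xx: text" line per language) is made up for illustration;
see the extension page above for the real markup:

```python
import re

def pick_language_block(text, lang, default="en"):
    """Toy version of the <Multilang> idea: collect per-language strings
    from tagged blocks and return the one matching the reader's language,
    falling back to the wiki default language."""
    blocks = {}
    for body in re.findall(r"<Multilang>(.*?)</Multilang>", text, re.S):
        for line in body.strip().splitlines():
            code, _, translated = line.partition(":")
            blocks[code.strip()] = translated.strip()
    return blocks.get(lang, blocks.get(default, ""))
```

The template text is written once, and every reader sees only their own
language, with no JavaScript hiding hacks.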
Default language:
* Currently anonymous people can only use the wiki default language. For
multilingual wikis this is a great shortcoming for the perceived
multilinguality, as people still say "I don't like that it is English by
default" even if everything exists in their own language as well. This leads
to very ugly "?uselang=" URL hacks that cause interface flickering (and
render serverside caching useless anyway); see for example
http://commons.wikimedia.org/wiki/Accueil (the "Interface en français" link).
And the language switches back after you click a link. For some time there has
been a patch that would give anonymous users their language of choice:
http://bugzilla.wikimedia.org/show_bug.cgi?id=3665 (note that caching
currently gets affected more and more by these nasty "uselang" tricks; though
I agree with the comments in the bug that a drop-down selection would be
better than relying on the browser language, although it would require a
cookie too).
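For what the browser-language route (as opposed to the drop-down) would
involve, here is a sketch of picking a language from an HTTP Accept-Language
header; the function name is invented, not from the patch:

```python
def pick_interface_language(accept_language, supported, default="en"):
    """Pick an interface language for an anonymous user from an HTTP
    Accept-Language header, e.g. 'de-DE,de;q=0.8,en;q=0.5'.  Entries are
    ranked by their q-value; the first supported one wins."""
    candidates = []
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        lang, _, qpart = piece.partition(";")
        try:
            q = float(qpart.split("=", 1)[1]) if qpart else 1.0
        except (IndexError, ValueError):
            q = 1.0
        # keep only the primary subtag: 'de-DE' -> 'de'
        candidates.append((q, lang.strip().lower().split("-")[0]))
    for _, lang in sorted(candidates, key=lambda c: -c[0]):
        if lang in supported:
            return lang
    return default
```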
Summary:
All these problems are currently big problems for multilingual wikis, but they
can be solved in MediaWiki without any revolutionary magic code mix-up,
just step by step with fairly small code changes. Fixing these issues would be
a great leap forward for multilingual wikis.
You'd help some struggling Wikimedia wikis a lot.
Cheers,
Arnomane
Thanks, Eike. I brought your reply back to the list so it can benefit other people. I hope you don't mind.
And you are right - I am not Google, unfortunately.
Tony
-----Original Message-----
From: Eike Frost [mailto:ei@kefro.st]
Sent: Tuesday, January 30, 2007 2:28 PM
To: Webmaster
Subject: Re: [Wikitech-l] FW: Our IP was blocked by mistake
Hi,
note that I don't speak for Wikimedia.
On Tuesday 30 January 2007 18:41, Webmaster wrote:
> Just something that came to my mind...
> Google caches the wikipedia pages just like we were doing. Are you
> blocking Google as well?
You are not Google. You are not a search engine. Even if you were, Google takes great care not to overwhelm the servers it is crawling, and Google does not "frame" Wikipedia in ads when caching it. You're going to be hard-pressed to find ads on
http://google.com/search?q=cache:-L-LvI3XPXcJ:en.wikipedia.org/+wikipedia&h…
for instance. So no, Google is not doing what you are doing.
Wikipedia's license allows you to use the content in that manner. This does not mean it also allows you to overwhelm and/or abuse the servers, especially since you are not the ones paying for them. The downloadable wikimedia database snapshots have been there for years for precisely the purpose you are using Wikipedia's content for.
So please use that instead of trying to argue that you should be allowed to crawl all of Wikipedia with an abusive ASP script where the benefit to Wikipedia is pretty much exactly nil (and I categorize the script as abusive purely by what has been said on the mailing list). You're not paying for the content and are making plenty of money plastering ads all over it; the least you can do is research how to do that without harming Wikipedia's infrastructure, using the resources made available for precisely those purposes. If your website ever becomes as important as Google's then maybe you can ask Wikimedia to generate special files especially for you (like they do for Yahoo) or pay them to do so.
My $0.02, anyway. And the dumps are really nice to work with, too. I've used them in the past :-)
--Eike
An automated run of parserTests.php showed the following failures:
This is MediaWiki version 1.10alpha (r19687).
Reading tests from "maintenance/parserTests.txt"...
Reading tests from "extensions/Cite/citeParserTests.txt"...
Reading tests from "extensions/Poem/poemParserTests.txt"...
18 still FAILING test(s) :(
* URL-encoding in URL functions (single parameter)
* URL-encoding in URL functions (multiple parameters)
* TODO: Table security: embedded pipes (http://mail.wikipedia.org/pipermail/wikitech-l/2006-April/034637.html)
* TODO: Link containing double-single-quotes '' (bug 4598)
* TODO: message transform: <noinclude> in transcluded template (bug 4926)
* TODO: message transform: <onlyinclude> in transcluded template (bug 4926)
* BUG 1887, part 2: A <math> with a thumbnail- math enabled
* TODO: HTML bullet list, unclosed tags (bug 5497)
* TODO: HTML ordered list, unclosed tags (bug 5497)
* TODO: HTML nested bullet list, open tags (bug 5497)
* TODO: HTML nested ordered list, open tags (bug 5497)
* TODO: Inline HTML vs wiki block nesting
* TODO: Mixing markup for italics and bold
* TODO: 5 quotes, code coverage +1 line
* TODO: dt/dd/dl test
* TODO: Images with the "|" character in the comment
* TODO: Parents of subpages, two levels up, without trailing slash or name.
* TODO: Parents of subpages, two levels up, with lots of extra trailing slashes.
Passed 489 of 507 tests (96.45%)... 18 tests failed!
Theoretically speaking, these two classes have very distinct roles.
MessageCache is a cache of messages, where userland code should run off
to get messages from. Language, while containing messages, defers this
responsibility to MessageCache, and instead lets users call
language-specific behavior like formatting dates. So, in theory, while
MessageCache is dependent on Language, Language should not be dependent
on MessageCache.
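In that theoretical design the dependency runs one way only; a toy sketch
(class and method names are invented for illustration, not MediaWiki's real
ones):

```python
class MessageCache:
    """Owns message lookup; userland code asks this class, not Language."""
    def __init__(self, messages):
        self._messages = messages  # key -> text, possibly customised on-wiki

    def get(self, key):
        return self._messages[key]


class Language:
    """Owns language-specific behaviour (e.g. date formatting) and, in the
    clean design, never reaches back into MessageCache."""
    def __init__(self, code):
        self.code = code

    def format_date(self, year, month, day):
        # trivial stand-in for real per-language date formatting
        if self.code == "de":
            return f"{day}.{month}.{year}"
        return f"{year}-{month:02d}-{day:02d}"
```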
In reality, however, this is not the case. Certain functions in the
Language objects will occasionally need to call getMessageFromDB (which
calls the wfMsg* functions which call the MessageCache) to grab a
possibly customized message. The whole thing is terribly convoluted and
I'm not sure I understand.
I'm wondering, however, how this would be restructured if you had the
chance to refactor this triumvirate of files without any regard to
backwards compatibility. Some questions:
* A global function is currently being used to do parameter
substitution: if you were to put this in a class, which class would it go
into?
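For context, MediaWiki messages use numbered placeholders ($1, $2, ...); a
stand-alone sketch of that substitution step, wherever it ends up living
(this is not the real implementation):

```python
import re

def replace_args(message, args):
    """Substitute $1, $2, ... in a message with positional arguments.
    Placeholders without a corresponding argument are left untouched."""
    def sub(match):
        index = int(match.group(1)) - 1
        return str(args[index]) if 0 <= index < len(args) else match.group(0)
    return re.sub(r"\$(\d+)", sub, message)
```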
* The message cache is currently used by extension authors to add their
own customizable messages to the mix. Is adding the messages straight to
the cache the right thing to do, even if it is labeled
mExtensionMessages? Would the cache be responsible for message
retrieval? Should the class be renamed for sake of truthfulness?
* The cache is language aware in that it can accept messages for
specific languages but then figures out which one to use based on
$wgContLang or $wgLang (whichever is currently taking precedence).
It appears that MessageCache is not so much a cache but a loose
confederation of tools for *getting* messages, performing some magic
along with the global functions to figure out where to look, its own
cache being only one of many places to look. Which is why I'm having
trouble figuring this stuff out.