Brion wrote:
>The LuceneSearch extension should DIE DIE DIE. Do not update it or do
>any further development on it, please.
>Instead, update the search front-end in MediaWiki with any necessary new
>interfaces to fully support a backend plugin for Lucene.
I agree that Lucene needs to be supported natively in core, but as long as it isn't, what's wrong with keeping people happy with a temporary module?
The Lucene extension itself wasn't changed BTW.
Roan Kattouw (Catrope)
A while has passed since I fixed all the image redirect issues I knew of. Now
I propose enabling the feature by default, to make life easier for poor
Commons admins :).
P.S. Everyone is still welcome to report any bugs they find to Bugzilla
(preferably CCing me)
--VasilievVV
I just had the following thought: For a tag intersection system, if
* we limit queries to intersections of two categories (show all pages with both categories A and B), and
* we assume on average 5 categories per page (can someone check that?),
then we have 5*4=20 ordered category pairs per page.
Now, for each intersection, we calculate MD5("A|B"), keep the first 32 bits
as an integer hash, and store that in a new table (page_id INTEGER,
intersection_hash INTEGER).
That table would be 4 times as long as the categorylinks table.
* Memory usage: Acceptable (?)
* Update: Fast, on page edit only
* Works for non-existing categories
On a query, we look for the (indexed) hash in a subquery, then check
those against the actual categorylinks.
Looking up an integer in the subquery should be fast enough ;-)
Given the number of categories, and an INTEGER range of >4bn values, the
hash would be unique for all pair combinations of 65K categories (if the
hash were truly randomly distributed, which it isn't quite), which should
mean that the number of false positives (to be filtered out by the main
query) stays rather low.
If that's fast enough, we could even expand to three intersections (A,
B, and C), querying "A|B", "A|C", and "B|C", and let PHP find the ones
common to all three sets.
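Here's a rough sketch of the hashing side in PHP (the table, column and
function names are invented just to illustrate the idea):

    // Compute the intersection hashes for one page's categories.
    // Ordered pairs, so a query only ever needs the single hash of "A|B".
    function intersectionHashes( array $categories ) {
        $hashes = array();
        foreach ( $categories as $a ) {
            foreach ( $categories as $b ) {
                if ( $a === $b ) {
                    continue;
                }
                // First 8 hex digits of the MD5 = a 32-bit integer hash
                $hashes[] = hexdec( substr( md5( "$a|$b" ), 0, 8 ) );
            }
        }
        return $hashes;
    }

    // On page save (pseudo-SQL, table name invented):
    //   DELETE FROM category_intersections WHERE ci_page = <page_id>;
    //   INSERT INTO category_intersections (ci_page, ci_hash) VALUES ...;
    //
    // A two-category query would then look something like:
    //   SELECT cl_from FROM categorylinks
    //   WHERE cl_from IN ( SELECT ci_page FROM category_intersections
    //                      WHERE ci_hash = <hash of "A|B"> )
    //     AND cl_to = 'A' -- then check cl_to = 'B' to weed out collisions.

For the three-intersection case, PHP's array_intersect() could do the final
merge of the "A|B", "A|C" and "B|C" result sets.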
Summary: Fixing slow MySQL lookup by throwing memory at it...
Feasible? Or pipe dream?
Magnus
I was implementing a configuration option to allow changes to the
maximum length of edit summaries and log reasons, which I've often
felt are inadequate. In doing so, I discovered an issue that needs to
be resolved before I proceed. Basically, when you type stuff in the
form, the input box's maxlength parameter is a maximum number of
*characters*. But when stuff gets validated and put in the database,
it's generally a number of *bytes* that things are truncated to. I've
verified that Firefox will actually permit 200 multibyte characters to
be submitted as an edit summary, when they cannot possibly fit into
the database field. We actually have two different ways of dealing with
this at present:
* Log reasons just ignore the issue and pretend maxlength (which is
255 for them) is in bytes. Chances are good that's because whoever
wrote up that code thought they *were*. If the log reason happens to
end up being more than 255 bytes, as far as I can tell it will get
sent to the database as-is. That part should definitely be fixed:
MySQL in strict mode, and presumably PostgreSQL and Oracle, will
return a fatal error if this occurs. MySQL in non-strict mode will
silently truncate it without regard to character boundaries, which is
only slightly better (and I guess users of non-English Wikipedias have
gotten used to this behavior).
* Edit summaries do a sort of hacky workaround. They specify
maxlength=200, but then truncate the summary (using a nice
Unicode-aware truncation function) to 250 bytes, and then add an
ellipsis on the end if necessary. This probably works great for
Latin-based languages that just have the occasional two-byte character
with diacritic, but isn't much of a win for speakers of Hebrew or
Greek or Chinese. For English speakers, it's just annoying, since it
artificially limits the edit summary length. Using Firebug, I was
able to delete the maxlength parameter and submit a 250-character
summary on the English Wikipedia (a trick I'll remember for the future
if this doesn't get changed soon ;) ).
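To make the character/byte mismatch concrete, in PHP:

    // 200 Hebrew alephs (U+05D0, two bytes each in UTF-8):
    $summary = str_repeat( "\xD7\x90", 200 );
    echo mb_strlen( $summary, 'UTF-8' ); // 200 characters -- passes maxlength=200
    echo strlen( $summary );             // 400 bytes -- overflows a 255-byte field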
The clean way to fix this, it seems to me, is as follows:
* For users of the MySQL UTF-8 schema, there should theoretically be
no problem: all the database sizes are in characters already. The
only change needed for these wikis is to up the edit summary length to
the length of the database field and change the existing truncation
logic to work in characters. I don't think we have any clean way to
detect this scenario in the code at present, however, and anyway
they're not important, they aren't Wikipedia. ;)
* For everyone else, the client-side maxlength limit needs to be
changed to a byte count. This should be possible using JavaScript,
and is probably not otherwise possible (although if it were that would
be great). In the event of a client-side overrun (say, because the user
doesn't have JavaScript), the server should truncate the provided string on
a character boundary and return it in an error message for the user to
adjust manually, rather than silently truncating it (see the sketch below).
In this case as well, the maxlength for edit summaries should be upped to
255 bytes.
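Roughly what I have in mind on the server side (the message key is
hypothetical):

    $maxBytes = 255;
    if ( strlen( $summary ) > $maxBytes ) {
        // Cut on a UTF-8 character boundary rather than mid-character,
        // then bounce it back to the user instead of saving it silently.
        $truncated = mb_strcut( $summary, 0, $maxBytes, 'UTF-8' );
        return wfMsg( 'summary-too-long', $truncated );
    }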
What does everyone else think about this? Returning a truncated
reason in an error message will be a pain to write up, because there
are so many entry points to this logic that will need to handle the
error in their own strange and idiosyncratic ways, but I think it's
the only correct thing to do here.
This may be the dumbest question I've ever asked, so go easy on me
please! In SkinTemplate.php we have this:
$sitecss .= '@import "' . self::makeUrl( '-',
"action=raw&gen=css$siteargs$skinquery" ) . '";' . "\n";
I understand that it helps when a useskin parameter is passed; what I don't
understand is why it has to return a value when no such parameter is
passed. It can simply return "nothing", can't it?
Hojjat
Quoting Daniel Friesen (Dantman):
>>Also remember that whether whatever settings for transwiki import you
>>make, someone can always just use Special:Import using an XML file to
>>import.
Upload import is disabled on WMF wikis (except en.wikiquote for unknown
reasons :S )
>>But going along with that @... Yes, that is a nice addition which would
>>make transwiki import a feature we could allow even non-admins to use.
It would be a nice feature, but I think allowing non-admins to use it would
be a Bad Idea, as lots of bad content would be imported inappropriately.
Importing on top of a pre-existing page should perhaps prompt for
confirmation, as moving on top of an existing page does.
>>However it would be good to allow a user to
>>select a range of revisions. So if they really need an entire page, they
>>can import the start, continue the import, and continue on. Bot
>>frameworks like Pywikipedia could then easily turn this into a bot
>>import task and slowly import an entire page without causing server load.
I'd rather see the import limit fixed. I forget exactly why it breaks, but I
think anything over a certain number of revisions or a certain amount of
data doesn't get imported (or, if you're lucky, gets imported partially).
Some marginally better error messages were added for this, but it should
work better.
>>There is an extra feature or two I would go for, too.
>>Title renaming primarily. Sometimes a Wiki has a completely different
>>naming structure, and as a result you need to move a page after
>>importing it. However, what if the wiki already has a page with the same
>>name as the article on another wiki, but that article isn't the one that
>>we want to import. So it would be nice to be able to specify a title to
>>import to.
As an admin on a wiki affected in this way, I have to say that that feature
would not be useful. This is what the Transwiki: namespace is for. Import by
default to that namespace, and move to the new location; this leaves a
redirect behind, which is almost always a good thing, since links on other
projects will usually point to the Transwiki: page, and finding all links on
all projects is really not that much fun.
Mike.lifeguard@enwikibooks
btongminh(a)svn.wikimedia.org wrote:
> Revision: 31471
> Author: btongminh
> Date: 2008-03-02 21:45:06 +0000 (Sun, 02 Mar 2008)
>
> Log Message:
> -----------
> Add API module to query Lucene (bug 10908)
Please don't!
The LuceneSearch extension should DIE DIE DIE. Do not update it or do
any further development on it, please.
Instead, update the search front-end in MediaWiki with any necessary new
interfaces to fully support a backend plugin for Lucene.
-- brion vibber (brion @ wikimedia.org)
catrope(a)svn.wikimedia.org wrote:
> * Adding SimpleCaptcha::addCaptchaAPI() method that adds CAPTCHA information to an API result array. Other CAPTCHA implementations should override this method with a function that does the same (did this for FancyCaptcha and MathCaptcha)
[snip]
> + $resultArr['captcha']['type'] = 'simple';
> + $resultArr['captcha']['id'] = $index;
> + $resultArr['captcha']['question'] = $captcha['question'];
[snip]
> + $resultArr['captcha']['type'] = 'image';
> + $resultArr['captcha']['id'] = $index;
> + $resultArr['captcha']['url'] = $title->getLocalUrl( 'wpCaptchaId=' . urlencode( $index ) );
[snip]
> + $resultArr['captcha']['type'] = 'math';
> + $resultArr['captcha']['id'] = $index;
> + $resultArr['captcha']['sum'] = $sum;
Hmmm... How is an API client meant to figure out what to do with this
captcha data, when there seems to be nothing consistent in how it's
presented? How is it meant to display a new type of challenge which
might be added in the future?
If you're going to have an API for this, I think it needs at least some
minimum future-proofing; otherwise all the clients will break when
something is tweaked in the captcha.
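For instance (purely a suggestion, nothing implemented): keep the set of
keys identical across challenge types, so clients can ignore fields they
don't understand:

    $resultArr['captcha'] = array(
        'type' => 'image',     // 'simple', 'image', 'math', ...
        'id' => $index,        // always present; sent back with the answer
        'question' => null,    // text challenge, when applicable
        'url' => $url,         // media challenge, when applicable
        'mime' => 'image/png', // how to render 'url', when applicable
    );

That way a client that only understands questions and images can at least
detect an unknown type and fail gracefully.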
-- brion vibber (brion @ wikimedia.org)