Christophe Henner wrote:
> Still about Category search, I looked for something about it but
> didn't find it: what about something that makes it possible to get all the
> articles matching x categories?
> For example, listing all the articles in both
> [[Category:Writer]] and [[Category:Born in London]].
> Have a nice day
That's the much-discussed and much-desired "Category Intersections", and it is a
trickier problem (at scale) than I thought. I've been testing some ideas on
and off, but have been slowed down due to having a hard time clearing the
query cache on the server I'm using (I've tried "FLUSH QUERY CACHE" and
"RESET QUERY CACHE" but they don't seem to actually do it - all you MySQL
gurus out there, what am I missing?).
I've got two ideas I want to test:
1) Use the existing table and the query I've previously suggested, but
construct it more smartly by considering the number of pages in each category -
in other words, purposefully narrow down the result set as early as possible
(for example, look at "People born in 1912" and then see how many of those are
"Living People", instead of the other way around).
2) Try building a table with a fulltext index using a record for each page,
and a column for the categories, delimited by spaces (use underscores for
spaces in a category name). This may be a bit hackish, but I'm thinking
this will get MySQL to do the tricky part of building the index on
categories (each being a word in that column) for me. The MySQL people
must've made the fulltext index code as efficient as possible, so it will be
interesting to see how it performs. I know full text indexing is not
acceptable for whole Wikipedia articles, but if we're only considering
categories, we're talking about a lot less text. I've been wondering if
maybe this is how Flickr handles tags - whatever they're doing, the
functionality seems to match what we want to do, and at a large scale, too.
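To make the two ideas concrete, here is a minimal sketch in Python rather than SQL. It assumes a hypothetical in-memory mapping from category name to page titles standing in for the real category table; the function names are made up for illustration and are not part of MediaWiki. The first function starts from the smallest category so the result set is narrowed as early as possible (idea 1); the second shows the space-delimited, underscore-joined encoding that a fulltext column would hold (idea 2).

```python
from typing import Dict, Iterable, Set

# Hypothetical stand-in for the category membership table:
# category name -> set of page titles in that category.
CategoryIndex = Dict[str, Set[str]]


def intersect_categories(index: CategoryIndex, wanted: Iterable[str]) -> Set[str]:
    """Return the pages that belong to every category in `wanted`."""
    # Order by membership count so the smallest category seeds the result.
    ordered = sorted(wanted, key=lambda name: len(index.get(name, ())))
    if not ordered:
        return set()
    result = set(index.get(ordered[0], ()))
    for name in ordered[1:]:
        if not result:
            break  # nothing left to narrow down
        result &= index.get(name, set())
    return result


def category_tokens(categories: Iterable[str]) -> str:
    """Encode a page's categories as one space-delimited string (idea 2)."""
    # Spaces inside a category name become underscores, so each
    # category is a single "word" for a word-based index.
    return " ".join(sorted(c.replace(" ", "_") for c in categories))


index = {
    "Living People": {"Alice", "Bob", "Carol", "Dave"},
    "People born in 1912": {"Bob", "Erin"},
}
print(intersect_categories(index, ["Living People", "People born in 1912"]))
print(category_tokens(["Living People", "People born in 1912"]))
```

In a real database the same ordering would have to come from per-category page counts, which this sketch simply reads off the set sizes.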
If neither of these works, then I think we're off into either Lucene or some
other search function with a custom index/data structure. But I have the
strong impression that those are pretty inherently not updated in real-time,
which is a bummer.
We've had some complaints off and on about diffs taking a long time.
This happens at random times, doesn't happen often, and clears itself
up after a while. When this happens, the load on the Apache servers
seems low and the database seems OK.
We did seem to have this problem a few months ago if users had the Google
toolbar installed, but it appears that the problem is back.
I wonder if the job queue has anything to do with it; sometimes it
gets up to over 10,000 on us, but usually it is at 0. I haven't been
able to check what it's at when users experience this problem.
Does anyone have any ideas on how to debug, or what the possible
causes might be? We're still running 1.6.7.
Some days ago I tried to create a user account for myself on
wikimania2007.wikimedia.org, but I received the message ''Login error: The
name "555" is not allowed to prevent confusing or spoofed usernames: Does
not contain any letters. Please choose another name.''. I've been an active editor
since July 2004 and have used User:555 on lots of wikis since 23 May 2005.
[[:en:555 (number)#Other fields|Wikipedia has an article devoted to the
Is this really necessary in the AntiSpoof extension? I can't imagine any vandal
choosing a numbers-only username, but something like ''User:5559854123 is
the phone number of the mother of Jimmy'' is possible using numbers '''+'''
Virgil Ierubino wrote:
> I would really like to see a function like this developed in MediaWiki,
> either as an extension or in a new version. Put simply, one should be able
> to perform searches, limiting the field of search to articles in a given
> category. The reasons why this would be useful are probably obvious. If not
> - the simplest one is just that categories split articles into topics - and
> this would allow a user to search in a given topic.
> I see two ways of enacting this, and I think both should be looked into.
> Firstly, an "advanced search" option, which I think is not as favourable.
> Secondly, and this is what I think would really be good, a search box
> automatically appears on a category page.
> Alternatively, when one is viewing a category page, the normal search box
> (on the left) acquires a check box - "search in this category".
> Additionally, a further checkbox could be interesting: "also search
> I can't imagine this would be too difficult to implement, but would
> certainly be very useful and would make more use of the categorisation
> system in MediaWiki.
Category search could be implemented fairly easily using Lucene fields. But
I think the results would be counterintuitive unless subcategories were
included by default, and that's rather more difficult to implement. A
top-level category in Wikipedia does not contain a collection of articles on
that topic, it contains a list of subcategories and a few miscellaneous
To implement subcategory search, MW could recursively load parent categories
on page save, and put them in a "parent category" lucene field. Circular
reference detection and resource limits would have to be in place.
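As a rough illustration of the traversal only (not of the Lucene side), here is a minimal Python sketch. It assumes a hypothetical mapping from each category to its direct parents; the function name and the `limit` parameter are invented for this example. A visited set provides the circular reference detection and a cap on the number of collected categories stands in for the resource limit.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical category graph: category -> its direct parent categories.
ParentMap = Dict[str, List[str]]


def ancestor_categories(direct: List[str], parents: ParentMap,
                        limit: int = 1000) -> Set[str]:
    """Collect a page's direct categories plus all their ancestors."""
    seen: Set[str] = set()
    queue = deque(direct)
    # Stop when the graph is exhausted or the resource limit is reached.
    while queue and len(seen) < limit:
        cat = queue.popleft()
        if cat in seen:
            continue  # already visited: a cycle or a shared parent
        seen.add(cat)
        queue.extend(parents.get(cat, []))
    return seen


parents = {
    "Writers from London": ["Writers", "People from London"],
    "Writers": ["People"],
    "People from London": ["People"],
    "People": ["Writers"],  # deliberate cycle for the demonstration
}
print(sorted(ancestor_categories(["Writers from London"], parents)))
```

The returned set is what would be written into the "parent category" field on page save; how that field is stored and queried is outside this sketch.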
-- Tim Starling
Cross-posted to wikipedia-l and wikitech-l, I suggest you reply to wikitech-l.
I've just committed a change to make special page names case-insensitive and
localisable. The default name for a special page can be changed, but a
redirect from the English name will always be kept. At present, there are no
local sets of names committed, although one has been proposed for German. I
have created a wiki page for discussion and coordination of this task:
The following is for developers, everyone else can stop reading here.
There are a few practice changes associated with this change, for both core
and extension developers.
Old:
    Title::makeTitle( NS_SPECIAL, 'Contributions' );
New:
    SpecialPage::getTitleFor( 'Contributions' );
Old:
    Title::makeTitleSafe( NS_SPECIAL, "Blockip/$ip" );
New:
    SpecialPage::getSafeTitleFor( 'Blockip', $ip );
Old:
    ($title->getNamespace() == NS_SPECIAL &&
     $title->getDBkey() == 'Userlogout')
Only the last of these three changes is compulsory for extensions;
recognition of core special page names must be migrated. Old titles will
continue to work, so the first two changes are optional for extensions and
can be done at your leisure. All three changes should be considered
compulsory for core code.
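For readers who want a picture of the behaviour described at the top of this message, here is a small Python sketch of case-insensitive, localisable name resolution in which the English name always keeps working. It is not the MediaWiki implementation: the function names are invented, and the German names in the example are hypothetical placeholders, not a committed translation.

```python
from typing import Dict, Optional


def build_alias_table(local_names: Dict[str, str]) -> Dict[str, str]:
    """Map case-folded names (English and local) to the canonical English name."""
    table: Dict[str, str] = {}
    for canonical, local in local_names.items():
        table[canonical.casefold()] = canonical  # English name always resolves
        table[local.casefold()] = canonical      # local name resolves too
    return table


def resolve_special_page(name: str, table: Dict[str, str]) -> Optional[str]:
    """Return the canonical page name, or None if the name is unknown."""
    return table.get(name.casefold())


# Hypothetical local names, for illustration only.
table = build_alias_table({
    "Contributions": "Beiträge",
    "Userlogout": "Abmelden",
})
print(resolve_special_page("CONTRIBUTIONS", table))
print(resolve_special_page("abmelden", table))
print(resolve_special_page("Nonexistent", table))
```

Code that compares a title against a hard-coded English DB key bypasses this kind of lookup, which is why such comparisons have to be migrated.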
Extension special pages can provide local special page names using the
Hard-coded special page names in messages should be changed simultaneously
when the special page names themselves are changed. Special page names are
language-dependent, not site-dependent, so there should be no need for a
-- Tim Starling
I've written an extension that includes a parser function, but I can't
figure out how to load it into a regression test.
It appears that maintenance/parserTests.php doesn't call setFunctionHook
anywhere, and I haven't seen any examples of this.
Is regression testing for custom parser functions supported?
Hi, I'm having a problem trying to download the images archive on Debian with the latest updates installed.
When I try wget it fails completely, even when I set a timeout value of 900 seconds. I only get about 4K and then the attempted download stops.
When I try lftp (pget URL) or curl they both will download between 300MB and 600MB and then the OS freezes with no indication of why in the syslog.
Anyone else had this problem or know a way around it?
One more question, can the images be obtained on CD?
I have downloaded the latest frwiki dump, in particular:
I have checked the md5.
I have tried to load it with the following command:
/home/kelson/tools/jre1.5.0_06/bin/java -server -classpath
--format=sql:1.5 frwiki-latest-pages-meta-history.xml.bz2 >& titi
And this is what I got on stderr:
8 pages (1,26/sec), 1000 revs (157,456/sec)
10 pages (0,712/sec), 2000 revs (142,339/sec)
16 pages (0,712/sec), 3000 revs (133,583/sec)
20 pages (0,631/sec), 4000 revs (126,267/sec)
28 pages (0,759/sec), 5000 revs (135,45/sec)
32 pages (0,744/sec), 6000 revs (139,519/sec)
Exception in thread "main" java.io.IOException: java.sql.SQLException:
Not a valid escape sequence:
Any idea what is going wrong?
We've been using mysqldump to do daily full database backups in case
the hardware on our DB server fails. This causes some problems because,
for a short period of 4 minutes or so, the site is inaccessible
while mysqldump has the DB locked.
I'm not too familiar with the maintenance/dumpPages.xml script, but
this script doesn't back up the whole DB, including user accounts,
recent changes, links, etc., does it? And if it does, it probably
doesn't avoid the problem of having to lock the DB for a few minutes,
Is there any reason why Squid is reporting this error to anonymous
users for pages that should be cached? Squid does seem to be caching
If mysqldump is still the answer (I'm using the --quick option), are
there any other ways we can avoid this brief downtime to capture a
backup? How does Wikipedia do this?
Thanks a lot,