Wikitech-l November 2006

wikitech-l@lists.wikimedia.org

87 participants
141 discussions

Re: [Wikitech-l] "Search in Category" function for MediaWiki
by Aerik Sylvan 03 Nov '06


Christophe Henner wrote:
> Hi
> Still about Category search, I looked for something about it but
> didn't find it. What about something that makes it possible to get all the
> articles matching x categories?
> For example, giving the list of all the articles both in
> [[Category:Writer]] and [[Category:Born in London]].
> Have a nice day
> --
> schiste

That's the much discussed and desired "Category Intersections", and it is a trickier problem (at scale) than I thought. I've been testing some ideas on and off, but have been slowed down by having a hard time clearing the query cache on the server I'm using (I've tried "FLUSH QUERY CACHE" and "RESET QUERY CACHE" but they don't seem to actually do it - all you MySQL gurus out there, what am I missing?).

I've got two ideas I want to test:

1) Use the existing table and the query I've previously suggested, but construct it smarter by considering the number of pages in each category - in other words, purposefully narrow down the result set as early as possible (like look at "People born in 1912" and then see how many of those are "Living People" instead of the other way around).

2) Try building a table with a fulltext index using a record for each page, and a column for the categories, delimited by spaces (use underscores for spaces in a category name). This may be a bit hackish, but I'm thinking this will get MySQL to do the tricky part of building the index on categories (each being a word in that column) for me. The MySQL people must've made the fulltext index code as efficient as possible, so it will be interesting to see how it performs. I know full text indexing is not acceptable for whole Wikipedia articles, but if we're only considering categories, we're talking about a lot less text. I've been wondering if maybe this is how Flickr handles tags - whatever they're doing, the functionality seems to match what we want to do, and at a large scale, too.

If neither of these works, then I think we're off into either Lucene or some other search function with a custom index/data structure. But I have the strong impression that those are pretty inherently not updated in real-time, which is a bummer.

Best Regards,
Aerik
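
[Editor's sketch] Idea 1 above (start from the smallest category and narrow down) could look roughly like the following. This is only an illustration, not the query the author had in mind: it assumes a `categorylinks` table with `cl_from` (page id) and `cl_to` (category name) columns, uses an in-memory SQLite database instead of MySQL, and the function name `intersect_categories` is hypothetical.

```python
import sqlite3


def intersect_categories(conn, categories):
    """Return the sorted page ids that belong to every category given."""
    if not categories:
        return []

    # Count the members of each category so the smallest one drives the query.
    sizes = {
        cat: conn.execute(
            "SELECT COUNT(*) FROM categorylinks WHERE cl_to = ?", (cat,)
        ).fetchone()[0]
        for cat in categories
    }
    ordered = sorted(categories, key=lambda cat: sizes[cat])

    # Seed the candidate set from the smallest category.
    rows = conn.execute(
        "SELECT cl_from FROM categorylinks WHERE cl_to = ?", (ordered[0],)
    ).fetchall()
    candidates = {row[0] for row in rows}

    # Filter the (already small) candidate set by each larger category.
    for cat in ordered[1:]:
        if not candidates:
            break
        placeholders = ",".join("?" * len(candidates))
        rows = conn.execute(
            "SELECT cl_from FROM categorylinks "
            f"WHERE cl_to = ? AND cl_from IN ({placeholders})",
            (cat, *sorted(candidates)),
        ).fetchall()
        candidates = {row[0] for row in rows}
    return sorted(candidates)


# Tiny made-up data set: three living people, two people born in 1912.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?)",
    [
        (1, "Living_people"), (2, "Living_people"), (3, "Living_people"),
        (2, "People_born_in_1912"), (4, "People_born_in_1912"),
    ],
)
print(intersect_categories(conn, ["Living_people", "People_born_in_1912"]))
```

With this sample data the candidate set is seeded from "People_born_in_1912" (two pages) rather than "Living_people" (three pages). On a real server the `IN (...)` list would have to be bounded or replaced by a join, so treat this only as a way to reason about the ordering.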


slow diffs
by Travis Derouin 03 Nov '06


Hi,

We've had some complaints off and on about diffs taking a long time. This happens at random times, doesn't happen often, and clears itself up after a while. When this happens, the load on the Apache servers seems low and the database seems OK. We did seem to have this problem a few months ago if users had the Google toolbar installed, but it appears that the problem is back and unrelated to that.

I wonder if the job queue has anything to do with it; sometimes it gets up to over 10,000 on us, but usually it is at 0. I haven't been able to check what it's at when users experience this problem.

Does anyone have any ideas on how to debug this, or what the possible causes might be? We're still running 1.6.7.

Thanks,
Travis


AntiSpoof problems
by Luiz Augusto 03 Nov '06


Some days ago I tried to create a user account for myself on wikimania2007.wikimedia.org, but I received the message ''Login error: The name "555" is not allowed to prevent confusing or spoofed usernames: Does not contain any letters. Please choose another name.''

I have been an active editor since July 2004 and have used User:555 on lots of wikis since 23 May 2005. [[:en:555 (number)#Other fields|Wikipedia has an article devoted to the number 555]].

Is this really necessary in the AntiSpoof extension? I can't imagine any vandal choosing a numbers-only username, but something like ''User:5559854123 is the phone number of the mother of Jimmy'' is possible using numbers '''+''' letters.


Re: [Wikitech-l] "Search in Category" function for MediaWiki
by Tim Starling 03 Nov '06


Virgil Ierubino wrote:
> I would really like to see a function like this developed into MediaWiki,
> either as an extension or into a new version. Put simply, one should be able
> to perform searches, limiting the field of search to articles in a given
> category. The reasons why this would be useful are probably obvious. If not
> - the simplest one is just that categories split articles into topics - and
> this would allow a user to search in a given topic.
>
> I see two ways of enacting this, and I think both should be looked into.
> Firstly, an "advanced search" option, which I think is not as favourable.
> Secondly, and this is what I think would really be good, a search box
> automatically appears on a category page.
>
> Alternatively, when one is viewing a category page, the normal search box
> (on the left) acquires a check box - "search in this category".
>
> Additionally, a further checkbox could be interesting: "also search
> subcategories".
>
> I can't imagine this would be too difficult to implement, but would
> certainly be very useful and would make more use of the categorisation
> system in MediaWiki.

Category search could be implemented fairly easily using Lucene fields. But I think the results would be counterintuitive unless subcategories were included by default, and that's rather more difficult to implement. A top-level category in Wikipedia does not contain a collection of articles on that topic; it contains a list of subcategories and a few miscellaneous articles.

To implement subcategory search, MW could recursively load parent categories on page save, and put them in a "parent category" Lucene field. Circular reference detection and resource limits would have to be in place.

-- Tim Starling
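
[Editor's sketch] The recursive parent-category walk described above, with circular reference detection and resource limits, might be sketched as follows. This is an illustration in Python, not MediaWiki code: the `direct_parents` mapping stands in for a database lookup, and the function name and the `max_depth` / `max_total` limits are hypothetical.

```python
def ancestor_categories(direct_parents, start, max_depth=10, max_total=500):
    """Collect the ancestors of the categories in `start`, breadth first.

    `direct_parents` maps a category name to the categories it is filed under.
    """
    seen = set(start)   # circular reference detection
    ancestors = []      # what would go into the "parent category" field
    frontier = list(start)

    for _ in range(max_depth):              # resource limit: depth
        next_frontier = []
        for cat in frontier:
            for parent in direct_parents.get(cat, ()):
                if parent in seen:
                    continue                # already visited, or a cycle
                if len(ancestors) >= max_total:
                    return ancestors        # resource limit: total size
                seen.add(parent)
                ancestors.append(parent)
                next_frontier.append(parent)
        if not next_frontier:
            break
        frontier = next_frontier
    return ancestors


# Made-up category graph containing a deliberate cycle.
parents = {
    "English_novelists": ["Novelists", "English_writers"],
    "Novelists": ["Writers"],
    "English_writers": ["Writers"],
    "Writers": ["People", "English_novelists"],
}
print(ancestor_categories(parents, ["English_novelists"]))
```

The `seen` set is what stops the walk when "Writers" points back at "English_novelists"; the two limits cap the cost of a single page save even on a very deep or very wide category tree.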


Special page names case-insensitive and localisable
by Tim Starling 02 Nov '06


Cross-posted to wikipedia-l and wikitech-l; I suggest you reply to wikitech-l.

I've just committed a change to make special page names case-insensitive and localisable. The default name for a special page can be changed, but a redirect from the English name will always be kept. At present, there are no local sets of names committed, although one has been proposed for German. I have created a wiki page for discussion and coordination of this task:

http://www.mediawiki.org/wiki/Special_page_names

The following is for developers; everyone else can stop reading here.

There are a few practice changes associated with this change, for both core and extension developers.

Instead of
    Title::makeTitle( NS_SPECIAL, 'Contributions' );
use
    SpecialPage::getTitleFor( 'Contributions' );

Instead of
    Title::makeTitleSafe( NS_SPECIAL, "Blockip/$ip" );
use
    SpecialPage::getSafeTitleFor( 'Blockip', $ip );

Instead of
    ($title->getNamespace() == NS_SPECIAL && $title->getDBkey() == 'Userlogout')
use
    $title->isSpecial('Userlogout')

Only the last of these three changes is compulsory for extensions; recognition of core special page names must be migrated. Old titles will continue to work, so the first two changes are optional for extensions and can be done at your leisure. All three changes should be considered compulsory for core code.

Extension special pages can provide local special page names using the LanguageGetSpecialPageAliases hook.

Hard-coded special page names in messages should be changed simultaneously when the special page names themselves are changed. Special page names are language-dependent, not site-dependent, so there should be no need for a {{SPECIALNAME:xxx}} function.

-- Tim Starling
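
[Editor's sketch] To make the behaviour described above concrete (local names resolve case-insensitively, and the English name always keeps working), here is a small illustration in Python. It is not how MediaWiki implements the feature: the function names are hypothetical, and the German alias is only an example, since the message says no local names had been committed yet.

```python
def build_alias_index(canonical_names, local_aliases):
    """Map every accepted spelling (case-folded) to a canonical page name."""
    index = {}
    for name in canonical_names:
        # The English (canonical) name always resolves.
        index[name.casefold()] = name
    for name, aliases in local_aliases.items():
        for alias in aliases:
            index[alias.casefold()] = name
    return index


def resolve_special_page(index, requested):
    """Return the canonical special page name, or None if unknown."""
    return index.get(requested.replace(" ", "_").casefold())


index = build_alias_index(
    ["Contributions", "Userlogout", "Blockip"],
    {"Contributions": ["Beiträge"]},  # illustrative alias, an assumption
)
print(resolve_special_page(index, "contributions"))
print(resolve_special_page(index, "BEITRÄGE"))
```

Both lookups resolve to "Contributions", and an unknown name yields None. The point of comparing against a canonical name, as `$title->isSpecial('Userlogout')` does, is that callers never need to know which alias the user actually typed.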


regression testing parser functions?
by Steve Sanbeg 02 Nov '06


I've written an extension that includes a parser function, but I can't figure out how to load it into a regression test. It appears that maintenance/parserTests.php doesn't call setFunctionHook anywhere, and I haven't seen any examples of this. Is regression testing for custom parser functions supported? Thanks -Steve


Trouble downloading image archive on Debian
by Mike O 02 Nov '06


Hi,

I'm having a problem trying to download the images archive on Debian with the latest updates installed. When I try wget it fails completely, even when I set a timeout value of 900 seconds: I only get about 4K and then the attempted download stops. When I try lftp (pget URL) or curl, they both download between 300MB and 600MB and then the OS freezes with no indication of why in the syslog.

Has anyone else had this problem, or does anyone know a way around it?

One more question: can the images be obtained on CD?


Wikimania 2007 Hacking Days call-for-help
by Tian-Jian "Barabbas" Jiang＠Gmail 02 Nov '06


Dear Ladies and Gentlemen,

Good day. I am Mike Tian-Jian Jiang ( http://www.linkedin.com/in/barabbas ), a team member of Wikimania 2007. I'm currently trying to arrange Hacking Days; here is a rough plan that needs your valuable suggestions. We would like to invite experts like you to give talks and to encourage hackers to get involved in MediaWiki/Wikimedia development. Please let me know if you have any advice on these outlines. For academic professionals, we will also hold a Conference for oral and poster paper presentations. Please also forward this mail to anyone who may be interested.

Thank you very much!

Sincerely,
Mike

Hacking Days Agenda plan:

* Wikimania 2006:
  o http://wikimania2006.wikimedia.org/wiki/Hacking_Days (Schedule MindMap is missing......)
  o http://wikimania2006.wikimedia.org/wiki/Hacking_Days_Extras
* Wikimania 2005
  o http://meta.wikimedia.org/wiki/Wikimania_2005_hacking_days or
  o http://meta.wikimedia.org/wiki/Wikimania_2005:Hacking_Days
* MediaWiki API Introduction: as web 2.0 trend...
  o Query API
  o http://meta.wikimedia.org/wiki/API
  o Python Bot Framework
* MediaWiki API application contest: a contest like Google's and Yahoo!'s, to attract hackers.
  o Wikipedia Gadget
  o Wikipedia Yahoo! Widget
* MediaWiki enhancement
  o Wikiwyg
    + Ingy's AJAX version
    + Flash version
  o Collaboration editing/versioning
  o Site searching (Is there any plan to make Lucene available for other languages besides English?)
  o Improving Simplified-Traditional Chinese conversion
  o Audio/Video processing/streaming
  o Community: from "Talk" page to forum/bbs with a more flexible reputation system.
  o Spam/Captcha
* MediaWiki system administration
  o Large scale data processing; please refer to
    + http://radar.oreilly.com/tag/database
    + http://labs.google.com/papers/bigtable.html
  o Load balancing
  o "The great wall" problem
* Wikipedia content applications
  o (Crosslingual) search: especially for translation/transliteration of named entities from Wikimedia contents.
  o (Crosslingual) question-answering: CLEF 2006 already has a pilot task WiQA.


Problem with mwdumper or the last frwiki dump
by Emmanuel Engelhart 01 Nov '06


Hi,

I have downloaded the latest frwiki dump, in particular:

http://download.wikimedia.org/frwiki/latest/frwiki-latest-pages-meta-histor…

I have checked the md5.

I have tried to import it with the following command:

    /home/kelson/tools/jre1.5.0_06/bin/java -server -classpath /home/kelson/tools/mysql-connector-java-5.0.4/mysql-connector-java-5.0.4-bin.jar:/home/kelson/tools/mwdumper.jar org.mediawiki.dumper.Dumper --output=mysql://localhost/frwiki?user=*******\&password=******** --format=sql:1.5 frwiki-latest-pages-meta-history.xml.bz2 >& titi

And this is what I got on stderr:

    8 pages (1,26/sec), 1000 revs (157,456/sec)
    10 pages (0,712/sec), 2000 revs (142,339/sec)
    16 pages (0,712/sec), 3000 revs (133,583/sec)
    20 pages (0,631/sec), 4000 revs (126,267/sec)
    28 pages (0,759/sec), 5000 revs (135,45/sec)
    32 pages (0,744/sec), 6000 revs (139,519/sec)
    Exception in thread "main" java.io.IOException: java.sql.SQLException: Not a valid escape sequence: {[Mm]sg:/{{/',530,'Orthogaffe','20040603204946',...........

Any idea what is going wrong?

Kelson


Doing backups and avoiding downtime
by Travis Derouin 01 Nov '06


Hi,

We've been using mysqldump to do daily full database backups in case the hardware on our DB server fails. This causes some problems because for a short period of 4 minutes or so, the site is inaccessible because mysqldump has the DB locked.

I'm not too familiar with the maintenance/dumpPages.xml script, but this script doesn't back up the whole DB, including user accounts, recent changes, links, etc., does it? And if it does, it probably doesn't avoid the problem of having to lock the DB for a few minutes, right?

Is there any reason why Squid is reporting this error to anonymous users for pages that should be cached? Squid does seem to be caching pages properly.

If mysqldump is still the answer (I'm using the --quick option), are there any other ways we can avoid this brief downtime while capturing a backup? How does Wikipedia do this?

Thanks a lot,
Travis

