> Revision: 48735
> Author: midom
> Date: 2009-03-24 10:44:24 +0000 (Tue, 24 Mar 2009)
>
> Log Message:
> -----------
> change limit to reflect one in interface. :)
>
> Modified Paths:
> --------------
> trunk/phase3/includes/specials/SpecialRecentchanges.php
>
> Modified: trunk/phase3/includes/specials/SpecialRecentchanges.php
> ===================================================================
> --- trunk/phase3/includes/specials/SpecialRecentchanges.php 2009-03-24 09:59:13 UTC (rev 48734)
> +++ trunk/phase3/includes/specials/SpecialRecentchanges.php 2009-03-24 10:44:24 UTC (rev 48735)
> @@ -55,7 +55,7 @@
> $this->parseParameters( $parameters, $opts );
> }
> - $opts->validateIntBounds( 'limit', 0, 5000 );
> + $opts->validateIntBounds( 'limit', 0, 500 );
> return $opts;
> }
Was this necessary for performance reasons? A lot of people were using recent changes lists longer than 500 entries; some wikis even offered them as options in the RC interface (see http://hu.wikipedia.org/wiki/MediaWiki:Recentchangestext for example). If it was only changed for aesthetic purposes, please change it back, or make it a site option.
>> On 1/4/09 6:20 AM, yegg at alum.mit.edu wrote:
>> The current enwiki database dump (http://download.wikimedia.org/enwiki/20081008/) has been crawling along since 10/15/2008.
> The current dump system is not sustainable on very large wikis and
> is being replaced. You'll hear about it when we have the new one in
> place. :)
> -- brion
Following up on this thread: http://lists.wikimedia.org/pipermail/wikitech-l/2009-January/040841.html
Brion,
Can you offer any general timeline estimates (weeks, months, 1/2
year)? Are there any alternatives to retrieving the article data
beyond directly crawling
the site? I know this is verboten but we are in dire need of
retrieving this data and don't know of any alternatives. The current
estimate of end of year is
too long for us to wait. Unfortunately, Wikipedia is a favored source for students to plagiarize from, which makes out-of-date content a real issue.
Is there any way to help this process along? We can donate disk
drives, developer time, ...? There is another possibility
that we could offer but I would need to talk with someone at the
wikimedia foundation offline. Is there anyone I could
contact?
Thanks for any information and/or direction you can give.
Christian
Occasionally I visit Ohloh.net to satisfy my stats addiction.
One of the things Ohloh analyses in the source code is license
information[1]. On the Ohloh MediaWiki page[2] an analysis summary is
displayed. It contains the following warnings (number of files added by me
from [1]):
# Mozilla Public License 1.0 may conflict with GPL (253 files)
# PHP License may conflict with GPL (7 files)
# Apache Software License may conflict with GPL (1 file)
# Artistic License may conflict with GPL (7 files)
# Common Development and Distribution License may conflict with GPL (1 file)
# Apache License 2.0 may conflict with GPL (7 files)
I am wondering if any of these warnings really point to a licensing issue. If they do, I think we need to pursue this and get it sorted out. Can anyone shed some light on this?
Cheers! Siebrand
[1] http://www.ohloh.net/p/mediawiki/analyses/latest
[2] http://www.ohloh.net/p/mediawiki
Allow me to forward to the list non-subscriber Suresh's reply:
>>>>> "S" == Suresh Ramasubramanian <suresh(a)hserus.net> writes:
S> I'm not particularly short of disk space or memory, thanks. But as
S> Dan mentions, it does sound like a needless waste - and the volume
S> of dud entries is certainly going to scale far higher up when you
S> try it on, say, wikipedia.org or mediawiki.org
S> srs
I asked my pal about his small wiki
http://www.hserus.net/wiki/index.php/Main_Page .
He has even more of those rows, revolving uselessly on his disks...
>>>>> "S" == Suresh Ramasubramanian <suresh(a)hserus.net> writes:
S> Interesting
mysql> SELECT COUNT(*) FROM archive WHERE ar_namespace = 8 AND ar_user_text ='MediaWiki default';
S> +----------+
S> | COUNT(*) |
S> +----------+
S> | 1796 |
S> +----------+
mysql> SELECT COUNT(*) FROM logging WHERE log_namespace = 8 AND log_comment = 'No longer required';
S> +----------+
S> | COUNT(*) |
S> +----------+
S> | 1638 |
S> +----------+
Gentlemen, if your personal MediaWiki wiki has been around since early 2007, you might want to clean out the thousands of MediaWiki: namespace rows that were left in the database by maintenance/deleteDefaultMessages.php. Wouldn't it make you feel good to clean out thousands of wasted rows, leaving behind, e.g. on a small wiki, perhaps just a few hundred rows that are actually related to us? I don't know why the design decision was made to just leave those MediaWiki: namespace items sitting in the archive and text tables. But OK, we proceed to clean them out by hand. I hope I got this right:
$ mysqlshow --count myDatabase > before.txt
$ mysql myDatabase
SELECT COUNT(*) FROM archive WHERE ar_namespace = 8 AND ar_user_text = 'MediaWiki default';
COUNT(*)
1518
DELETE FROM archive WHERE ar_namespace = 8 AND ar_user_text = 'MediaWiki default';
$ php purgeOldText.php --purge
Purge Old Text
Searching for active text records in revisions table...done.
Searching for active text records in archive table...done.
Searching for inactive text records...done.
1518 inactive items found.
Deleting...done.
$ mysql myDatabase
SELECT COUNT(*) FROM logging WHERE log_comment = 'No longer required' AND log_namespace = 8;
COUNT(*)
1510
SELECT MIN(log_timestamp),MAX(log_timestamp) FROM logging WHERE log_comment = 'No longer required' AND log_namespace = 8;
MIN(log_timestamp) MAX(log_timestamp)
20070226185326 20070226194040
DELETE FROM logging WHERE log_comment = 'No longer required' AND log_namespace = 8;
$ mysqlshow --count myDatabase|diff before.txt -|sed '/|/!d'
< | archive | 15 | 2206 |
> | archive | 15 | 688 |
< | logging | 10 | 2597 |
> | logging | 10 | 1087 |
< | text | 3 | 4466 |
> | text | 3 | 2948 |
Hi,
after reading the following sections:
http://wikitech.wikimedia.org/view/Data_dump_redesign#Follow_up
http://en.wikipedia.org/wiki/Wikipedia_database#Dealing_with_compressed_fil…
http://meta.wikimedia.org/wiki/Data_dumps#bzip2
http://www.mediawiki.org/wiki/Mwdumper#Usage
http://www.mediawiki.org/wiki/Dbzip2#Development_status
and skimming the January, February and March archives of this year (all of
which may be outdated and/or incomplete, and then I'll sound like an
idiot), I'd like to say the following:
** 1. If the export process uses dbzip2 to compress the dump, and dbzip2's MO is to compress input blocks independently and then bit-shift the resulting compressed blocks (= single-block bzip2 streams) back into a single multi-block bzip2 stream, so that the resulting file is bit-identical to what bzip2 would produce, then the export process wastes CPU time. bunzip2 can decompress concatenated bzip2 streams, so in exchange for a small size penalty, the dumper could just concatenate the single-block bzip2 streams, saving a lot of cycles.
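A quick way to see this behaviour on the command line (just a toy sketch with made-up file contents, not the real dumper pipeline):

$ printf 'part1\n' | bzip2 > chunks.bz2    # first single-block stream
$ printf 'part2\n' | bzip2 >> chunks.bz2   # second stream, simply appended
$ bunzip2 -c chunks.bz2                    # stock bunzip2 reads through both streams
part1
part2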
** 2. If dump.bz2 were single-block, many-stream (as opposed to the current many-block, single-stream layout), then people on the importing end could speed up *decompression* with pbzip2.
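The import might then look something like this (a sketch only; it assumes pbzip2 is installed, that dump.bz2 really is multi-stream, and that importDump.php is fed the XML on stdin):

$ pbzip2 -dc -p4 dump.bz2 | php maintenance/importDump.php

pbzip2 only parallelizes the decompression of multi-stream files, which is exactly why the stream layout matters here.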
** 3. Even if dump.bz2 stays single-stream, *or* it becomes multi-stream *but* is available only from a pipe or socket, decompression can still be sped up by way of lbzip2 (which I wrote, and am promoting here). Since it's written in strict adherence to the Single UNIX Specification, Version 2, it's available on Cygwin too, and should work on the Mac.
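For instance, something along these lines (a sketch; the exact flags depend on the lbzip2 version, so please check its README):

$ cat dump.bz2 | lbzip2 -d | php maintenance/importDump.php

lbzip2 distributes the block-header search and the decompression over all cores even though the input arrives on stdin.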
Depending on the circumstances (number of cores, availability of dump.bz2 as a regular file or just a pipe, etc.), different bunzip2 implementations are best.
For example, on my dual-core desktop, even
7za e -tbzip2 -so dump.bz2
performs best in some cases (which -- I guess -- parallelizes the different stages of the decompression).
For my more complete analysis (with explicit points on (my imagination of)
dbzip2), please see
http://lists.debian.org/debian-mentors/2009/02/msg00135.html
** 4. Thanassis Tsiodras' offline reader, available under
http://users.softlab.ece.ntua.gr/~ttsiod/buildWikipediaOffline.html
uses, according to the section "Seeking in the dump file", bzip2recover to split the bzip2 blocks out of the single bzip2 stream. The page states:

    This process is fast (since it involves almost no CPU calculations)

While this may be true relative to other dump-processing operations, bzip2recover is, in fact, not much more than a huge single-threaded bit-shifter, which even makes two passes over the dump. (IIRC, the first pass shifts over the whole dump to find bzip2 block delimiters, then the second pass shifts the blocks found previously into byte-aligned, separate bzip2 streams.)
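For reference, the bzip2recover step that the offline reader relies on looks roughly like this (a sketch; the output names follow bzip2recover's rec#####<name> pattern, block 42 is just an example, and you need enough free disk space for the per-block files):

$ bzip2recover dump.bz2          # two passes over the whole file; writes rec00001dump.bz2, rec00002dump.bz2, ...
$ bunzip2 -c rec00042dump.bz2    # later: decompress a single block (at most 900 kB of text at the default block size) on demand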
Since lbzip2's multiple-workers decompressor distributes the search for
bzip2 block headers over all cores, a list of bzip2 block bit positions
(or the separate files themselves) could be created faster, by hacking a
bit on lbzip2 (as in "print positions, omit decompression").
Or dbzip2 itself could enable efficient seeking in the compressed dump by
saving named bit positions in a separate text file.
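Such an index could be as simple as a tab-separated text file (a purely hypothetical format, shown only to illustrate the idea):

# page title <TAB> bit offset of the bzip2 block containing it
Aardvark	271828182
Abacus	314159265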
-o-
My purpose with this mail is two-fold:
- To promote lbzip2. I honestly believe it can help dump importers. I'm
also promoting, with obviously less bias, pbzip2 and 7za, because in some
decompression situations they beat lbzip2, and I feel their usefulness
isn't emphasized enough in the links above. (If parallel decompression for
importDump.php and/or MWDumper is a widely solved problem, then I'm sorry
for the noise.)
- To ask a question. Can someone please describe the current (and planned)
way of compressing/decompressing the dump? (If I'd had more recent info on
this, perhaps I wouldn't have bothered the list with this post. I'm also
just plain curious.)
Thanks,
lacos
http://phptest11.atw.hu/
http://lacos.web.elte.hu/pub/lbzip2/
Hi there,
I was asked to set up a MediaWiki, but some strange (though, in their case, obvious) requests have been made, and I'm not sure what the best integration approach is. Here they are briefly:
- They have a list of authors and animators and would like to sort them in a category by last name first. All the articles currently created begin with the first name followed by the last name. I realised that there is a magic word called {{DEFAULTSORT:}} which lets you define a page's sort value. The problem is, I'm not sure how to display the inverted name on the category page. http://meta.wikimedia.org/wiki/Help:Category doesn't tell me much beyond the default sort value.
- The wiki is going to be based on a massive bibliographical system. I've searched high and low for good bibliography extensions, but they pretty much do not do what we want. What I've decided to do is have the bibliography entries listed on each article under a Bibliography header. I am writing an extension which finds all pages with such a header and parses all the bibliographical entries, which are listed in MLA format. I'm wondering if there is a much better way of doing this, because otherwise I have to consolidate duplicate entries with a lot of regexes (ack).
That's about it for now, there are other minor things, but before I bother
the mailing list I'm going to do my research and figure them out with the
help files. For now these are the only things I can't seem to solve on my
own.
Regards,
David
We're a mentoring organization for the Google Summer of Code again this
year, and we're dead set on making it our awesomest summer ever!
One key thing though is making sure that students and potential students
have access to a mentor who can answer their questions and just help
steer them into becoming an active member of our development community.
If you're an experienced MediaWiki developer and would like to help out
with selecting and mentoring student projects, please give us a shout!
We'll take you even if you live in the southern hemisphere. ;)
We need folks who'll be available online fairly regularly over the summer and are knowledgeable about MediaWiki -- not necessarily knowing every piece of it, but knowing where to look so you can help the students help themselves.
If you're interested, don't forget to apply soon! Student submissions close next week and we'll need to start selecting then...
http://socghop.appspot.com/org/apply_mentor/google/gsoc2009
-- brion