Hi,
Can anyone point me to the in-links and out-links for a page in the
wiki database (downloaded from the wiki dumps)?
Thank you in advance!
Hi,
Just wanted to share some bits of what we've been doing this week -
hopping around and analyzing our performance and application workflow
from multiple sides (a kind of "Hello 2008!!!" systems performance
review).
It all started with the application object cache - the caching arena
was bumped up from 55GB to 160GB - and here more work had to be done
to make our parser output cacheable. Any use of magic words (and most
templates do use them) would decrease cache TTLs to 1 hour, so the
vast increase in caching space didn't help much. Once this was fixed,
though, pages are reparsed just once every few days. Additionally, we
moved the revision text caching for external storages to a global
pool, instead of maintaining local caches on each of those nodes.
That allows us to reuse the memory on old external store boxes for
caching more actively fetched revisions, instead of the archived ones.
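The magic-word effect is roughly this (a hypothetical sketch, not the
actual parser cache internals):

function chooseParserCacheTTL( $usesMagicWords ) {
    $staticTTL = 7 * 86400; // static output can be kept for days
    $volatileTTL = 3600;    // time-dependent output expires in an hour
    return $usesMagicWords ? $volatileTTL : $staticTTL;
}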
Another major review was done on extension loading - there, by
delaying or eliminating expensive initializations, especially for
(relatively :) very-rarely-used extensions, we shaved at least 20ms
off base site loading time (and the service request average). That
also resulted in a huge CPU use reduction. Special thanks go to the
folks on #mediawiki (Aaron, Nikerabbit, siebrand, Simetrical, and
others) who joined this effort of analysis, education and
engineering :) There are still more difficult extensions to handle,
but I hope they will evolve to be more adaptive performance-wise.
This was a long-standing regression caused by the increasing quality
of translations, which resulted in a bigger data set to handle on
every page load.
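The deferred-initialization idea, as a minimal sketch (hypothetical
hook and function names, not any actual extension's code):

$wgHooks['SomeRarelyUsedHook'][] = 'efMyExtensionOnHook';

function efMyExtensionExpensiveInit() {
    // one-time setup: loading message files, building lookup tables, etc.
}

function efMyExtensionOnHook() {
    static $initialized = false;
    if ( !$initialized ) {
        efMyExtensionExpensiveInit(); // paid only when actually used
        $initialized = true;
    }
    // ... actual hook work ...
    return true;
}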
A small but noticeable bit was the simplification of the
mediawiki:pagecategories message on en.wikipedia.org. Logic as simple
as "show 'Category:' if there is just one category, and 'Categories:'
otherwise" requires the parser to be used, which adds lots and lots
of overhead for every page served. Those few milliseconds needed for
that absolutely grammatically correct label could be counted in
thousands of dollars. :)
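The cheap alternative looks roughly like this (a sketch with
hypothetical message keys): a plain count check and two static
messages, instead of running {{PLURAL:}} through the parser on every
page view.

function categoriesLabel( array $categories ) {
    return count( $categories ) == 1
        ? wfMsgHtml( 'pagecategory-singular' )  // hypothetical key
        : wfMsgHtml( 'pagecategory-plural' );   // hypothetical key
}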
There were a few other victims in this unequal fight. TitleBlacklist
didn't survive the performance audit - the current architecture of
this feature does work in places it never should, and as the initial
performance guidelines for it were not followed, it got disabled for
a while. Also, some CentralNotice functionality was not optimized for
the way it was used after the fundraiser, so for now that feature is
disabled too. Of course, these features will be re-enabled - they
just need more work before they can run live.
On another front - in the core software - the database connection
flow was reviewed, and a few adjustments were made which reduce
master server load quite a bit and cut down the communication done
with all database servers (transaction coordination was too verbose
before - now it is far more relaxed).
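The idea, sketched with hypothetical method names (not the actual
change): only round-trip a COMMIT to connections that have an open
transaction.

function commitAll( array $connections ) {
    foreach ( $connections as $db ) {
        if ( $db->trxLevel() ) { // skip idle connections entirely
            $db->commit();
        }
    }
}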
Here again, some of the application flow is still irrational - and
may get quite a bit of refactoring/fixing in the future. Tim pointed
out that my knowledge of the xdebug profiler is seriously outdated
(my mind was stuck at 2.0.1 features, whereas 2.0.2 introduced quite
significant changes that make life easier) ;-) Another shocking
revelation was that the CPU microbenchmarks provided by the MediaWiki
internal profiler were not accurate at all - the getrusage() call we
use provides information rounded to 10ms, and most functions execute
far faster than that. It was really amusing that I trusted numbers
which looked rational and reasonable only because of the huge
profiling scale and eventual statistical magic. This complicates
profiling in general a bit, as there's no easy way to determine
whether a given wait happened because of I/O blocking or because of
context switches.
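A quick demonstration of the granularity problem:

$before = getrusage();
for ( $i = 0; $i < 10000; $i++ ) {
    md5( 'some short string' ); // far faster than 10ms per call
}
$after = getrusage();
$userMicros = ( $after['ru_utime.tv_sec'] - $before['ru_utime.tv_sec'] ) * 1000000
            + ( $after['ru_utime.tv_usec'] - $before['ru_utime.tv_usec'] );
echo "user CPU: {$userMicros}us\n"; // moves in 10000us (10ms) steps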
A few images from the performance analysis work:
http://flake.defau.lt/mwpageview.png
http://flake.defau.lt/mediawikiprofile.png (somewhere here you should
see why TitleBlacklist died)
This one made me giggle:
http://flake.defau.lt/mwmodernart.png
Tim was questioning whether people are using wikitext for scientific
calculations, or whether that was just another case of the crazy
over-templating we are used to seeing.
Templates such as Commons' 'picture of the day' cause output like
this =) Actually, the new parser code makes far nicer graphs (at
least from a performance engineering perspective).
And one of the biggest changes happened on our Squid caching layer -
because of how different browsers request data, we generally had
different cache sets for IE, Firefox, Opera, Googlebot, KHTML, etc.
Now we normalize the 'Accept-Encoding' header specified by browsers,
which makes most connections fall into a single class.
In theory this may at least double our caching efficiency. In
practice, we will see - the change has been live on just one cluster
for just a few hours.
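The normalization amounts to something like this (a hypothetical PHP
sketch; the real change lives in the Squid layer):

function normalizeAcceptEncoding( $header ) {
    // everything that can take gzip shares one cache class
    if ( preg_match( '/\bgzip\b/', $header ) ) {
        return 'gzip';
    }
    return ''; // uncompressed class for everything else
}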
As a side effect, we turned off the 'refresh' button on your browsers.
Sorry - please let us know if anything is seriously wrong with that
(if you feel offended about your constitutional refreshing rights -
use purge instead :)
Additionally, I've heard there has been quite a bit of development on
the new parser, as well as networking in Amsterdam ;-)
Quite a few people also noticed the huge flamewar of 'oh noes, dev
enabled a feature despite our lack of consensus'. Now we're sending
people to the board for all the minor changes they ask for :-)
Oh, and Mark changed the scale on our 'backend service time' graph,
which is used to measure our health and performance - the upper
limit is now at 0.3s (which used to be our minimum a few years ago)
instead of the old 1s:
http://www.nedworks.org/~mark/reqstats/svctimestats-weekly.png
So, that's the fun we've seen this week in site operations :)
Cheers,
Domas
P.S. I'll spend next week in Disneyworld instead ;-)
(I mentioned this on IRC just now, but other than a "me, too",
there was no response, so I'm posting here for posterity.)
Periodically today I've gotten utterly blank pages -- perhaps
1 or 2% of the time. I wonder if there's a squid or two that's
dead or acting up?
Gentlemen, it is you who are ruining network standards.
HEAD http://en.wikipedia.org/wiki/Some_Non_Existent_Page --> 200 OK
It is clearly a case of 404 Not Found.
You can still give the same "You can create this article" message AND
return a truthful HTTP code.
Else how is one to use a link checker on your links? Why are MediaWiki
wikis special?
Yes, 200 OK for action=edit and for disambiguation pages, but not for
the basic, clear case in the spirit of 404 Not Found.
What if all ==External links== always returned 200? How could a bot
detect linkrot? Do unto others...
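Concretely, something like this would do it (a sketch; hypothetical
surrounding code):

if ( !$title->exists() ) {
    header( 'HTTP/1.1 404 Not Found' );
    // ...then render the usual "you can create this article" body;
    // link checkers see 404, humans see the same page as before.
}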
On Jan 24, 2008 10:35 AM, <huji(a)svn.wikimedia.org> wrote:
> + // Entry from drop down menu + additional comment
> + $reason .= ': ' . $this->DeleteReason;
Should the ': ' string be localized (below as well as here)? Also, as
far as I can tell this will result in summaries like:
Deleted old revision $1: Boilerplate reason: Custom reason
Having two colons is a bit odd.
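One option, sketched here (this assumes a localizable separator
message such as 'colon-separator' exists or gets added):

$reason .= wfMsgForContent( 'colon-separator' ) . $this->DeleteReason;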
> + $mDeletereasonother = Xml::label( wfMsg( 'filedelete-otherreason' ), 'wpReason' );
> + $mDeletereasonotherlist = wfMsgHtml( 'filedelete-reason-otherlist' );
> + $scDeleteReasonList = wfMsgForContent( 'filedelete-reason-dropdown' );
> + $mDeleteReasonList = '';
> + $delcom = Xml::label( wfMsg( 'filedelete-comment' ), 'wpDeleteReasonList' );
It seems incredibly confusing to use local variables whose names begin
with $m. An initial lowercase 'm' prefix is used to indicate member
variables. Either these should be made member variables, or the 'm'
should be dropped. You also have variables that are named identically
except for the 'm' ($deleteReasonList vs. $mDeleteReasonList), which
is even more confusing.
filedelete-comment and filedelete-otherreason seem to allow arbitrary HTML.
> + $value = trim( htmlspecialchars($option) );
Consider not escaping ampersands here, so that entities can be used --
really we only want to ban tags.
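Sketch of what I mean - escape only the angle brackets so entities
like &eacute; survive:

$value = trim( str_replace( array( '<', '>' ),
                            array( '&lt;', '&gt;' ),
                            $option ) );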
> + } elseif ( substr( $value, 0, 1) == '*' && substr( $value, 1, 1) != '*' ) {
> + // A new group is starting ...
> + $value = trim( substr( $value, 1 ) );
> + $deleteReasonList .= "$optgroup<optgroup label=\"$value\">";
> + $optgroup = "</optgroup>";
> + } elseif ( substr( $value, 0, 2) == '**' ) {
It would probably be simpler to read if you reversed these two
elseifs. Then you could drop the second part of the (current) first
one's condition, and just check substr( $value, 0, 1 ) == '*'.
Also, maybe a clearer name for $optgroup? Like $close or
$closeoptgroup or something? The current one is okay, I guess.
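The reversed order, as a runnable sketch with hypothetical
surrounding code:

function classifyLine( $value ) {
    if ( substr( $value, 0, 2 ) == '**' ) {
        return 'option'; // an entry inside the current group
    } elseif ( substr( $value, 0, 1 ) == '*' ) {
        return 'group';  // a new group starts; no second condition needed
    }
    return 'other';
}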
> + if ( $mDeleteReasonList === $value)
> + $selected = ' selected="selected"';
Indentation is wrong here. The second line should be indented by one more tab.
A. Rinkleff is the user previously known as User:Lir, a legendary
Wikipedia troll. I've placed him on moderation pre-emptively, because
he's the sort of person who really warrants it. THE TROLL FROM HELL.
- d.
Hey all you Wikimedians and MediaWikians --
We're now hiring for a software development position at our new San
Francisco office. This is an entry-level position, but existing
experience with MediaWiki or other LAMP-style development will be a big
help.
Since we're still a small office, the new guy will also need to help
people in the office with basic IT issues; we have a mixed environment
with Mac, Linux, and Windows machines, and varying degrees of
tech-savviness.
We're planning to open up a couple more dev positions over the coming
months as budget allows; those will not necessarily be locked to the
California office, but for now we need someone who can be on-site every
day to lend a hand.
Fuller job description at:
http://wikimediafoundation.org/wiki/Job_openings/Software_Developer_/_IT_Su…
Drop me a mail (offlist!) if you're interested, or pass it on if you
know someone who might be; we'll be scheduling interviews in the next
few weeks.
(And yes, we will be posting the position on Craigslist as well.)
-- brion vibber (brion @ wikimedia.org)
I was just reading this:
http://www.riehle.org/wp-content/uploads/2008/01/a5-junghans.pdf
And wondering if there is any desire (let alone plans) to move to a
system of storing a different internal representation (eg, XML) and
separating the display logic out. One obvious benefit would be making
it easier to produce different outputs without having to write
multiple parsers. Are there others? Would Wikipedia benefit from
supporting an interchange format?
Just fishing.
Steve
minuteelectron(a)svn.wikimedia.org wrote:
> Revision: 30162
> Author: minuteelectron
> Date: 2008-01-26 00:37:42 +0000 (Sat, 26 Jan 2008)
>
> Log Message:
> -----------
> Fix bug 9246 by watching a page when the upload/reupload "Watch this
> page" checkbox is checked and unwatching a page when it is not.
A problem with this is that it *un*watches a previously watched image
under the following circumstances:
* 'watch pages I edit' is not enabled (eg, default state)
* go to Special:Upload and select the file
* hit 'upload'
The initial check state is unchecked (since there was no initial
destination name set), and this doesn't get updated to reflect the
existing watch state of the previous image.
There are a couple of possible ways around this. One is to compare the
form's actual initial check state with the submitted check state and
only apply an unwatch if there was a difference.
Another might be to do a watch state update via AJAX when a new
destination filename is set in the form. This would allow the
checkmark's default state to be set 'properly' for those with JS enabled
in modern browsers.
Perhaps a combination should be used.
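The first approach might look like this (a sketch with hypothetical
field names): carry the form's initial check state in a hidden field
and only touch the watch state when the user actually toggled the box.

$initial   = $wgRequest->getCheck( 'wpWatchthisInitial' ); // hidden field
$submitted = $wgRequest->getCheck( 'wpWatchthis' );
if ( $submitted !== $initial ) {
    if ( $submitted ) {
        $wgUser->addWatch( $title );
    } else {
        $wgUser->removeWatch( $title );
    }
}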
For now I'll revert the change, as I think not unwatching things is less
destructive than unwatching things unexpectedly.
-- brion vibber (brion @ wikimedia.org)
Hello all,
Please take a few seconds to have a look at
http://bugzilla.wikimedia.org/show_bug.cgi?id=12681 and help with your
comments about whether we should apply this change or not. Applying it
will prevent people from spoofing the "new message" alert, but at the
same time, it will make the new message bar appear where it never
appeared before, which may not be desired.
Thanks in advance,
Hojjat (aka Huji)