Over the last week, we have noticed very heavy Apache memory usage on
the main Wikimedia cluster. In some cases, high memory usage resulted
in heavy swapping and site-wide performance issues.
After some analysis, we've identified the main cause of this high
memory usage to be geographical data ("données") templates on the
French Wikipedia, and to a lesser extent, the same data templates
copied to other wikis for use on articles about places in Europe.
Here is an example of a problematic template:
<https://fr.wikipedia.org/w/index.php?title=Mod%C3%A8le:Donn%C3%A9es_PyrF1-2…>
That template alone uses 47MB for 37000 #switch cases, and one article
used about 15 similarly sized templates.
The simplest solution to this problem is for the few Wikipedians
involved to stop doing what they are doing, and to remove the template
invocations which have already been introduced. Antoine Musso has
raised the issue on the French Wikipedia's "Bistro" and some of the
worst cases have already been fixed.
To protect site stability, I've introduced a new preprocessor
complexity limit called the "preprocessor generated node count", which
is incremented by about 6 for each #switch case. When the limit is
exceeded, an exception is thrown, preventing the page from being saved
or viewed.
The limit is currently 4 million (~667,000 #switch cases), and it will
soon be reduced to 1.5 million (~250,000 #switch cases). That's a
compromise which allows most of the existing geographical pages to
keep working, but still allows a memory usage of about 230MB.
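The case counts quoted for each limit follow from simple arithmetic. The Python sketch below is illustrative only: it assumes the approximate figure of 6 generated nodes per #switch case given above, and the function and exception names are hypothetical, not part of MediaWiki.

```python
# Approximate cost quoted in the announcement: about 6 generated nodes
# per #switch case. This is an estimate, not an exact constant.
NODES_PER_SWITCH_CASE = 6


class NodeLimitExceeded(Exception):
    """Hypothetical stand-in for the exception thrown when the limit is hit."""


def max_switch_cases(node_limit: int) -> int:
    """Estimate how many #switch cases fit under a generated-node limit."""
    return node_limit // NODES_PER_SWITCH_CASE


def check_generated_nodes(switch_cases: int, node_limit: int) -> int:
    """Return the estimated node count, or raise if it exceeds the limit."""
    nodes = switch_cases * NODES_PER_SWITCH_CASE
    if nodes > node_limit:
        raise NodeLimitExceeded(
            f"{nodes} generated nodes exceeds the limit of {node_limit}"
        )
    return nodes


if __name__ == "__main__":
    print(max_switch_cases(4_000_000))
    print(max_switch_cases(1_500_000))
    # The example article: about 15 templates of about 37000 cases each.
    article_cases = 15 * 37_000
    print(check_generated_nodes(article_cases, 4_000_000))
    try:
        check_generated_nodes(article_cases, 1_500_000)
    except NodeLimitExceeded as exc:
        print("rejected:", exc)
```

Under these assumptions, an article like the example (about 15 templates of 37000 cases each, so roughly 3.33 million nodes) would fit under the 4 million limit but not under 1.5 million.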
At some point, we would like to patch PHP upstream to cause memory for
DOM XML trees to be allocated from the PHP request pool, instead of
with malloc(). But to deploy that, we would need to reduce the limit
to the point where the template DOM cache can easily fit in the PHP
memory limit of 128MB.
In the short term, we will be working with the template editors to
ensure that all articles can be viewed with a limit of 1.5 million.
That's not a very viable solution in the long term, so I'd also like
to introduce save-time warnings and tracking categories for pages
which use more than, say, 50% of the limit, to encourage authors to
fix articles without being directly prompted by WMF staff members.
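A save-time check of this kind could be as simple as comparing the node count against a fraction of the limit. The following Python sketch is only an illustration of the idea: the function name, the return values, and the default 50% threshold are assumptions, not an existing MediaWiki interface.

```python
def classify_page(node_count: int, node_limit: int,
                  warn_fraction: float = 0.5) -> str:
    """Classify a page by how much of the generated-node limit it uses.

    Returns "error" when the limit is exceeded, "warning" when the page
    uses more than warn_fraction of the limit (a candidate for a
    save-time warning and a tracking category), and "ok" otherwise.
    """
    if node_count > node_limit:
        return "error"
    if node_count > node_limit * warn_fraction:
        return "warning"
    return "ok"


if __name__ == "__main__":
    limit = 1_500_000
    for count in (100_000, 800_000, 2_000_000):
        print(count, classify_page(count, limit))
```

With a limit of 1.5 million, a page generating 800,000 nodes would be flagged for a warning while still rendering normally.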
At some point in the future, you may be able to put this kind of
geographical data in Wikidata. Please, template authors, wait
patiently; don't implement your own version of Wikidata using wikitext
templates.
-- Tim Starling
I'm writing unit tests for one of the Translate classes.
In setUp I need to create a few pages, but I also need to control
the user ids of the revisions. This seems to work well except for two
things:
* dataProvider methods are called *before* setUp, so I cannot use the
user ids I have stored in setUp.
* setUp and tearDown are called for *every* item in the dataProvider.
This seems very wasteful - no wonder the tests take minutes or so to
run.
This just doesn't make any sense to me. I'm considering dropping
@dataProvider in this case - any other ideas?
The code in setUp is something like this:
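// Create a page and a user, then save a revision attributed to that user.
$title = Title::makeTitle( NS_MEDIAWIKI, 'Key1/fi' );
$user = User::newFromName( 'Translate test user 1' );
$user->addToDatabase();
WikiPage::factory( $title )->doEdit( 'trans1', __METHOD__, 0, false, $user );
// Keep the user so that tests can refer to its id later.
$this->user1 = $user;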
-Niklas
--
Niklas Laxström
Hi,
I want to implement SSO between MediaWiki and another application.
I can now log in and log out of both in sync.
However, I have run into a status called "anonlogin". When it
occurs, I lose my login status.
I don't know what causes it or how to handle it.
Can someone help me?
Thanks very much.
Hi,
I was downloading the "Guide to star constellations" in PDF format.
When rendering reaches 91%, it throws an error saying that it failed to render.
It works fine for the .odt and .zim formats. Please look into
the matter.
Thanks and regards
--
~Rupe$h Bende
Software Engg
Games24x7 Pvt. Ltd
A little while ago Trevor Parscal changed our jsMessage setup to be a
floating auto-hiding notification bubble.
https://gerrit.wikimedia.org/r/#/c/17605/
The end implementation felt half-baked to me, since it just swapped the text
when a notification was replaced and didn't support multiple notifications.
It even reused the same id as the previous message, which was pretty much a
completely different concept.
So I spent a night implementing a fully featured notification bubble
system. Something that should work for watchlists, VisualEditor, and
perhaps some other things like LQT, and perhaps anything we want to start
making more dynamic. Same goes for anyone with a good Gadget idea that
could use better notifications.
Here's a demo video of the new notification system:
https://www.mediawiki.org/wiki/File:Mw-notification.ogv
The changeset is https://gerrit.wikimedia.org/r/#/c/19199/
--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
Niklas, Aaron S., Siebrand, Santhosh, Amir, and Alolita had a discussion
about code review. Here are the notes. It might repeat some things
Sumana sent earlier and you might not agree on everything.
Problems:
* Having tags in Gerrit would be nice. Commit summary lines are
sometimes abused for this.
* Changes and rebases are combined.
* We are concerned that dashboards will become private in future.
Currently it's possible to see everyone's dashboards.
* Time is wasted scanning for stuff to review in different Gerrit listings.
* Unowned code rotting around. Result of bad bus factor.
* There is no stylize.php for JavaScript.
* Lack of tools to effectively debug PHP code without
print/var_dump/error_log etc.
Notes:
* Not always testing manually. Depends on how big failure can be and
whether there are good unit tests for it.
* Drafts-feature is (sometimes) good for work-in-progress, but tests
need to be triggered manually by adding Jenkins to reviewers and
triggering new patchset.
* We own/watch certain extensions and review new patches against them.
* JavaScript code needs more documentation than PHP.
* Sadly, automatic code formatting tools are rarely used.
* Post-commit review does not yet have a process, but cases seem to be
relatively rare.
* We love the continuous integration systems running our phpunit tests,
looking forward to QUnit tests too.
* Use wfProfileIn/Out for things like reading or writing to files,
shell calls, http calls.
Our quick tips (what we usually complain about in code review):
* Commit message should describe what and why. What was the problem?
How does the fix resolve it? How to test that it actually works?
* Commits should come with unit tests when possible.
* Make methods public/protected/private (think what makes sense).
Don't just make everything public!
* Follow coding style (whitespace is important).
* Spelling mistakes.
* Avoid long functions (100+ lines).
* Document all methods (even if protected) in general. Describe what
they do rather than documenting the obvious @param $title Title Title
object.
* New public methods and classes should have @since tags. Extension
developers will love you.
* Avoid saying "patchset x: ..." in the commit summary, use gerrit
comments instead
* Use type hinting when applicable
* If you need some additional functionality of a core module (or you
need a function that does something similar but a bit different), actually
improve the core module. Don't just copy+paste and modify the code
in another place.
* Avoid static "inheritance" (late static binding) unless it's perhaps
for a factory function
* Smaller commits are easier to review
* Extensions that assume direct file system access for shared storage
can't be used on WMF
* Refactor code as changes are made (but use separate commits if the
refactoring is large). Don't let the code keep getting worse with each
change.
-Niklas
--
Niklas Laxström
Hi,
I'm trying to figure out what needs to happen with the remaining extensions
in SVN that have not yet moved to Git (there are 372 of them). I've taken the
time to make up a list of extensions and put them on the wiki, but I need
some help!
Here's the page:
http://www.mediawiki.org/wiki/Git/Conversion/Extensions_still_in_svn
Mainly what I'm looking for is anyone who knows the status of any of these
372 extensions to take a few minutes to fill in ones they know. If it's
abandoned or obsolete, mark it as such so I can ignore it. If you know an
extension's still used, but maybe doesn't have an active maintainer, let's get
it in Git. Some extensions might still not be ready to move yet, but that's
something I'd like to know too.
Thanks for any help you can give. I'll be looking through the list as well,
but I figured crowd-sourcing the task might help us get it done faster.
Have a great Friday everyone,
-Chad
Hi all,
here's our weekly list of Wikidata review items. Due to the hands-on
meeting last week we refrained from sending it earlier. Now that most
should be back home, I wanted to give an overview of the open
items before our telco tomorrow.
* ContentHandler. This one is seriously blocking us now, and we would
need to get it reviewed. It is our highest priority right now. The
review was promised for this week. We are eagerly awaiting the review
and further input. Here's the bug:
<https://bugzilla.wikimedia.org/show_bug.cgi?id=38622>
* Sites. The RFC seems to be stable:
<https://www.mediawiki.org/wiki/Requests_for_comment/New_sites_system>
Chad was reminded one and a half weeks ago to take a look, and since no
further input has come in we assume that it is acceptable, which is
why we started with the implementation work. Here's the link to the
patch: <https://gerrit.wikimedia.org/r/#/c/23528/>
* jQuery table sorting improvements. This improves the UI on initial
display of a sorted table. There have been some comments and updates,
thanks to Krinkle for the review and comments. The work is ongoing
here, the ball is in our court, we are working on the new
patchset: <https://gerrit.wikimedia.org/r/#/c/22562/>
* Towards nested transactions (2):
<https://gerrit.wikimedia.org/r/#/c/21584/> open with comments from
Aaron Schulz.
Got merged since last mail:
* userWasLastToEdit improvement.
<https://gerrit.wikimedia.org/r/#/c/22049/> Yay! Thanks to Demon.
* Towards nested transactions (1):
<https://gerrit.wikimedia.org/r/#/c/21582/> got merged! Yay! Thanks to
Aaron Schulz.
Thanks to everyone, especially to Demon, Krinkle, The DJ, Dantman,
Matmarex, and Aaron Schulz for reviewing, and Rob, Tim, and Chad who
participated in the phone conference last week.
It would be crucial to get the first two items off this list as soon
as possible.
Cheers,
Denny
--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.