---------- Forwarded message ----------
From: Tim Starling tstarling@wikimedia.org
Date: Fri, Sep 21, 2012 at 4:07 AM
Subject: [Wikitech-l] #switch limits
To: wikitech-l@lists.wikimedia.org

Over the last week, we have noticed very heavy Apache memory usage on the main Wikimedia cluster. In some cases, the high memory usage resulted in heavy swapping and site-wide performance issues.
After some analysis, we've identified the main cause of this high memory usage to be geographical data ("données") templates on the French Wikipedia, and to a lesser extent, the same data templates copied to other wikis for use on articles about places in Europe.
Here is an example of a problematic template:
https://fr.wikipedia.org/w/index.php?title=Mod%C3%A8le:Donn%C3%A9es_PyrF1-2009&action=edit
That template alone uses 47MB for 37,000 #switch cases, and one article used about 15 similarly sized templates.
The simplest solution to this problem is for the few Wikipedians involved to stop doing what they are doing, and to remove the template invocations which have already been introduced. Antoine Musso has raised the issue on the French Wikipedia's "Bistro" and some of the worst cases have already been fixed.
To protect site stability, I've introduced a new preprocessor complexity limit called the "preprocessor generated node count", which is incremented by about 6 for each #switch case. When the limit is exceeded, an exception is thrown, preventing the page from being saved or viewed.
The limit is currently 4 million (~667,000 #switch cases), and it will soon be reduced to 1.5 million (~250,000 #switch cases). That's a compromise which allows most of the existing geographical pages to keep working, but still allows a memory usage of about 230MB.
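To make the arithmetic behind those figures explicit, here is a rough Python sketch. It assumes the "about 6 nodes per #switch case" figure holds exactly, which it does not; the function names are made up for illustration and are not part of MediaWiki.

```python
# Back-of-the-envelope conversion between the node-count limit and
# #switch cases. NODES_PER_SWITCH_CASE is the approximate figure quoted
# in this message, not an exact constant from the preprocessor.
NODES_PER_SWITCH_CASE = 6


def max_switch_cases(node_limit: int, nodes_per_case: int = NODES_PER_SWITCH_CASE) -> int:
    """Approximate number of #switch cases that fit under a node limit."""
    return node_limit // nodes_per_case


def nodes_for_cases(cases: int, nodes_per_case: int = NODES_PER_SWITCH_CASE) -> int:
    """Approximate generated node count for a given number of #switch cases."""
    return cases * nodes_per_case


print(max_switch_cases(4_000_000))   # 666666, i.e. ~667,000 cases
print(max_switch_cases(1_500_000))   # 250000
print(nodes_for_cases(37_000))       # 222000 nodes for the example template
print(nodes_for_cases(15 * 37_000))  # 3330000 nodes for ~15 such templates
```

Under these assumptions, an article pulling in about 15 templates of that size would land near 3.3 million nodes: under the current limit of 4 million, but over the planned 1.5 million.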
At some point, we would like to patch PHP upstream to cause memory for DOM XML trees to be allocated from the PHP request pool, instead of with malloc(). But to deploy that, we would need to reduce the limit to the point where the template DOM cache can easily fit in the PHP memory limit of 128MB.
In the short term, we will be working with the template editors to ensure that all articles can be viewed with a limit of 1.5 million. That's not a very viable solution in the long term, so I'd also like to introduce save-time warnings and tracking categories for pages which use more than, say, 50% of the limit, to encourage authors to fix articles without being directly prompted by WMF staff members.
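As a sketch of what such a save-time check might look like, the following Python classifies a page by its node count. This is only an illustration of the idea: the function name, the return values, and the exact comparison rules are assumptions, and the real implementation would live in MediaWiki's PHP code.

```python
# Hypothetical sketch of the proposed warning threshold; not MediaWiki code.
NODE_LIMIT = 1_500_000   # planned "preprocessor generated node count" limit
WARN_FRACTION = 0.5      # "say, 50% of the limit"


def classify_page(node_count: int,
                  limit: int = NODE_LIMIT,
                  warn_fraction: float = WARN_FRACTION) -> str:
    """Return 'error' above the limit, 'warn' above the warning threshold,
    and 'ok' otherwise."""
    if node_count > limit:
        return "error"   # page could not be saved or viewed
    if node_count > limit * warn_fraction:
        return "warn"    # show a save-time warning, add a tracking category
    return "ok"


print(classify_page(222_000))    # a single 37,000-case template
print(classify_page(900_000))
print(classify_page(3_330_000))
```

With the assumed values, the three calls fall into the "ok", "warn", and "error" buckets respectively.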
At some point in the future, you may be able to put this kind of geographical data in Wikidata. Please, template authors, wait patiently, and don't implement your own version of Wikidata using wikitext templates.
-- Tim Starling