On Wikipedia, the MAX_INCLUDE_REPEAT limit (=5) in Parser.php is problematic. I propose to change the sense of this constraint from "max template inclusions" to "max different template inclusions".
To solve the memory usage problem, we could use a temporary table containing the templates already loaded for the article. That way, a template replicated x times in an article would generate only one SQL request.
Any opinions?
Emmanuel Engelhart
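A minimal sketch of the caching idea described above: remember each template's text the first time it is fetched, so a template replicated x times in an article costs only one SQL request. All function and variable names here are hypothetical, not MediaWiki's actual Parser.php code.

<?php
// Hypothetical per-parse template cache: load each template's wikitext from
// the database at most once, no matter how often the page includes it.

$templateCache = array();          // title => wikitext, for the current parse only

function loadTemplateText( $title ) {
    global $templateCache;

    if ( array_key_exists( $title, $templateCache ) ) {
        return $templateCache[$title];   // cache hit: no SQL issued
    }

    // One lookup per distinct template per parse.
    $text = fetchTemplateFromDatabase( $title );
    $templateCache[$title] = $text;
    return $text;
}

// Stand-in for the real database query, so the sketch is self-contained.
function fetchTemplateFromDatabase( $title ) {
    $fakeDb = array( 'Stub' => 'This article is a stub.' );
    return isset( $fakeDb[$title] ) ? $fakeDb[$title] : '';
}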
Emmanuel Engelhart wrote:
On Wikipedia, the MAX_INCLUDE_REPEAT limit (=5) in Parser.php is problematic. I propose to change the sense of this constraint from "max template inclusions" to "max different template inclusions".
To solve the memory usage problem, we could use a temporary table containing the templates already loaded for the article. That way, a template replicated x times in an article would generate only one SQL request.
Any opinions?
Emmanuel Engelhart
Please understand "I propose to change the sense" as "I propose to change the meaning".
Is there already a plan to remove this constraint?
-- Looxix
Please stop generating sequences as follows (URL:http://de.wikipedia.org/wiki/John_Ruskin):
<p><br /></p> <p><br /></p>
If vertical whitespace is required (why?), use CSS properties. Unfortunately, browsers do not treat those sequences as empty...
Karl Eichwalder ke@gnu.franken.de writes:
Please stop generating sequences as follows (URL:http://de.wikipedia.org/wiki/John_Ruskin):
<p><br /></p> <p><br /></p>
If vertical whitespace is required (why?), use CSS properties. Unfortunately, browsers do not treat those sequences as empty...
FYI: This annoying bug is still visible :-(
On Sat, Jul 24, 2004 at 11:24:26AM +0200, Karl Eichwalder wrote:
Karl Eichwalder ke@gnu.franken.de writes:
Please stop generating sequences as follows (URL:http://de.wikipedia.org/wiki/John_Ruskin):
<p><br /></p> <p><br /></p>
If vertical whitespace is required (why?), use CSS properties. Unfortunately, browsers do not treat those sequences as empty...
FYI: This annoying bug is still visible :-(
This is strange. I use Mozilla for the following.
As an anonymous user, when viewing the entire page source, the p/br/p sequence appears after the weblinks:
<h2>Weblinks</h2> <ul> <li><a href="http://www.lancs.ac.uk/users/ruskin/default.htm" class='external' title= "http://www.lancs.ac.uk/users/ruskin/default.htm">http://www.lancs.ac.uk/users/ruskin/default.htm</a></li> </ul>
<p><br /></p> <p><br /></p>
Selecting everything from Weblinks down to the categories and clicking "view selection source", I get this, which does not include the p/br/p:
<h2>Weblinks</h2> <ul> <li><a href="http://www.lancs.ac.uk/users/ruskin/default.htm" class="external" title="http://www.lancs.ac.uk/users/ruskin/default.htm">http://www.lancs.ac.uk/users/ruskin/default.htm</a></li>
</ul>
<div id="catlinks"><p class="catlinks"><a href="/w/wiki.phtml?title=Spezial:Categories&article=John_Ruskin" title="Spezial:Categories">Eingeordnet unter</a>: <a href="/wiki/Kategorie:Brite" title="Kategorie:Brite">Brite</a> | <a href="/wiki/Kategorie:Maler" title="Kategorie:Maler">Maler</a> | <a href="/wiki/Kategorie:Englischsprachiger_Schriftsteller" title="Kategorie:Englischsprachiger Schriftsteller">Englischsprachiger Schriftsteller</a>
Using Lynx to get the source (as an anonymous user, no cookies set), I get the latter version, without the p/br/p sequence.
Regards,
JeLuF
On Sun, 25 Jul 2004, Jens Frank wrote:
As an anonymous user, when viewing the entire page source, the p/br/p sequence appears after the weblinks:
<h2>Weblinks</h2> <ul> <li><a href="http://www.lancs.ac.uk/users/ruskin/default.htm" class='external' title= "http://www.lancs.ac.uk/users/ruskin/default.htm">http://www.lancs.ac.uk/users/ruskin/default.htm</a></li> </ul>
<p><br /></p> <p><br /></p>
Using Firefox 0.9 as an anonymous user, I don't see any of those <p><br /></p> in the page source. Is this some kind of Mozilla bug?
Alfio
On Tue, 13 Jul 2004 23:08:08 +0200 Emmanuel Engelhart emmanuel@engelhart.org wrote:
On Wikipedia, the MAX_INCLUDE_REPEAT limit (=5) in Parser.php is problematic. I propose to change the sense of this constraint from "max template inclusions" to "max different template inclusions".
To solve the memory usage problem, we could use a temporary table containing the templates already loaded for the article. That way, a template replicated x times in an article would generate only one SQL request.
Any opinions?
Attached is a very simple patch which implements this idea.
Emmanuel Engelhart
On Wed, Jul 14, 2004 at 10:58:20PM +0200, Emmanuel Engelhart wrote:
On Tue, 13 Jul 2004 23:08:08 +0200 Emmanuel Engelhart emmanuel@engelhart.org wrote:
On Wikipedia, the MAX_INCLUDE_REPEAT limit (=5) in Parser.php is problematic. I propose to change the sense of this constraint from "max template inclusions" to "max different template inclusions".
To solve the memory usage problem, we could use a temporary table containing the templates already loaded for the article. That way, a template replicated x times in an article would generate only one SQL request.
Any opinions?
Attached is a very simple patch which implements this idea.
If article [[X]] includes [[Template:A]], which includes [[Template:B]], which again includes [[Template:A]], this would result in a memory bomb using only two different templates.
Caching templates is a good idea, provided we keep track of the memory used. Until we do so, the number of template inclusions must remain restricted to avoid endless loops.
Regards,
JeLuF
On Tue, 13 Jul 2004 23:08:08 +0200 Emmanuel Engelhart emmanuel@engelhart.org wrote:
On Wikipedia, the MAX_INCLUDE_REPEAT limit (=5) in Parser.php is problematic. I propose to change the sense of this constraint from "max template inclusions" to "max different template inclusions".
To solve the memory usage problem, we could use a temporary table containing the templates already loaded for the article. That way, a template replicated x times in an article would generate only one SQL request.
On Wed, 14 Jul 2004 23:46:35 +0200, Jens Frank jeluf@gmx.de wrote:
If article [[X]] includes [[Template:A]], which includes [[Template:B]], which again includes [[Template:A]], this would result in a memory bomb using only two different templates.
I think Emmanuel's suggestion is more akin to PHP's include_once() and thus wouldn't be vulnerable to the type of problem you're describing.
-Bill Clark
On Wed, Jul 14, 2004 at 05:53:41PM -0400, Bill Clark wrote:
On Tue, 13 Jul 2004 23:08:08 +0200 Emmanuel Engelhart emmanuel@engelhart.org wrote:
On Wikipedia, the MAX_INCLUDE_REPEAT limit (=5) in Parser.php is problematic. I propose to change the sense of this constraint from "max template inclusions" to "max different template inclusions".
To solve the memory usage problem, we could use a temporary table containing the templates already loaded for the article. That way, a template replicated x times in an article would generate only one SQL request.
On Wed, 14 Jul 2004 23:46:35 +0200, Jens Frank jeluf@gmx.de wrote:
If article [[X]] includes [[Template:A]], which includes [[Template:B]], which again includes [[Template:A]], this would result in a memory bomb using only two different templates.
I think Emmanuel's suggestion is more akin to PHP's include_once() and thus wouldn't be vulnerable to the type of problem you're describing.
Emmanuel and I have tested it: it is vulnerable to this kind of attack.
Regards,
JeLuF
On Thu, 15 Jul 2004 07:23:20 +0200, Jens Frank jeluf@gmx.de wrote:
Emmanuel and I have tested it: it is vulnerable to this kind of attack.
Then it's implemented wrong. The idea Emmanuel described (keeping track of which templates have already been included so as to keep from including them again) should work. I can't tell from looking at the patch what the problem might be, but there must be some silly mistake we're not seeing. Is the "temporary table" a bit too temporary, maybe?
I just downloaded the phase3 code (I'd been using the 1.2.6 stable release) and am poking through it now... not that I'm probably going to be able to make heads or tails of it all for quite some time still (not really my style of coding, and I still have such a poor grasp of the "big picture" here with how most of the modules interface with each other).
-Bill Clark
Bill Clark wrote:
On Wed, 14 Jul 2004 23:46:35 +0200, Jens Frank jeluf@gmx.de wrote:
If article [[X]] includes [[Template:A]], which includes [[Template:B]], which again includes [[Template:A]], this would result in a memory bomb using only two different templates.
I think Emmanuel's suggestion is more akin to PHP's include_once() and thus wouldn't be vulnerable to the type of problem you're describing.
That's what we already do. MAX_INCLUDE_REPEAT is a maximum for inclusions of a given template; there's no limit on the number of different templates included. So if A included B and B included A, you'd get 5 copies of each: A->B->A->B->A->B->A->B->A->B->link to A.
This is unacceptable because people want to use templates for things which are included many times in a single page, like superscript "Le" on fr, or fancy bullet points.
-- Tim Starling
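A simplified illustration of the behaviour Tim describes: each template title can be expanded at most MAX_INCLUDE_REPEAT times per page, after which the inclusion is rendered as a link. This is a sketch, not the actual Parser.php implementation.

<?php
// Per-title inclusion counter, as described above. Names are illustrative only.

define( 'MAX_INCLUDE_REPEAT', 5 );

$includeCount = array();   // title => number of expansions so far on this page

function expandTemplate( $title, $text ) {
    global $includeCount;

    if ( !isset( $includeCount[$title] ) ) {
        $includeCount[$title] = 0;
    }

    if ( $includeCount[$title] >= MAX_INCLUDE_REPEAT ) {
        // Limit reached: emit a link to the template instead of its text.
        return "[[Template:$title]]";
    }

    $includeCount[$title]++;
    return $text;   // in the real parser, $text would itself be scanned for further inclusions
}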
On Thu, 15 Jul 2004 15:50:53 +1000, Tim Starling ts4294967296@hotmail.com wrote:
That's what we already do. MAX_INCLUDE_REPEAT is a maximum for inclusions of a given template; there's no limit on the number of different templates included. So if A included B and B included A, you'd get 5 copies of each: A->B->A->B->A->B->A->B->A->B->link to A.
I thought Emmanuel was suggesting that A and B each only be included once.
This is unacceptable because people want to use templates for things which are included many times in a single page, like superscript "Le" on fr, or fancy bullet points.
Ah. I see. So what I thought Emmanuel was suggesting wouldn't be any good anyway.
-Bill Clark
On Thu, 15 Jul 2004 15:50:53 +1000, Tim Starling ts4294967296@hotmail.com wrote:
This is unacceptable because people want to use templates for things which are included many times in a single page, like superscript "Le" on fr, or fancy bullet points.
How about a MAX_INCLUDE_DEPTH then, instead?
-Bill Clark
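A sketch of what a MAX_INCLUDE_DEPTH check could look like: the current nesting depth is passed down through recursive expansion, and expansion stops once the depth exceeds a fixed limit. The constant and function names are hypothetical, not part of the actual codebase.

<?php
// Hypothetical depth limit on nested template inclusion.

define( 'MAX_INCLUDE_DEPTH', 16 );

function expandWithDepth( $title, $templates, $depth = 0 ) {
    if ( $depth > MAX_INCLUDE_DEPTH || !isset( $templates[$title] ) ) {
        // Too deeply nested (or missing): leave a link instead of expanding.
        return "[[Template:$title]]";
    }

    $text = $templates[$title];

    // Expand each {{Name}} marker found in the template text, one level deeper.
    while ( preg_match( '/\{\{([^{}]+)\}\}/', $text, $m ) ) {
        $inner = expandWithDepth( trim( $m[1] ), $templates, $depth + 1 );
        $text  = str_replace( $m[0], $inner, $text );
    }
    return $text;
}

// A self-including template terminates at the depth limit instead of looping forever.
$templates = array( 'Loop' => '[{{Loop}}]' );
echo expandWithDepth( 'Loop', $templates );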
Bill Clark wrote:
On Thu, 15 Jul 2004 15:50:53 +1000, Tim Starling ts4294967296@hotmail.com wrote:
This is unacceptable because people want to use templates for things which are included many times in a single page, like superscript "Le" on fr, or fancy bullet points.
How about a MAX_INCLUDE_DEPTH then, instead?
The vulnerability with inclusion was pointed out when we had a hard-coded maximum depth of 1. As I explained in a comment in the source file:
#----------------------------------------
# Variable substitution O(N^2) attack
#-----------------------------------------
# Without countermeasures, it would be possible to attack the parser by saving a page
# filled with a large number of inclusions of large pages. The size of the generated
# page would be proportional to the square of the input size. Hence, we limit the number
# of inclusions of any given page, thus bringing any attack back to O(N).
#
-- Tim Starling
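To make the quadratic growth concrete (the numbers below are illustrative, not from the original comment): an attacker who saves one 100 KB template and a page consisting of 1,000 inclusions of it has typed roughly 120 KB of wikitext, yet the expanded page would be about 1,000 x 100 KB = 100 MB. Doubling the input (a 200 KB template included 2,000 times) quadruples the output, hence O(N^2). With inclusions of any single template capped at MAX_INCLUDE_REPEAT = 5, the attacker must save a fresh large template for every 5 inclusions, so the output can only grow linearly in what was saved.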
On Thu, 15 Jul 2004 16:11:16 +1000, Tim Starling ts4294967296@hotmail.com wrote:
The vulnerability with inclusion was pointed out when we had a hard-coded maximum depth of 1. As I explained in a comment in the source file:
#----------------------------------------
# Variable substitution O(N^2) attack
#-----------------------------------------
# Without countermeasures, it would be possible to attack the parser by saving a page
# filled with a large number of inclusions of large pages. The size of the generated
# page would be proportional to the square of the input size. Hence, we limit the number
# of inclusions of any given page, thus bringing any attack back to O(N).
#
Hmm... the problem would seem insurmountable, then (barring some sort of checking on the size of includes, which seems like more trouble than it's worth).
Oh well.
-Bill Clark
Bill Clark wrote:
Hmm... the problem would seem insurmountable, then (barring some sort of checking on the size of includes, which seems like more trouble than it's worth).
Oh well.
We've discussed all this before, a couple of times. It's not insurmountable.
http://mail.wikipedia.org/pipermail/wikitech-l/2004-June/010701.html
-- Tim Starling
On Thu, 15 Jul 2004 16:50:12 +1000, Tim Starling ts4294967296@hotmail.com wrote:
We've discussed all this before, a couple of times. It's not insurmountable.
http://mail.wikipedia.org/pipermail/wikitech-l/2004-June/010701.html
When I read Emmanuel's first post, I thought the MAX_INCLUDE_REPEAT limit was set solely to prevent infinite loops in inclusions.
Then you informed me that it was also intended to prevent attacks based on including a large number of large files (or more accurately, including a limited number of large files multiple times).
I didn't see any way to prevent that kind of attack without either:
1) Checking the size before inclusion.
2) Limiting the number of inclusions (or at least making it more difficult by limiting the number of times the same file can be included, thus forcing attackers to create multiple large templates, which is easier to track and/or prevent).
I didn't think (1) was feasible (but see below), so that meant we were stuck with (2), which also prevents the inclusion of a large number of small files (such as the fancy bullets you mentioned). That's why I thought the problem was insurmountable; it seemed to be intrinsically tied to the number of repeated includes. Block repeated includes for large files, and you also have to block them for small files.
Maybe the type of approach in (1) isn't as bad as I thought though, since we could just keep a running total of how much data has been included, and block further inclusions once a limit has been reached. MAX_CUMULATIVE_INCLUDE_SIZE?
Actually, that might not be a half bad idea, since it would solve the infinite inclusion problem as well (though MAX_INCLUDE_DEPTH might still be a better way to handle that problem, since it might take too long for small looping templates to build up to the size limit).
-Bill Clark
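A sketch of the MAX_CUMULATIVE_INCLUDE_SIZE idea from the message above: keep a running total of the characters of template text included so far on the page, and refuse further inclusions once the budget is exhausted. The constant name comes from the thread; everything else here is hypothetical, not actual Parser.php code.

<?php
// Hypothetical cumulative include-size budget per parsed page.

define( 'MAX_CUMULATIVE_INCLUDE_SIZE', 2097152 );  // e.g. 2 MB, as discussed in the thread

$includedSoFar = 0;   // characters of template text included on this page so far

function includeTemplateText( $title, $text ) {
    global $includedSoFar;

    if ( $includedSoFar + strlen( $text ) > MAX_CUMULATIVE_INCLUDE_SIZE ) {
        // Budget exhausted: stop expanding and emit a link instead.
        return "[[Template:$title]]";
    }

    $includedSoFar += strlen( $text );
    return $text;
}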
On Thu, 15 Jul 2004 03:17:59 -0400, Bill Clark wclarkxoom@gmail.com wrote:
Maybe the type of approach in (1) isn't as bad as I thought though, since we could just keep a running total of how much data has been included, and block further inclusions once a limit has been reached. MAX_CUMULATIVE_INCLUDE_SIZE?
Wait... that's exactly what you said in the post you linked, isn't it? :)
I misread it the first time and thought you meant a 2 MB limit on each included file.
I think the MAX_INCLUDE_DEPTH solution might be preferable to checking for loops directly, though, since that could end up getting very complicated if the cycle is particularly long (or else end up being too restrictive, depending on how it was implemented).
-Bill Clark
Bill Clark wclarkxoom@gmail.com wrote:
- Checking the size before inclusion.
I'm coding a patch along these lines, which prevents the inclusion of more than MAX_TEMPLATE_INCLUSION_CHAR characters.
- Limiting the number of inclusions (or at least making it more difficult by limiting the number of times the same file can be included, thus forcing attackers to create multiple large templates, which is easier to track and/or prevent).
My opinion is simply to prevent endless recursive templates (by storing the recursion path).
With what I have already coded, template parsing should be faster and the limitations less painful.
Emmanuel Engelhart
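A sketch of the recursion-path idea mentioned above: remember which templates are currently being expanded on the way down, and refuse to expand a template that is already on that path, so A -> B -> A stops without limiting how often unrelated templates may be included. The function names are hypothetical, not those of the actual patch.

<?php
// Hypothetical recursion-path check during template expansion.

function expandTracked( $title, $templates, $path = array() ) {
    if ( in_array( $title, $path ) ) {
        // Recursive inclusion detected (e.g. A -> B -> A): break the loop.
        return "[[Template:$title]]";
    }
    if ( !isset( $templates[$title] ) ) {
        return "[[Template:$title]]";   // template does not exist
    }

    $path[] = $title;
    $text = $templates[$title];

    // Expand nested {{Name}} markers, carrying the current path along.
    while ( preg_match( '/\{\{([^{}]+)\}\}/', $text, $m ) ) {
        $inner = expandTracked( trim( $m[1] ), $templates, $path );
        $text  = str_replace( $m[0], $inner, $text );
    }
    return $text;
}

// A includes B, B includes A: the second visit to A stops the recursion,
// while unrelated templates can still be included as often as needed.
$templates = array( 'A' => 'A({{B}})', 'B' => 'B({{A}})' );
echo expandTracked( 'A', $templates );   // A(B([[Template:A]]))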
Jens Frank wrote:
Caching templates is a good idea, provided we keep track of the memory used. Until we do so, the number of template inclusions must remain restricted to avoid endless loops.
The number of templates on a page does not need to be restricted in order to avoid endless loops within templates.