<a href="http://portal.wikinerds.org/canada-flag" class='external' title="http://portal.wikinerds.org/canada-flag" rel="nofollow">link</a>
The XHTML above is generated by Wikipedia, but in my opinion MW should use only " and not ' i.e. it should be class="external" instead of 'external'
NSK wrote:
<a href="http://portal.wikinerds.org/canada-flag" class='external' title="http://portal.wikinerds.org/canada-flag" rel="nofollow">link</a>
The XHTML above is generated by Wikipedia, but in my opinion MW should use only " and not ' i.e. it should be class="external" instead of 'external'
Thank you for your opinion. It will be dutifully studied, found wanting, and discarded.
For the people in the audience who are interested in accuracy, note that XML allows attribute values to be quoted either with double quotes or single quotes. Here's the formal lexical definition[1]:
AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
Classic SGML-based HTML additionally allows attribute values to be unquoted if they contain only certain characters[2] (e.g. border=1, but *not* bgcolor=#EEEEEE, which is actually illegal!). XHTML limits itself to XML's stricter syntax, so only single- and double-quoted attribute values are allowed.
Since strings in PHP source code are themselves usually either single- or double-quoted, instances of the same quote character within the string must be escaped with a backslash to appear as a literal character. Convenience for the coder thus often prompts the use of one or the other quote style for XHTML markup being produced from PHP code. Something like "<p class='error'>$err</p>" is easier for the coders to read than "<p class=\"error\">$err</p>".
When outputting user-supplied data which is escaped using htmlspecialchars(), double quotes are generally used, as that function's default behavior transforms " into &quot; but does not transform ', requiring additional work to produce a string suitable for literal inclusion in a single-quoted XML attribute.
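For instance (an illustrative sketch, not code from MediaWiki itself):

$err = 'He said "hello" & left';

# Default behavior: escapes &, <, > and " but leaves ' alone,
# so the result is safe inside a double-quoted attribute...
$safe = htmlspecialchars( $err );
echo "<p class=\"error\" title=\"$safe\">$safe</p>";

# ...but for a single-quoted attribute you must ask for ENT_QUOTES,
# or a stray ' in the data can break out of the attribute value.
$safe = htmlspecialchars( $err, ENT_QUOTES );
echo "<p class='error' title='$safe'>$safe</p>";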
Both quote forms are equally legal and produce equivalent results, so source code readability tends to outweigh the minor issue of aesthetic consistency in the markup output. Markup output is already butt-ugly because it's not spaced or indented nicely, and nobody's going to look at it very often; it's for consumption of the browser while the source code is maintained by human programmers.
[1] http://www.w3.org/TR/REC-xml/#sec-common-syn [2] http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.2
-- brion vibber (brion @ pobox.com)
NSK wrote:
The XHTML above is generated by Wikipedia, but in my opinion MW should use only " and not ' i.e. it should be class="external" instead of 'external'
"Should be"? Says who, the Indentation Police?
On Thu, 2005-02-17 at 11:19 -0800, Brion Vibber wrote:
Both quote forms are equally legal and produce equivalent results, so source code readability tends to outweigh the minor issue of aesthetic consistency in the markup output. Markup output is already butt-ugly because it's not spaced or indented nicely, and nobody's going to look at it very often; it's for consumption of the browser while the source code is maintained by human programmers.
Exactly. For patients who care more than their browsers do about irrelevant inconsistencies in XHTML, I suggest a course of HTML Tidy: http://tidy.sourceforge.net/
On Thursday 17 February 2005 12:00 pm, Frank Wales wrote:
Exactly. For patients who care more than their browsers do about irrelevant inconsistencies in XHTML, I suggest a course of HTML Tidy: http://tidy.sourceforge.net/
Forgive me if I misread this thread, but given that MediaWiki *is* XHTML 1.0 Transitional out of the box, what benefit does it bring us to make it no longer XHTML compliant? And what can tidy do that validator.w3.org can't do?
Paul Johnson wrote:
On Thursday 17 February 2005 12:00 pm, Frank Wales wrote:
Exactly. For patients who care more than their browsers do about irrelevant inconsistencies in XHTML, I suggest a course of HTML Tidy: http://tidy.sourceforge.net/
Forgive me if I misread this thread, but given that MediaWiki *is* XHTML 1.0 Transitional out of the box, what benefit does it bring us to make it no longer XHTML compliant? And what can tidy do that validator.w3.org can't do?
validator.w3.org is a web service where you submit pages and it spits out error messages telling you what's wrong with it if it doesn't validate according to the advertised version of (X)HTML.
HTML Tidy is a program where you hand it some chunk of (X)HTML that may or may not validate properly and it tries to fix it up for you to produce output that will validate if the original contained errors. (It can do other junk too like pretty indentation of nested markup.)
MediaWiki can optionally shell out to tidy as a postprocessing step after wikitext->XHTML conversion, as currently we do not guarantee that output will validate. (We make some effort to make sure output is well-formed, but there are probably still failures in well-formedness too. Validation is much harder, as there are nesting rules and limitations on what attribute values are valid.)
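If you want to turn that on, it's roughly this sort of thing in LocalSettings.php (check DefaultSettings.php for the exact variable names and defaults in your version):

$wgUseTidy  = true;             # pipe rendered output through tidy
$wgTidyBin  = '/usr/bin/tidy';  # path to the tidy binary
$wgTidyConf = "$IP/extensions/tidy/tidy.conf";  # tidy configuration file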
-- brion vibber (brion @ pobox.com)
On Thu, 2005-02-17 at 19:28 -0800, Paul Johnson wrote:
And what can tidy do that validator.w3.org can't do?
HTML Tidy is a command-line tool that accepts an unreasonable number of pretty-printing options, as well as a variety of reformatting and syntactic-diddling features. As it runs on your local computer, you can use it off-line, and (more usefully) you can wrap it in scripts that mangle or wrangle HTML or XHTML files.
This makes it useful when you want to compare ostensibly similar pages in horribly different layouts (say, from old and new versions of software intended to produce equivalent output), as it can reformat them into mechanically or eyeballically comparable forms.
Forgive me if I misread this thread, but given that MediaWiki *is* XHTML 1.0 Transitional out of the box, what benefit does it bring us to make it no longer XHTML compliant?
Oh, I certainly wasn't proposing that it be used on the output of MW, heaven forfend. Rather, I was jokingly suggesting it as a palliative to those who twitch at the sight of inconsistent mark-up.
Given that the thread began with a misguided concern about inconsistent attribute quotes, it came to mind since I believe it will generate output where the attributes are all consistently quoted, sorted and generally spiffed up for use in polite company.
I'm playing with extensions, but am having trouble seeing the results.
How can I mark an article as "dirty" in an extension, so it will refresh every time?
Here's my extension, using the Unix fortune(6) command, which returns a random fortune each time, but when refreshed in a browser, I get the same one back each time. When I click "edit," I get new results, but when I refresh my browser, I do not. I even turned debugging on in the browser and made sure the browser cache was cleared. (It's taking enough time; it's obviously doing SOMETHING on the server.)
Another odd thing happens: if I use system() or passthru(), the fortune DOES refresh, but it prints at the top of the window, rather than inside the content area. When I tried shell_exec(), it puts the same old fortune in the content area each time!
extensions/Fortune.php

<?php
$wgExtensionFunctions[] = 'wfFortune';

function wfFortune() {
    global $wgParser;
    $wgParser->setHook( 'fortune', 'renderFortune' );
}

function renderFortune( $input ) {
    $output = "<p><i>";
    $output .= shell_exec( '/sw/bin/fortune' );
    $output .= '</i></p>';
    return $output;
}
?>
:::: We in America today are nearer to the final triumph over poverty than ever before in the history of any land. -- Herbert Hoover, 1928 :::: Jan Steinman http://www.Bytesmiths.com
I have the same problem. There should be an HTTP header to invalidate caches so that the browser retrieves the page from the server. I guess we can look at the code of recent changes and OutputPage.php to find out what needs to be sent. Then it should be OK, I believe, to do $wgOut->whatever() in the extension code to send it.
Muzo
I don't think it has anything to do with the browser. It's a server issue. I *am* retrieving the page from the server -- I can see the hit in my logs, and I can see the time it takes to generate the page. (If it's coming from a cache, why does it take so long, anyway? Seems to be the worst of both worlds! :-)
On 18 Feb 2005, at 11:41, Muzaffer Ozakca wrote:
I have the same problem. There should be an HTTP header to invalidate caches so that the browser retrieves the page from the server. I guess we can look at the code of recent changes and OutputPage.php to find out what needs to be sent. Then it should be OK, I believe, to do $wgOut->whatever() in the extension code to send it.
Muzo
:::: Conflict cannot survive without your participation. -- Wayne Dyer :::: Jan Steinman http://www.Bytesmiths.com/Van
Not sure about that. A normal refresh looks like it really gets the page from the server. I can refresh my page containing the extension by doing a hard refresh. In Firefox (on windoze), it's ctrl+f5 or ctrl+refresh button (not sure of the latter); it should be a similar combination in IE, too.
In SpecialRecentChanges.php, I see:
$wgOut->setSquidMaxage( 10 );
if( $s->lastmod && $wgOut->checkLastModified( $s->lastmod ) ){
    # Client cache fresh and headers sent, nothing more to do.
    return;
}
I'll try the first line to see if it works.
Well, the previous solution didn't work. Try calling:
$wgOut->enableClientCache(false);
in your extension code. You have to declare $wgOut as a global. It seems to work now. I guess we have to keep in mind that this will create extra work on the server when serving the pages we turn caching off for.
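Roughly, in the fortune example that started the thread (a sketch; I'm assuming $wgOut is already set up by the time the tag hook runs):

function renderFortune( $input ) {
    global $wgOut;
    # Ask OutputPage not to send client-side caching headers for this page.
    $wgOut->enableClientCache( false );

    $output = "<p><i>";
    $output .= shell_exec( '/sw/bin/fortune' );
    $output .= '</i></p>';
    return $output;
}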
That isn't working for me. I don't think the extension is ever getting executed. How can you turn off caching in something that isn't getting run because it's getting taken out of some cache?
I think it's a bug. I'm going to play with it a bit longer, then maybe submit it.
I've also downloaded all the extensions I could come across. I'll see what sort of tricks they're doing.
On 18 Feb 2005, at 12:06, Muzaffer Ozakca wrote:
Well, the previous solution didn't work. Try calling:
$wgOut->enableClientCache(false);
in your extension code. You have to declare $wgOut as a global. It seems to work now. I guess we have to keep in mind that this will create extra work on the server when serving the pages we turn caching off for.
:::: We're in a giant car heading toward a brick wall at a hundred miles an hour, arguing over who has the best seat. -- David Suzuki :::: Jan Steinman http://www.Bytesmiths.com/Van
Jan Steinman wrote:
I'm playing with extensions, but am having trouble seeing the results.
How can I mark an article as "dirty" in an extension, so it will refresh every time?
There's not a standard way of doing this, and you really shouldn't. Pages should expect to be rendered the same way every time they're loaded unless something they explicitly pull in has changed (such as template inclusions), and extensions should not break this model.
-- brion vibber (brion @ pobox.com)
On 18 Feb 2005, at 13:20, Brion Vibber wrote:
Jan Steinman wrote:
How can I mark an article as "dirty" in an extension, so it will refresh every time?
There's not a standard way of doing this, and you really shouldn't. Pages should expect to be rendered the same way every time they're loaded unless something they explicitly pull in has changed (such as template inclusions), and extensions should not break this model.
I think that SEVERELY limits the usefulness of extensions, then!
In effect, it makes MediaWiki the equivalent of static HTML pages -- except for the editing part.
I'm going to either have to do something about that, or divert stuff out to standard php pages. Ugh.
:::: Conflict cannot survive without your participation. -- Wayne Dyer :::: Jan Steinman http://www.Bytesmiths.com/Van
Jan Steinman wrote:
On 18 Feb 2005, at 13:20, Brion Vibber wrote:
Pages should expect to be rendered the same way every time they're loaded unless something they explicitly pull in has changed (such as template inclusions), and extensions should not break this model.
I think that SEVERELY limits the usefulness of extensions, then!
In effect, it makes MediaWiki the equivalent of static HTML pages -- except for the editing part.
Yes, well, that's rather the whole point of the exercise.
If you want to hack it to do something fundamentally different, you're going to find yourself fighting the entire code model, including several distinct caching levels based on it.
-- brion vibber (brion @ pobox.com)
On 18 Feb 2005, at 13:41, Brion Vibber wrote:
Jan Steinman wrote:
On 18 Feb 2005, at 13:20, Brion Vibber wrote:
Pages should expect to be rendered the same way every time they're loaded...
I think that SEVERELY limits the usefulness of extensions, then!
In effect, it makes MediaWiki the equivalent of static HTML pages -- except for the editing part.
Yes, well, that's rather the whole point of the exercise.
If you want to hack it to do something fundamentally different, you're going to find yourself fighting the entire code model, including several distinct caching levels based on it.
Okay. I guess I see your point of view.
Instead of hacking it head-on, how about sideways? How about a dedicated namespace, like "Dynamic:"? There are already special behaviors for User:, Template:, Special: (extra special) and perhaps others I haven't discovered.
There's no interwiki site called "Dynamic:", but this might break sites that have already created such a namespace. I guess it could be a configurable name, or be easily turned off completely.
Then I'm assuming it would be a case of subclassing and overriding, rather than putting in a bunch of nasty "if...else..." spaghetti.
I know how you feel about it in general. But it seems a namespace would also have less chance of breaking in future MediaWikis.
Thoughts?
:::: On two occasions I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. -- Charles Babbage :::: Jan Steinman http://www.Bytesmiths.com/Item/99AT12
Whenever I have caching problems with mine (1.4beta6), I go into MySQL and
TRUNCATE TABLE objectcache;
That usually fixes the caching by removing all of it. I don't use any special "caching" settings (like that Turck MMCache thing).
Cheers,
Doug
Hallo again.
Doug Fields schrieb:
TRUNCATE TABLE objectcache;
That clears the whole cache. Maybe this is too easy a question, but how or where is the cache for an article cleared, for example after editing? Sorry, I didn't find anything about that.
Thanks and greetings, Philipp
Philipp Albig wrote:
That clears the whole cache. Maybe this is too easy a question, but how or where is the cache for an article cleared, for example after editing? Sorry, I didn't find anything about that.
When a page is updated directly or indirectly, the timestamp in the page_touched field (cur_touched on <= 1.4) is updated to the current time. Cached copies are checked against this timestamp at load time and discarded if the page_touched time is newer, forcing a re-rendering.
There is also a per-user timestamp which is updated when you log in, log out, change preferences, watch/unwatch some page, etc. If the user_touched timestamp is newer than the HTTP If-Modified-Since header, the page will be re-output (but the parser cache may still be used).
Additionally, with squid proxy mode on, a PURGE request is sent to the proxy when page_touched is updated. This allows the squids to avoid hitting the apache servers to do cache validity checks for the majority of anonymous page view hits.
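In rough pseudo-PHP, the page_touched check amounts to something like this (an illustration of the logic only, not the actual code paths):

# illustration only -- not the real MediaWiki code
$cached = $parserCache->get( $article );
if ( $cached && $cached->mTouched >= $pageTouched ) {
    # the cache entry is at least as new as page_touched: reuse it
    $wgOut->addHTML( $cached->mText );
} else {
    # page_touched is newer: re-render and store a fresh copy
    $html = $parser->parse( $article->getContent() );
    $parserCache->save( $article, $html );
    $wgOut->addHTML( $html );
}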
-- brion vibber (brion @ pobox.com)
At 2/28/2005 12:09 AM, Brion Vibber wrote:
Philipp Albig wrote:
That clears the whole cache. Maybe this is too easy a question, but how or where is the cache for an article cleared, for example after editing? Sorry, I didn't find anything about that.
When a page is updated directly or indirectly, the timestamp in the page_touched field (cur_touched on <= 1.4) is updated to the current time. Cached copies are checked against this timestamp at load time and discarded if the page_touched time is newer, forcing a re-rendering.
I posted some code in another thread, but maybe it will help you too. Here is what I'm using in my extensions to prevent a page from being "cached" and make the extension "useful".
global $wgTitle;
if ($wgTitle) {
    $ts = mktime();
    $now = gmdate("YmdHis", $ts + 60);
    $ns = $wgTitle->getNamespace();
    $ti = wfStrencode($wgTitle->getDBkey());
    $sql = "UPDATE cur SET cur_touched='$now' WHERE cur_namespace=$ns AND cur_title='$ti'";
    wfQuery($sql, DB_WRITE);
}
Note that this code is nearly identical to Title::invalidateCache. The difference is that if I call invalidateCache in my extension code, it sets cur_touched to 'now' and *then* the cache is created, so the cache is newer than cur_touched anyway. The trick here is that I set cur_touched in the future, by something not too intrusive, let's say 'now' + 60 seconds, provided that I expect the cache to be created within 60 or 120 seconds once my extension code has been executed (you can increase that of course). That way, cur_touched is always fresher than the cache, and the page always gets re-created. Am I missing something?
-- Sebastien Barre
Sebastien BARRE wrote: [snip]
Note that this code is nearly identical to Title::invalidateCache.
[snip]
That way, cur_touched is always fresher than the cache, and the page always gets re-created. Am I missing something?
While this might happen to work, it's not really a very good solution; the technique is very oblique (which would make it hard to maintain), and it'll produce a lot of unnecessary writes to the database.
If MediaWiki were some proprietary monster application which you couldn't change, you might have to do it like that, but it's not. Instead, you should try adding a "don't cache this page" flag to the parser, which an extension could trigger.
-- brion vibber (brion @ pobox.com)
At 2/28/2005 04:59 AM, Brion Vibber wrote:
Sebastien BARRE wrote: [snip]
Note that this code is nearly identical to Title::invalidateCache.
[snip]
That way, cur_touched is always fresher than the cache, and the page always gets re-created. Am I missing something?
While this might happen to work, it's not really a very good solution; the technique is very oblique (which would make it hard to maintain), and it'll produce a lot of unnecessary writes to the database.
Oh, I agree, it's definitely a hack, and it is not intended for a large web site, but as much as I like MediaWiki, it was the only quick workaround I could find for what I consider a (small) design flaw: in my opinion, the current extension feature is rendered pretty much useless by the current caching mechanism.
Instead, you should try adding a "don't cache this page" flag to the parser, which an extension could trigger.
There is such a thing, I assume; the invalidateCache() method is close enough, am I right? But there is no way to call it early enough. What might (also) be missing, I think, is more hooks, so that extensions have a chance to trigger code at an earlier stage in the parsing process. Actually, as far as I'm concerned, a simpler assumption would require less work: if an extension tag is detected in a page, that page should always be re-created (unless some global flag is set, if you really want to disable that on a large project like Wikipedia).
-- Sebastien Barre
On Mon, 28 Feb 2005 11:23:55 +0100, Sebastien BARRE sebastien.barre@kitware.com wrote:
At 2/28/2005 04:59 AM, Brion Vibber wrote:
Instead, you should try adding a "don't cache this page" flag to the parser, which an extension could trigger.
[...]
Actually, as far as I'm concerned, a simpler assumption would require less work: if an extension tag is detected in a page, that page should always be re-created (unless some global flag is set, if you really want to disable that on a large project like Wikipedia).
The point is, this is a chicken-and-egg problem - if the page is being read from cache, there is no way of knowing whether or not it contains an extension tag, or any other feature. It is logically impossible to determine whether to parse something as part of the process of parsing it.
Hence the sensible solution seems to be to have some flag which is checked by the code which *creates* the copy in the cache, and by-passes it if set (i.e. if the extension handler is producing dynamic content). Then, there wouldn't be a cached copy available, so it would *have* to be re-created.
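Very roughly, and with entirely made-up names just to sketch the shape of it (nothing below exists in the current code):

# hypothetical sketch -- these members and calls are invented for illustration

# in the extension's tag handler:
function renderFortune( $input ) {
    global $wgParser;
    $wgParser->mDynamicOutput = true;   # "don't cache whatever page I'm part of"
    return '<p><i>' . shell_exec( '/sw/bin/fortune' ) . '</i></p>';
}

# ...and in the code that would normally store the rendered page:
if ( empty( $wgParser->mDynamicOutput ) ) {
    $parserCache->save( $title, $renderedText );
}
# otherwise skip the save, so the next request has to re-parse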
At 2/28/2005 07:58 PM, Rowan Collins wrote:
The point is, this is a chicken-and-egg problem - if the page is being read from cache, there is no way of knowing whether or not it contains an extension tag, or any other feature.
Unless I'm missing something, it does seem possible to me, I'm pretty sure it's done for other features: when you edit the page and save it, it is parsed and tables in the database are modified accordingly. For example, the links, categorylinks, imagelinks tables, etc. So at that time, you can detect an extension tag in the page, and fill an extensionlinks table for example, with ids of the pages using extensions. You don't even need to actually refer to any extension, so it would basically be one column if needed. When it's time to serve the page, check the table to see if that page uses an extension, otherwise use the cache, etc. Am I correct ?
-- Sebastien Barre
On Mon, 28 Feb 2005 22:22:25 +0100, Sebastien BARRE sebastien.barre@kitware.com wrote:
At 2/28/2005 07:58 PM, Rowan Collins wrote:
The point is, this is a chicken-and-egg problem - if the page is being read from cache, there is no way of knowing whether or not it contains an extension tag, or any other feature.
Unless I'm missing something, it does seem possible to me, I'm pretty sure it's done for other features: when you edit the page and save it, it is parsed and tables in the database are modified accordingly. For example, the links, categorylinks, imagelinks tables, etc. So at that time, you can detect an extension tag in the page, and fill an extensionlinks table for example, with ids of the pages using extensions. You don't even need to actually refer to any extension, so it would basically be one column if needed. When it's time to serve the page, check the table to see if that page uses an extension, otherwise use the cache, etc. Am I correct ?
Well, yes, that's more-or-less similar to what I was saying - the crucial point is that you have to decide whether it should be read from cache *next* time, not *this* time. I guess I misunderstood what you were saying in the first place, because you made it sound like the process was [detect tag] -> [recreate content], whereas it would actually be more like [detect tag] -> [flag article as dynamic] and then later [detect dynamic flag] -> [ignore cached copy]
After thinking about it for a while, I went further and, rather than having an is_dynamic flag (which would be just one field in the cur / revision / whatever table, you don't need a whole new table for it) and storing a cached copy that you know you're not going to use, I suggested just not caching it in the first place. If there isn't a cached copy there, there's no need to decide whether to recreate the page or not, because you'll have to! In other words
Current / normal:
  [parse] -> [store in cache]
  and on request: [check cache] -> [load from cache]

Proposal for "dynamic pages":
  [parse] {[detect tag] -> [disable cache]}
  and on request: [check cache] -> [can't load from cache, so parse again]
One gotcha for anyone wanting to implement this is that if you have a template with dynamic content in it, you've got to make sure that neither the template itself *nor* pages containing that template get cached. But I think that may turn out to be easier if you're just not creating cached copies too - any activity that could possibly result in a cached copy will have to call your extension, so it will trigger the cache-bypass variable.
At 3/1/2005 06:09 PM, Rowan Collins wrote:
process was [detect tag] -> [recreate content], whereas it would actually be more like [detect tag] -> [flag article as dynamic] and then later [detect dynamic flag] -> [ignore cached copy]
You got it, that's what I meant.
After thinking about it for a while, I went further and, rather than having an is_dynamic flag (which would be just one field in the cur / revision / whatever table, you don't need a whole new table for it)
Yes, definitely, but I did not want to be flamed too early on by suggesting a new column in 'cur' :)
Now more seriously, out of curiosity, we would have been talking about an extra bit or an extra byte per page record, wouldn't we? Are you guys concerned about that with regard to web sites like Wikipedia? My gut feeling is that the list of pages using extensions or dynamic content would probably be *way* smaller than the total number of pages, so maybe a whole new table would not have been a totally stupid idea: it would have saved space and computational resources, and made MediaWiki easier to maintain or upgrade once we had figured out that maybe there was a better or different way to solve this caching issue.
In other words
Current / normal:
  [parse] -> [store in cache]
  and on request: [check cache] -> [load from cache]

Proposal for "dynamic pages":
  [parse] {[detect tag] -> [disable cache]}
  and on request: [check cache] -> [can't load from cache, so parse again]
Sounds good.
One gotcha for anyone wanting to implement this is that if you have a template with dynamic content in it, you've got to make sure that neither the template itself *nor* pages containing that template get cached.
I don't follow you. Aren't templates just like regular pages ? I mean, if a template has dynamic content, it won't get cached. Therefore, when it's time to render a page that uses a template, the template will have to be re-parsed (and not cached again), therefore the page will be up-to-date, as will any other pages using that template.
-- Sebastien Barre
On 28 Feb 2005, at 10:58, Rowan Collins wrote:
The point is, this is a chicken-and-egg problem - if the page is being read from cache, there is no way of knowing whether or not it contains an extension tag, or any other feature. It is logically impossible to determine whether to parse something as part of the process of parsing it.
I suggested a specific namespace, and asked for comments, and got a big yawn.
It seems to me one could subclass Article or something so that the namespace "Dynamic:" would never be cached, for example.
I ask again: comments?
:::: If you succumb to the temptation of using violence in the struggle, unborn generations will be the recipients of a long and desolate night of bitterness, and your chief legacy to the future will be an endless reign of meaningless chaos. - Martin Luther King Jr. :::: Jan Steinman http://www.Bytesmiths.com/Item/99-0590-18
At 3/1/2005 10:44 PM, Jan Steinman wrote:
On 28 Feb 2005, at 10:58, Rowan Collins wrote:
The point is, this is a chicken-and-egg problem - if the page is being read from cache, there is no way of knowing whether or not it contains an extension tag, or any other feature. It is logically impossible to determine whether to parse something as part of the process of parsing it.
I suggested a specific namespace, and asked for comments, and got a big yawn.
It seems to me one could subclass Article or something so that the namespace "Dynamic:" would never be cached, for example.
I ask again: comments?
Well, that does not look like a great idea to me. I need namespaces too; they are pretty useful, and I can't just move everything or part of a project into a new namespace... And if a page is not dynamic anymore (for example, if you stop using the extension on that specific page), you have to move it out of Dynamic:? Tedious. Perhaps if it were a specific reserved *category* rather than a namespace, but even then... I think Rowan had a decent implementation scheme for this problem (see thread).
-- Sebastien Barre
"Jan" == Jan Steinman Jan@Bytesmiths.com writes:
On 28 Feb 2005, at 10:58, Rowan Collins wrote:
The point is, this is a chicken-and-egg problem - if the page is being read from cache, there is no way of knowing whether or not it contains an extension tag, or any other feature. It is logically impossible to determine whether to parse something as part of the process of parsing it.
I suggested a specific namespace, and asked for comments, and got a big yawn.
It seems to me one could subclass Article or something so that the namespace "Dynamic:" would never be cached, for example.
I ask again: comments?
Pages in the Special: namespace? They are sent with the following header bit:
Cache-Control: private, must-revalidate, max-age=0
Unless I misunderstand what you want, that should be perfectly sensible for your purposes.
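(That is roughly what a plain PHP call would produce if you did it by hand:

header( 'Cache-Control: private, must-revalidate, max-age=0' );

so there is nothing magic about it.)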
On 01 Mar 2005 23:53:57 +0100, Anders Wegge Jakobsen wegge@wegge.dk wrote:
Pages in the Special: namespace? They are sent with the following header bit:
Cache-Control: private, must-revalidate, max-age=0
Unless I misunderstand what you want, that should be perfectly sensible for your purposes.
No, no. You must have missed the rest of the thread - there's a lot more to it than that, because there are various levels of *internal* caching in addition to the browser caching.
"Rowan" == Rowan Collins rowan.collins@gmail.com writes:
On 01 Mar 2005 23:53:57 +0100, Anders Wegge Jakobsen wegge@wegge.dk wrote:
Pages in the Special: namespace? They are sent with the following header bit:
Cache-Control: private, must-revalidate, max-age=0
Unless I misunderstand what you want, that should be perfectly sensible for your purposes.
No, no. You must have missed the rest of the thread - there's a lot more to it than that, because there are various levels of *internal* caching in addition to the browser caching.
It was a reply to the proposal of a Dynamic namespace.
On 02 Mar 2005 07:49:12 +0100, Anders Wegge Jakobsen wegge@wegge.dk wrote:
No, no. You must have missed the rest of the thread - there's a lot more to it than that, because there are various levels of *internal* caching in addition to the browser caching.
It was a reply to the proposal of a Dynamic namespace.
OK, sorry if I seemed a bit rude. But the Special namespace *isn't* really a good example, because it doesn't really exist - Special pages have no content of their own stored in the database, but are basically self-contained PHP scripts with access to MediaWiki's functions and to the database. This is very different from having an editable page, which exists in the database, has history, etc, but which contains content which must never be cached.
And, as I say, although Special pages send cache control headers to browsers and other *external* caches, they don't ever interact with the *internal* caches, so have no need for code to bypass them. A page in the database with dynamic content, however defined, would have to have some way of disabling or by-passing the internal caching processes, which does not yet exist.
Meanwhile, if somebody *did* code a cache by-pass mechanism, it would be interesting to consider if (as some people have suggested in the past) some of the Special pages could be made to produce a {{transcludable}} version of their content. But that's just me pipe-dreaming...
On Thursday 17 February 2005 21:19, Brion Vibber wrote:
Thank you for your opinion. It will be dutifully studied, found wanting, and discarded.
But I'm sure experienced programmers will agree with me.
For the people in the audience who are interested in accuracy, note that XML allows attribute values to be quoted either with double quotes or single quotes. Here's the formal lexical definition[1]:
Yes, it allows that, but not both in the same document.
Since strings in PHP source code are themselves usually either single or double-quoted, instances of the same quote character within the string must be escaped with a backslash to appear as a literal character.
You can use heredoc syntax.
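For example (just an illustration), a heredoc string lets you embed double quotes and variables without escaping:

$err = 'something went wrong';
$html = <<<HTML
<p class="error">$err</p>
HTML;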
"NSK" == NSK nsk2@wikinerds.org writes:
On Thursday 17 February 2005 21:19, Brion Vibber wrote:
Thank you for your opinion. It will be dutifully studied, found wanting, and discarded.
But I'm sure experienced programmers will agree with me.
Experienced? Check. Programmer? Check. Agree with? Brion.
For the people in the audience who are interested in accuracy, note that XML allows attribute values to be quoted either with double quotes or single quotes. Here's the formal lexical definition[1]:
Yes, it allows that, but not both in the same document.
I think you should familiarize yourself with [[Backus-Naur_form]] before further study of the snippet below. Note the | character.
AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
Since strings in PHP source code are themselves usually either single or double-quoted, instances of the same quote character within the string must be escaped with a backslash to appear as a literal character.
You can use heredoc syntax.
Personally I can think of several more worthwhile things to do.
Hello NSK,
Thank you for your opinion. It will be dutifully studied, found wanting, and discarded.
But I'm sure experienced programmers will agree with me.
Well, I consider myself a bit experienced, but I don't agree (yet?)
For the people in the audience who are interested in accuracy, note that XML allows attribute values to be quoted either with double quotes or single quotes. Here's the formal lexical definition[1]:
Yes, it allows that, but not both in the same document.
Could you elaborate on that? Where is it stated?
Patrick
On Thursday 17 February 2005 22:17, Patrick Gundlach wrote:
Could you elaborate on that? Where is it stated?
I read that in a programmer's book, but I need to look at my bookshelf if you want the specific title and author.
Hello NSK,
Could you elaborate on that? Where is it stated?
I read that in a programmer's book, but I need to look at my bookshelf if you want the specific title and author.
Oh, well... If it is not stated in the spec, the note in the programmer's book is worthless anyway. There might be XML parsers that have trouble with both ways of coding, but that is not MediaWiki's problem.
Patrick