Hi,
I'm trying to write an extension for MediaWiki that lets users influence the keywords that are added as meta tags to the HTML code:
<meta name="keywords" content="Find out which table relates to a SYS LOB Segment" />
I've already figured out that these tags are written in OutputPage.php and are generated by the addKeywords function. I found the 'OutputPageParserOutput' hook, which lets me add more keywords (unfortunately it gives no access to the existing keywords, so I can't remove them). What I'm not sure about is how to get the keywords the user entered into that function. My idea was a special tag that users can use to specify keywords that override the generated ones...
Any ideas or hints on where I should start looking?
Roman
What you want to do is extend the Parser. You should take a look there.
Bryan
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l
Thanks for the hint about extending the Parser. What I don't understand yet is how I can replace the existing parser without touching the files that come with MediaWiki. I'm trying to do this as an extension, without modifying any files...
Roman
Roman wrote:
Thanks for the hint about extending the Parser. What I don't understand yet how I can replace the existing parser without touching the files that come with MediaWiki. I try to get this in as extension without modifying files...
Roman
AFAIK this is not (yet) possible.
Roman,
I had a similar problem when I was trying to figure out how to add <link> tags for RSS and Atom feeds based on the presence of custom tags. Here's the source, in case you're the kind of person who wants to dive right in:
http://jimbojw.com/wiki/index.php?title=WikiArticleFeeds
Read on for my take on what you want to do and how you might go about being successful at it...
What you want to do /is possible/ using existing hooks, but it requires several hook functions working in concert. This is because anything that happens as the result of a tag extension executing (other than the output it returns) is lost once the page has been rendered.
Because Parser output is cached, the Parser's parse() method typically runs only once per edit. In other words, your tag extension will not be re-run on every page view.
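To make the caching point concrete, here is a toy PHP model (not MediaWiki code; viewPage() and the <keywords> tag are hypothetical) in which only the rendered HTML is cached. The tag's side effect exists on the request that parses the page and is gone on the next view, which is served from the cache.

```php
<?php
// Toy model, not MediaWiki code: viewPage() and the <keywords> tag are hypothetical.
// $parserCache plays the role of the parser cache: it outlives a single request
// and stores only the rendered HTML, not any side effects of parsing.

function viewPage($title, $wikitext, array &$parserCache) {
    // Request-local state, rebuilt from scratch on every page view.
    $requestKeywords = array();

    if (!isset($parserCache[$title])) {
        // Cache miss: "parse" the page. The tag callback only runs here.
        if (preg_match('/<keywords>(.*?)<\/keywords>/s', $wikitext, $m)) {
            $requestKeywords[] = $m[1];                      // side effect
            $wikitext = str_replace($m[0], '', $wikitext);   // tag renders as nothing
        }
        $parserCache[$title] = $wikitext;                    // only output is cached
    }

    return array('html' => $parserCache[$title], 'keywords' => $requestKeywords);
}

$parserCache = array();
$first  = viewPage('Main_Page', 'Hello <keywords>LOB</keywords>', $parserCache);
$second = viewPage('Main_Page', 'Hello <keywords>LOB</keywords>', $parserCache);
// $first['keywords']  is array('LOB'): this view ran the parser.
// $second['keywords'] is array(): this view came from the cache.
```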
The meta tags live outside the Parser; they are pulled into the Skin by the OutputPage, as you noted. As it turns out, there is a hook in OutputPage that you can use to affect the meta tags just before they are sent to the browser: 'OutputPageBeforeHTML'.
This hook runs on every page view, except when the page is served from the Squid cache. In your case that probably wouldn't make a difference anyway, since the meta tags stay the same as long as the page content hasn't changed, and a change to the content would purge the Squid cache anyway.
The difficulty lies in passing information from the custom tag (which the Parser runs only once per edit) to the meta tags (which must be set on every page view to take effect). This can be achieved by hiding data from the Parser: have your tag extension output an HTML comment with the data tucked away inside.
Here's an outline of what you could do:
1) Hook into the Parser to register your extension tag.
2) Have your tag interpret the internals of the tag, or attributes to the tag, as a meta keyword (or list of keywords) to be injected into the page header later.
3) Encode the incoming data as base64 to make it look like regular text. Then wrap this in an HTML comment with flags to indicate the beginning and end. Something like this: $encodedComment = '<!-- @METAKEYWORDS@'.base64_encode($input).'@METAKEYWORDS@ -->';
4) Output the wrapped, encoded comment tag. The Parser will faithfully push this into the page contents.
5) Hook 'OutputPageBeforeHTML', and in the implementing function, use a regular expression to search for your encoded comment tags in the provided $text param.
6a) If you find any matches, and want to delete all existing keywords (meaning you only want those that the user specifies), then you can call something like this:
$out->mKeywords = array();
Note: I'm not completely sure this will work, since I don't know with certainty where mKeywords gets populated in the first place. If that happens after the call to addParserOutput(), you may be out of luck.
6b) For each encoded comment found, add it to the OutputPage's keywords like this:
$out->addKeyword(base64_decode($encodedText));
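The outline above might look roughly like the following PHP sketch. It assumes MediaWiki 1.9-era extension conventions ($wgExtensionFunctions, $wgParser->setHook(), and an OutputPageBeforeHTML handler receiving the OutputPage and the HTML text by reference). The <metakeywords> tag, the wf* function names and FakeOutputPage are hypothetical, and step 6a carries the same caveat about mKeywords as noted above.

```php
<?php
// Sketch only. Assumes MediaWiki 1.9-era registration; the <metakeywords> tag,
// the wf* names and FakeOutputPage are hypothetical.

$wgExtensionFunctions[] = 'wfMetaKeywordsSetup';
$wgHooks['OutputPageBeforeHTML'][] = 'wfMetaKeywordsBeforeHTML';

// Step 1: register the tag. Only meaningful inside MediaWiki.
function wfMetaKeywordsSetup() {
    global $wgParser;
    $wgParser->setHook('metakeywords', 'wfMetaKeywordsRender');
}

// Steps 2-4: runs once per parse. The keyword text is base64-encoded and
// hidden in a flagged HTML comment that the Parser passes through.
function wfMetaKeywordsRender($input, $args, $parser) {
    return '<!-- @METAKEYWORDS@' . base64_encode(trim($input)) . '@METAKEYWORDS@ -->';
}

// Steps 5-6: runs on every page view, before the HTML is sent.
function wfMetaKeywordsBeforeHTML(&$out, &$text) {
    $pattern = '/<!-- @METAKEYWORDS@([A-Za-z0-9+\/=]*)@METAKEYWORDS@ -->/';
    if (preg_match_all($pattern, $text, $matches)) {
        // 6a: discard the generated keywords.
        $out->mKeywords = array();
        // 6b: decode and add each user-supplied keyword.
        foreach ($matches[1] as $encodedText) {
            $out->addKeyword(base64_decode($encodedText));
        }
    }
    return true; // let other OutputPageBeforeHTML handlers run
}

// Minimal stand-in for OutputPage so the hook can run outside MediaWiki.
class FakeOutputPage {
    public $mKeywords = array('generated');
    function addKeyword($text) {
        $this->mKeywords[] = $text;
    }
}

$out  = new FakeOutputPage();
$text = '<p>Body</p>' . wfMetaKeywordsRender(' Oracle LOB ', array(), null);
wfMetaKeywordsBeforeHTML($out, $text);
// $out->mKeywords is now array('Oracle LOB')
```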
That's about it. Good luck!
-- Jim
Actually, in retrospect there's a simpler solution:
1) Have users enter keyword data in <span> tags with class="keyword" like this:
<span class="keyword">Cool Stuff</span>
2) Hook OutputPageBeforeHTML (as before) to use a regular expression to parse out these strings and push the contents into $out via addKeyword().
3) Add a CSS rule to hide spans of class keyword (to MediaWiki:Common.css):
span.keyword { display: none; }
Much simpler, and doesn't require a parser hook or any base64 encoding.
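A minimal sketch of step 2 for the span variant, assuming the spans reach the OutputPageBeforeHTML hook with exactly the markup class="keyword" (attribute order and quoting are not guaranteed in general). extractSpanKeywords() is a hypothetical helper; each string it returns would be passed to $out->addKeyword(). A non-greedy regex like this one will also misbehave on nested spans.

```php
<?php
// Hypothetical helper for the <span class="keyword"> variant. Inside an
// OutputPageBeforeHTML handler each result would go to $out->addKeyword().
// Assumes the markup arrives exactly as <span class="keyword">...</span>.
function extractSpanKeywords($html) {
    $keywords = array();
    if (preg_match_all('/<span class="keyword">(.*?)<\/span>/s', $html, $m)) {
        foreach ($m[1] as $raw) {
            // The rendered HTML is entity-escaped; turn it back into plain text.
            $keywords[] = html_entity_decode(trim($raw), ENT_QUOTES, 'UTF-8');
        }
    }
    return $keywords;
}

$html = '<p>Intro <span class="keyword">Cool Stuff</span> and '
      . '<span class="keyword">Tips &amp; Tricks</span></p>';
$found = extractSpanKeywords($html);
// $found is array('Cool Stuff', 'Tips & Tricks')
```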
On 2/27/07, Jim Wilson wilson.jim.r@gmail.com wrote:
Add a CSS rule to hide spans of class keyword (to MediaWiki:Common.css):
span.keyword { display: none; }
Using display: none to enforce semantics is bad. It will then show up for many cell phone users, old screen readers, Lynx, etc. The comment was probably a better idea, although unfortunately still rather hacky.
Using display: none to enforce semantics is bad. It will then show up for many cell phone users, old screen readers, Lynx, etc.
I concur. Even modern feed aggregators will ignore CSS (even inline style="" directives).
The comment was probably a better idea, although unfortunately still rather hacky.
Yeah it's hacky - I can't argue with that - but it's the only way I know of to deal with all three of the following:
1) HTML comments found in wikitext are stripped (we get around this by using an extension tag).
2) Whitespace in extension output is converted to <br> and <p> tags (we get around this by putting the data in an HTML comment).
3) Malicious users could prematurely end the comment by putting "-->" in the keyword text, followed by <script> or any other HTML markup (we avoid this by base64-encoding all input and only decoding it during the meta parsing step).
If there's a better way to achieve this, I'm open to suggestions. I've been using this technique on extensions I've been developing since I haven't yet found a better way. But seriously, if there /is/ a cleaner way to do this I'd love to know about it :)
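Point 3 can be checked with a few lines of standalone PHP. The two wrapper functions are hypothetical; the demonstration relies only on the fact that the base64 alphabet (A-Z, a-z, 0-9, '+', '/', '=') contains neither '-' nor '>', so encoded data can never contain '-->'.

```php
<?php
// Illustrates point 3. Both wrapper functions are hypothetical.
function naiveWrap($input) {
    return '<!-- @METAKEYWORDS@' . $input . '@METAKEYWORDS@ -->';
}
function encodedWrap($input) {
    return '<!-- @METAKEYWORDS@' . base64_encode($input) . '@METAKEYWORDS@ -->';
}

$malicious = 'foo --> <script>alert(1)</script>';

// Naive: the browser ends the comment at the first "-->", so the
// <script> element that follows it would be live markup.
$naive = naiveWrap($malicious);
$scriptEscapes = strpos($naive, '-->') < strpos($naive, '<script>');   // true

// Encoded: the only "-->" is the one the extension wrote itself, and the
// original text is recovered intact when decoded.
$safe = encodedWrap($malicious);
preg_match('/@METAKEYWORDS@([A-Za-z0-9+\/=]*)@METAKEYWORDS@/', $safe, $m);
$decoded = base64_decode($m[1]);
```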
On 2/27/07, Jim Wilson wilson.jim.r@gmail.com wrote:
If there's a better way to achieve this, I'm open to suggestions. I've been using this technique on extensions I've been developing since I haven't yet found a better way. But seriously, if there /is/ a cleaner way to do this I'd love to know about it :)
Probably the "cleanest" way to do it would be to create a new database table for it, akin to categorylinks or other metadata tables, and query that on page render. Whether in practice that's better than the hacky solution, I don't know.
Probably the "cleanest" way to do it would be to create a new database table for it, akin to categorylinks or other metadata tables, and query that on page render. Whether in practice that's better than the hacky solution, I don't know.
Would it be unethical to use the objectcache table for this purpose? In other words, push in a new entry at page-render time, then look it up at page-display time. (If no entry is there at page-display time, the script assumes there's no metadata to use.)
Just a thought.
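A sketch of the objectcache idea as a write-at-parse, read-at-view protocol. ArrayCache is a hypothetical in-memory stand-in for MediaWiki's object cache, and keying entries on page ID plus revision ID is an assumption added here, not something proposed in the thread. One trade-off to weigh: if an entry expires or is evicted while the parsed page is still cached, the keywords would silently disappear until the page is parsed again.

```php
<?php
// Sketch of the objectcache idea. ArrayCache is a hypothetical in-memory
// stand-in for the object cache; the key scheme (page ID + revision ID)
// is an assumption, not part of the proposal.
class ArrayCache {
    private $data = array();
    function set($key, $value) { $this->data[$key] = $value; }
    function get($key) { return isset($this->data[$key]) ? $this->data[$key] : null; }
}

function metaKeywordsKey($pageId, $revId) {
    return "metakeywords:$pageId:$revId";
}

// Page-render (parse) time: called from the tag callback.
function storeKeywords(ArrayCache $cache, $pageId, $revId, array $keywords) {
    $cache->set(metaKeywordsKey($pageId, $revId), $keywords);
}

// Page-display time: no entry means "no user-supplied keywords".
function loadKeywords(ArrayCache $cache, $pageId, $revId) {
    $keywords = $cache->get(metaKeywordsKey($pageId, $revId));
    return $keywords === null ? array() : $keywords;
}

$cache = new ArrayCache();
storeKeywords($cache, 42, 1001, array('Oracle', 'LOB'));
$current = loadKeywords($cache, 42, 1001);  // array('Oracle', 'LOB')
$other   = loadKeywords($cache, 42, 1002);  // array(): nothing stored for that revision
```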
On 27/02/07, Simetrical Simetrical+wikilist@gmail.com wrote:
Probably the "cleanest" way to do it would be to create a new database table for it, akin to categorylinks or other metadata tables, and query that on page render. Whether in practice that's better than the hacky solution, I don't know.
I would have said the cleanest way would be to alter some property of the ParserOutput object. An excellent idea would be to introduce an array of arbitrary properties to this, which a parser hook extension can set or alter, and then allow access to it using an appropriate hook which is called when all the information is pulled out of the ParserOutput object in order to build up the page again.
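A sketch of what such a property array might look like. MiniParserOutput and its setProperty()/getProperty() methods are hypothetical stand-ins, not the ParserOutput API of the time. The serialize()/unserialize() round trip assumes the parser cache stores the whole object, in which case the property is cached together with the HTML and no comment-hiding trick is needed.

```php
<?php
// Hypothetical stand-in for ParserOutput with an array of arbitrary
// properties; not the real class or its API.
class MiniParserOutput {
    public $mText;
    private $mProperties = array();

    function __construct($text) { $this->mText = $text; }
    function setProperty($name, $value) { $this->mProperties[$name] = $value; }
    function getProperty($name) {
        return isset($this->mProperties[$name]) ? $this->mProperties[$name] : null;
    }
}

// Parse time: a tag callback records the keywords on the output object
// instead of smuggling them through the HTML.
$po = new MiniParserOutput('<p>Body</p>');
$po->setProperty('metakeywords', array('Oracle', 'LOB'));

// Assumed: the parser cache stores the serialized object, so the
// property travels with the HTML.
$cached = unserialize(serialize($po));

// Page-build time: a hook would call $out->addKeyword() for each entry.
$keywords = $cached->getProperty('metakeywords');   // array('Oracle', 'LOB')
```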
I think I've proposed this before.
Rob Church
wikitech-l@lists.wikimedia.org