Hi, folks,
I am trying to make a wiki template to render some metadata of an article, and I want to use HTML5 Microdata to make this metadata machine-readable.
The problem I found is that the XHTML Sanitizer in MediaWiki removes all the attributes needed.
The Microdata spec[1] adds the following global attributes for almost every HTML tag:
* itemscope
* itemtype
* itemid
* itemref
* itemprop
Sometimes a "meta" tag is used to express a name-value pair that is not visible to human readers.
So, in general, if MediaWiki is to support the Microdata format, we should relax the Sanitizer for the global attributes and the meta tag.
For the global attributes, I think there is no harm, and it is quite easy to fix.
But for the meta tag, I am not very sure; I don't know whether search engines still use it or not. If search engines do not care about it, why not relax the constraints on it?
I know Microformats are compatible with the Sanitizer, but they use the "class" attribute very heavily, and the class attribute may also be used for other purposes such as rendering. After comparing the two, I think Microdata is neater.
The above is my personal proposal. Thanks for your consideration.
Regards, Mingli
On 10 mrt 2011, Mingli Yuan wrote:
...
Hi Mingli,
When running in HTML5 mode (support added in 1.16.0, enabled by default), any attribute starting with "data-" is allowed and will not be stripped.
If $wgHtml5 is set to false [1], then they will be stripped.
No new attributes without the data- prefix have been added to the supported set, though, so things like <el itemprop=""> will not work and will be stripped by the sanitizer.
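The rule described above can be sketched roughly as follows. This is only an illustrative model in Python, not MediaWiki's actual Sanitizer code: the function name filter_attributes, the html5 parameter (standing in for $wgHtml5), and the short BASE_WHITELIST are all assumptions made for the example.

```python
# Hypothetical, shortened whitelist; the real Sanitizer's list is longer
# and depends on the element.
BASE_WHITELIST = {"id", "class", "title", "lang", "dir", "style"}


def filter_attributes(attrs, html5=True):
    """Keep whitelisted attributes, plus data-* ones when html5 is True."""
    kept = {}
    for name, value in attrs.items():
        lowered = name.lower()
        if lowered in BASE_WHITELIST:
            kept[lowered] = value
        elif html5 and lowered.startswith("data-") and len(lowered) > len("data-"):
            # Custom data attributes pass through only in HTML5 mode.
            kept[lowered] = value
        # Anything else, such as itemprop, is silently dropped.
    return kept
```

Under these assumptions, an element with class, data-foo and itemprop attributes keeps the first two in HTML5 mode and only class otherwise.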
I don't know how final the HTML5 microdata spec is, though (it seems to be marked as a draft). Perhaps we should wait to add new attributes until it's ready?
A few templates and scripts have already started using the data-* attributes in MediaWiki.
-- Krinkle
Hi, Krinkle,
Thanks for your reply.
I agree with you that Microdata is not mature enough so far, and we should wait for it.
I will try custom data-* attributes. Although this way I will lose generality and other tooling support, it still works for my project.
Thanks for your help.
Regards, Mingli
On Fri, Mar 11, 2011 at 1:16 AM, Krinkle krinklemail@gmail.com wrote:
...
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Thu, Mar 10, 2011 at 6:04 AM, Mingli Yuan mingli.yuan@gmail.com wrote:
I am trying to make a wiki template to render some metadata of an article, and I want to use HTML5 Microdata to make this metadata machine-readable.
The problem I found is that the XHTML Sanitizer in MediaWiki removes all the attributes needed.
Set $wgAllowMicrodataAttributes = true; in LocalSettings.php. This is false by default (along with $wgAllowRdfaAttributes) because there's a political battle between microdata and RDFa and it's much too early to see who will win, and we wound up deciding not to take a side. Personally I think microdata is superior and we should support it and not RDFa, but others disagreed. (FWIW, I think our RDFa support doesn't actually work correctly anyway, but I can't remember the details and could be wrong.)
Glancing back over the microdata code, I should point out an important caveat: we don't allow any itemtypes other than the three originally defined in the microdata specification. This seems kind of stupid, though, so on reflection, I removed it in r83689. You can copy the (quite trivial) patch to your own wiki if you want:
http://www.mediawiki.org/wiki/Special:Code/MediaWiki/83689
The Microdata spec[1] adds the following global attributes for almost every HTML tag:
- itemscope
- itemtype
- itemid
- itemref
- itemprop
Every, not almost every -- they're global attributes.
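To illustrate what "machine-readable" means for these attributes, here is a minimal Python sketch that pulls itemprop name-value pairs out of a fragment using the standard library's html.parser. It is not a conforming microdata parser: it assumes flat, non-nested markup, and the itemtype URL and property names in SAMPLE are hypothetical.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Collects (itemprop, value) pairs from flat, non-nested markup."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open_prop = None  # itemprop whose text content is being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("itemprop")
        if prop is None:
            return
        if tag == "meta":
            # <meta> carries its value in the content attribute.
            self.pairs.append((prop, attributes.get("content", "")))
        else:
            self._open_prop = prop
            self._text = []

    def handle_data(self, data):
        if self._open_prop is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Flat-markup assumption: the next end tag closes the open property.
        if self._open_prop is not None:
            self.pairs.append((self._open_prop, "".join(self._text).strip()))
            self._open_prop = None


# Hypothetical vocabulary, for illustration only.
SAMPLE = (
    '<div itemscope itemtype="http://example.org/Article">'
    '<span itemprop="title">Main Page</span>'
    '<meta itemprop="dateModified" content="2011-03-10">'
    '</div>'
)


def extract_itemprops(markup):
    collector = ItempropCollector()
    collector.feed(markup)
    collector.close()
    return collector.pairs
```

Calling extract_itemprops(SAMPLE) yields the title taken from the span's text and the date taken from the meta tag's content attribute.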
But for the meta tag, I am not very sure; I don't know whether search engines still use it or not. If search engines do not care about it, why not relax the constraints on it?
We could allow meta tags and just not whitelist the properties like name or http-equiv. That would probably be safe. But you should only use <meta> with microdata if you can't avoid it, anyway.
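A rough sketch of that idea in Python, as an illustration only and not existing MediaWiki code: the meta tag is let through, but name and http-equiv are simply absent from its whitelist. The function name sanitize_meta and the choice to drop a meta tag that has no itemprop at all are assumptions of this example.

```python
MICRODATA_ATTRIBUTES = {"itemscope", "itemtype", "itemid", "itemref", "itemprop"}

# name and http-equiv are deliberately not listed, so they never survive.
META_ALLOWED = MICRODATA_ATTRIBUTES | {"content"}


def sanitize_meta(attrs):
    """Return the attributes to keep on a <meta> tag, or None to drop the tag."""
    kept = {
        name.lower(): value
        for name, value in attrs.items()
        if name.lower() in META_ALLOWED
    }
    # Assumption: a <meta> without itemprop has no microdata use, so drop it.
    if "itemprop" not in kept:
        return None
    return kept
```

With this filter, a meta tag carrying itemprop, content and http-equiv keeps only the first two, while a plain name/content meta tag is removed entirely.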
On Thu, Mar 10, 2011 at 12:16 PM, Krinkle krinklemail@gmail.com wrote:
I don't know how final the HTML5 microdata spec is, though (it seems to be marked as a draft). Perhaps we should wait to add new attributes until it's ready?
All of HTML5 is only a draft, by W3C standards. The WHATWG version is billed as a "Living Standard" like all the rest: http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#microdata. Microdata specifically is stable -- it hasn't changed for quite a few months, and the editor has assured me that it's very unlikely to change incompatibly at this point. The question is really whether we want to take sides in the microdata vs. RDFa battle. (As I said, I think we should, but I'm probably biased.)
Hi, Aryeh,
Sorry for the late reply.
I don't think I can enable the $wgAllowMicrodataAttributes setting in LocalSettings.php, because my project is hosted on Chinese Wikipedia; I think I have to wait for a decision from the Foundation. For now, I will use custom data-* attributes as a temporary solution.
Thanks for your help.
Regards, Mingli
On Fri, Mar 11, 2011 at 10:11 AM, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
...
On Fri, Mar 11, 2011 at 5:20 AM, Mingli Yuan mingli.yuan@gmail.com wrote:
I don't think I can enable the $wgAllowMicrodataAttributes setting in LocalSettings.php, because my project is hosted on Chinese Wikipedia; I think I have to wait for a decision from the Foundation. For now, I will use custom data-* attributes as a temporary solution.
$wgHtml5 is currently disabled on Wikimedia wikis as well, due to screen-scraping bots that break because of the doctype. I'll spend a few minutes now seeing if I can make some changes to make $wgHtml5 mode deployable again. (Third time's the charm?)
On 11 March 2011 19:55, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
$wgHtml5 is currently disabled on Wikimedia wikis as well, due to screen-scraping bots that break because of the doctype.
Weren't *all* bots told a while ago to use the API or risk random arbitrary breakage? (Or am I thinking of something else?)
- d.
On Fri, Mar 11, 2011 at 3:40 PM, David Gerard dgerard@gmail.com wrote:
Weren't *all* bots told a while ago to use the API or risk random arbitrary breakage? (Or am I thinking of something else?)
All bots have always been told that, but we're not serious enough about enforcing it to actually make all bots grind to a halt. I've made some tweaks that should allow another HTML5 deployment attempt, though, probably not breaking too many bots. (Non-browser-based bots should just be using an HTML5 parser instead of an XML parser to avoid this pain, but browser scripts don't necessarily have that option.)
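As a small illustration of the parser point, the Python sketch below feeds the same page to an XML parser and to the standard library's HTML parser. PAGE is a made-up example (HTML5 doctype, a void <br> element, a named entity), not actual MediaWiki output, and it shows only one way an XML parser can fail on HTML input; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical page: HTML5 doctype, a void <br> element, and a named entity.
PAGE = "<!DOCTYPE html><html><body><p>Main&nbsp;Page<br>Welcome</p></body></html>"


class TextCollector(HTMLParser):
    """Collects text nodes; tolerant of HTML that is not well-formed XML."""

    def __init__(self):
        super().__init__()  # convert_charrefs defaults to True
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def xml_parse_ok(markup):
    """Return True if the markup parses as XML, False on a parse error."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def html_text(markup):
    """Return the concatenated text content using the HTML parser."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)
```

Here xml_parse_ok(PAGE) is False, while html_text(PAGE) still returns the page text with the entity decoded.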
wikitech-l@lists.wikimedia.org