On 02/13/2012 10:28 PM, Daniel Friesen wrote:
> itemtype="http://www.mediawiki.org/microdata/wikitext/Transclusion" is basically a formal way to extract the parameters of a template without having to do the unreliable work of attempting to parse the WikiText themselves. So it's still a usable improvement.
The main issue I have with this kind of purely structural itemtype is that its pragmatic value is limited while its cost is significantly higher. A relatively lightweight fragment like
    <div itemtype="http://en.wikipedia.org/wiki/Template:Foo" itemscope>
      <span itemprop="firstname">The first name</span>
    </div>
would be blown up to something like
    <div itemtype="http://www.mediawiki.org/microdata/wikitext/Transclusion" itemscope>
      <meta itemprop="source" content="http://en.wikipedia.org/wiki/Template:Foo">
      <span itemprop="Argument" itemtype="http://www.mediawiki.org/microdata/wikitext/Argument" itemscope>
        <meta itemprop="argname" content="firstname">
        <span itemprop="argvalue">The first name</span>
      </span>
    </div>
This would increase the memory used for the DOM, slow down network transfers and processing, and make it unlikely that we could leave this information in regular rendered pages.
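To put a rough number on that overhead, here is a minimal sketch that counts elements and bytes for the two fragments using Python's standard html.parser. The names COMPACT, VERBOSE, ElementCounter and cost are made up for this illustration, the fragments are whitespace-stripped, and the meta values are written with the content attribute. It says nothing about real DOM memory use in a browser; it only compares markup size.

```python
from html.parser import HTMLParser

# The two fragments from this mail, with whitespace between tags removed.
COMPACT = (
    '<div itemtype="http://en.wikipedia.org/wiki/Template:Foo" itemscope>'
    '<span itemprop="firstname">The first name</span>'
    '</div>'
)

VERBOSE = (
    '<div itemtype="http://www.mediawiki.org/microdata/wikitext/Transclusion" itemscope>'
    '<meta itemprop="source" content="http://en.wikipedia.org/wiki/Template:Foo">'
    '<span itemprop="Argument" '
    'itemtype="http://www.mediawiki.org/microdata/wikitext/Argument" itemscope>'
    '<meta itemprop="argname" content="firstname">'
    '<span itemprop="argvalue">The first name</span>'
    '</span>'
    '</div>'
)


class ElementCounter(HTMLParser):
    """Counts start tags, i.e. the number of elements in a fragment."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        self.count += 1


def cost(fragment):
    """Return (number of elements, size in UTF-8 bytes) for a fragment."""
    counter = ElementCounter()
    counter.feed(fragment)
    counter.close()
    return counter.count, len(fragment.encode("utf-8"))


if __name__ == "__main__":
    for name, fragment in (("compact", COMPACT), ("verbose", VERBOSE)):
        elements, size = cost(fragment)
        print(f"{name}: {elements} elements, {size} bytes")
```

For a single argument, the compact form needs 2 elements and the structural form 5. Each further argument adds one element in the compact form and three in the structural one.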
> For search engines and other 3rd parties, I don't believe any of them are going to want to go around to every wiki and start hardcoding into their code things like itemtype="http://mywiki.com/wiki/Template:Event" and itemtype="http://yourwiki.com/wiki/Template:OurEvent" both describing an event they would extract. I don't think we're going to get good metadata for general 3rd parties without actually embedding proper formal microdata into templates themselves.
Unfortunately, they would have to do the same hardcoding with a global Transclusion itemtype, as the only thing that allows an association of vocabulary semantics (the template source URL in the meta element) still contains the URL of the wiki. So the added complexity does not really simplify the extraction of semantically defined data.
To improve this, I am all in favor of adding schema and editor-specific information to templates. The most natural storage location for this extra information would be directly in the documentation section of the template it describes. This makes it easy to find and edit, and ensures that the schema is copied along with the template. Some of this extra information might even be usable to automatically add additional, globally defined (schema.org or similar) itemtypes to the rendered output, which can make the information directly available to search engines without any manual work on their part.
I also don't think that prefix matches on the itemtype instead of a full string match are quite as hard or hacky as you make them out to be. Search engines already routinely do this in their crawlers to support schema extensions: http://schema.org/docs/extension.html.
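As a minimal sketch of what such a prefix match could look like, assuming extensions are expressed as additional path segments appended to the base itemtype URL: the function name matches_itemtype and the /Cite example URL are invented for illustration and are not taken from any crawler.

```python
def matches_itemtype(itemtype: str, base: str) -> bool:
    """Return True if itemtype is base itself or extends it by path segments.

    Matching on base + "/" rather than on the bare prefix avoids false
    positives such as ".../TransclusionFoo" matching ".../Transclusion".
    """
    base = base.rstrip("/")
    itemtype = itemtype.rstrip("/")
    return itemtype == base or itemtype.startswith(base + "/")


if __name__ == "__main__":
    base = "http://www.mediawiki.org/microdata/wikitext/Transclusion"
    print(matches_itemtype(base, base))
    print(matches_itemtype(base + "/Cite", base))   # hypothetical extension
    print(matches_itemtype(base + "Foo", base))     # different type, same prefix
```

The only subtlety compared to a full string match is respecting the segment boundary.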
A global itemtype hierarchy for templates could still be introduced along with a central repository of generally useful and semantically annotated templates. Something like http://mediawiki.org/md/Transclusion/Cite maybe, with the option to subclass as http://mediawiki.org/md/Transclusion/Cite/en.wikipedia.org if a local extension is needed.
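A consumer of such a hierarchy could fall back from a local subclass to the nearest type it knows about. The following is only a sketch under that assumption: resolve_itemtype and the set of known types are hypothetical, and the URLs are the example ones from the paragraph above, not an existing vocabulary.

```python
def resolve_itemtype(itemtype, known, root):
    """Walk up the itemtype URL path until a known type is found.

    Returns the most specific known ancestor (or the type itself), or None
    if nothing at or below root is known.
    """
    root = root.rstrip("/")
    candidate = itemtype.rstrip("/")
    while candidate == root or candidate.startswith(root + "/"):
        if candidate in known:
            return candidate
        parent = candidate.rsplit("/", 1)[0]
        if parent == candidate:  # no further segment to strip
            break
        candidate = parent
    return None


if __name__ == "__main__":
    ROOT = "http://mediawiki.org/md/Transclusion"
    KNOWN = {ROOT + "/Cite"}  # hypothetical: types this consumer understands
    local = ROOT + "/Cite/en.wikipedia.org"
    print(resolve_itemtype(local, KNOWN, ROOT))
```

A consumer that only understands the global Cite type would then still be able to handle the en.wikipedia.org subclass, ignoring any local additions.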
For the editor project, we mainly need an efficient representation of the needed information with minimal changes to the rendered output. Any solution that requires us to add many additional elements will simply not work for us. The exact itemtype URL used, on the other hand, is easily adjusted if a useful global hierarchy emerges.
Gabriel