Re: [Wikitext-l] Mapping WikiText to HTML5 DOM with Microdata

13 Feb 2012


On Mon, 13 Feb 2012 00:13:21 -0800, Gabriel Wicke <wicke@wikidev.net> wrote:
...
On 02/13/2012 03:27 AM, Daniel Friesen wrote:
...
...
Microdata items can be nested, so I don't see a problem with users or
templates providing a mapping to more specific schemas like those of
schema.org. Clashes of user-provided itemtypes with those used for
editing purposes need to be prevented in the parser, but that is  
doable.
Consumers are free to ignore itemtypes they don't know about, which is
what Google etc. are doing, afaik, and what also motivated them to set up
schema.org in the first place.
...
Hmmm... wait, now I'm confused. Are we talking about a Microdata DOM
output that the Parser generates from WikiText, or a completely tailored
one where the template itself is authored in Microdata so that it can
describe how a Visual Editor should edit it?
I considered the case where users manually add a microdata item in a
template or page. The itemtype in that case can be anything, but would
most likely be a standard type.
...
Then I'm saying that I don't like itemtype being abused as the template
name and itemprop being abused as the template argument name. Instead of
the template name and parameter names being abused as the schema of the
template, I'd prefer a more verbose, proper set of Microdata to describe it:
Could you elaborate why you consider one use of itemtype an abuse, while
the other would be fine?
An itemtype is supposed to be a proper type of what the data is: something
expected, well-known, and predefined. If possible, there should be only one
for a given type of thing. And one should be able to query for it already
knowing what that type is, like one would with an xmlns.
itemtype="http://en.wikipedia.org/wiki/Template:Cite" is not something
predefined. It practically appears dynamically out of nowhere with no
forethought. And if someone copies the template, then that exact same set
of data has a completely different itemtype despite being the same thing.
Template:Cite is actually a good example of another point here.
In a normal itemtype you generally stick to one name for something. You
have a citation type, and you have a "firstname" prop, and you can have
multiples of them, i.e.: <span itemprop="firstname">Arnold</span> <span
itemprop="firstname">Harold</span> (though in a really good type you'd
likely have a separate itemtype to group all the info of a name into one
itemprop="name" itemscope ...).
However in a template we get this:
|first=Arnold
|first2=Harold
Resulting in what you'd say would be:
<span itemprop="first">Arnold</span>
<span itemprop="first2">Harold</span>
That's nothing close to a properly defined itemtype that actually allows
3rd parties to extract data in any sane way. Nor is it something a Visual
Editor could make use of without a wildcard hack that examines every
itemtype and decides that any URL pointing back to the wiki is something
it can edit. Anything that actually manages to extract data from that kind
of thing is a hack at its very core.
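To make the extraction problem concrete, here is a small Python sketch (not from the thread) that reads itemprop values using only the standard library's html.parser. It is deliberately simplified and is not a conforming Microdata parser: it ignores itemscope nesting, itemref and non-text values. With the schema-style markup a consumer gets every first name under one known property; with the template-style markup it gets unrelated keys.

```python
from collections import defaultdict
from html.parser import HTMLParser


class ItemPropCollector(HTMLParser):
    """Simplified reader: maps each itemprop name to the text values found.

    Not a conforming Microdata parser: it ignores itemscope nesting,
    itemref and non-text values (href, content, ...), and assumes every
    start tag has a matching end tag (no void elements like <br>).
    """

    def __init__(self):
        super().__init__()
        self.props = defaultdict(list)
        self._open = []  # itemprop name (or None) of each open element

    def handle_starttag(self, tag, attrs):
        self._open.append(dict(attrs).get("itemprop"))

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Credit the text to the innermost open element carrying an itemprop.
        for name in reversed(self._open):
            if name:
                self.props[name].append(text)
                break


def collect(html):
    parser = ItemPropCollector()
    parser.feed(html)
    parser.close()
    return dict(parser.props)


# Schema-style markup: one property name, repeated.
schema_style = ('<span itemprop="firstname">Arnold</span> '
                '<span itemprop="firstname">Harold</span>')
# Template-style markup: one property name per template parameter.
template_style = ('<span itemprop="first">Arnold</span>\n'
                  '<span itemprop="first2">Harold</span>')

print(collect(schema_style))    # {'firstname': ['Arnold', 'Harold']}
print(collect(template_style))  # {'first': ['Arnold'], 'first2': ['Harold']}
```

A consumer of the second output would have to know, template by template, that `first`, `first2`, ... belong together, which is exactly the per-template knowledge a shared itemtype is meant to avoid.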
Whereas when we use
`itemtype="http://www.mediawiki.org/microdata/wikitext/Transclusion"` and
`itemprop="Argument" itemscope
itemtype="http://www.mediawiki.org/microdata/wikitext/Argument"`, we have a
predefined type. We're formally describing a transclusion of a template
into another page, and the arguments used. The format of this is defined
beforehand. We can add in extra data that would have been a hack before,
like the canonical pagename of the template, or perhaps even some metadata
that is stored inside the template itself. For example, say SemanticForms
implemented some embedded editor form code. A template could add extra
metadata saying that the template's content should be edited using a
defined Semantic Form. The Visual Editor would then use that information
to embed a small area that allows Semantic Forms to be used to edit the
template inline, allowing editing of things that could potentially be too
complex for the Visual Editor to understand how to make editable. That's
really just an example off the top of my head, though; there are probably
other things that could use metadata from the template to improve the
Visual Editor's ability to make templates editable as intuitively as
possible.
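The same point can be sketched for the predefined types. The Python below (again standard library only, and again a simplified reader rather than a conforming Microdata parser) finds every item whose itemtype is the `.../wikitext/Transclusion` URL and lists its `Argument` items, without knowing anything about the individual template. The property names `Template`, `Name` and `Value` are hypothetical placeholders: the thread only fixes the two itemtype URLs and `itemprop="Argument"`, not what goes inside an argument.

```python
from html.parser import HTMLParser

BASE = "http://www.mediawiki.org/microdata/wikitext/"


class MicrodataItems(HTMLParser):
    """Tiny Microdata reader: builds nested items from itemscope/itemtype/
    itemprop and takes text content as property values. Simplified: no
    itemref, no href/content values, and no void elements."""

    def __init__(self):
        super().__init__()
        self.top = []    # items not attached to a parent item via itemprop
        self._open = []  # per open element: (itemprop, item created here or None)

    def _current_item(self):
        for _, item in reversed(self._open):
            if item is not None:
                return item
        return None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("itemprop")
        item = None
        if "itemscope" in a:  # boolean attribute: present with value None
            item = {"type": a.get("itemtype"), "props": {}}
            parent = self._current_item()
            if prop and parent is not None:
                parent["props"].setdefault(prop, []).append(item)
            else:
                self.top.append(item)
        self._open.append((prop, item))

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._open:
            return
        prop, item = self._open[-1]
        owner = self._current_item()
        if prop and item is None and owner is not None:
            owner["props"].setdefault(prop, []).append(text)


def template_arguments(html):
    """Return (template, {argument name: value}) for each Transclusion item."""
    reader = MicrodataItems()
    reader.feed(html)
    reader.close()
    result = []
    for item in reader.top:
        if item["type"] != BASE + "Transclusion":
            continue
        args = {}
        for arg in item["props"].get("Argument", []):
            if arg["type"] == BASE + "Argument":
                args[arg["props"]["Name"][0]] = arg["props"]["Value"][0]
        result.append((item["props"]["Template"][0], args))
    return result


# Hypothetical output for {{Cite|first=Arnold|first2=Harold}}.
page = f"""
<div itemscope itemtype="{BASE}Transclusion">
  <span itemprop="Template">Template:Cite</span>
  <span itemprop="Argument" itemscope itemtype="{BASE}Argument">
    <span itemprop="Name">first</span><span itemprop="Value">Arnold</span>
  </span>
  <span itemprop="Argument" itemscope itemtype="{BASE}Argument">
    <span itemprop="Name">first2</span><span itemprop="Value">Harold</span>
  </span>
</div>
"""

print(template_arguments(page))
# [('Template:Cite', {'first': 'Arnold', 'first2': 'Harold'})]
```

The consumer here only hardcodes the one predefined Transclusion type; a copied template on another wiki would still be found, since its name travels as data rather than as the itemtype.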
...
...
I'm not quite sure if we're trying to describe templates in a way that
the VisualEditor can extract the parameters from, edit them inline (if
possible), or describe the output of a template in a way that can be
read by machines for some separate purpose.
We are trying to address all three with the same mechanism. In
particular, we are trying to aid the discovery of semantics associated
with (many) template parameters for the benefit of search engines or
projects like DBPedia and WikiData.
Gabriel
For projects like DBPedia, which already hack around trying to extract
data from the parameters passed to a template, using tricks to associate
some sort of meaning with template parameters without getting that
information from the wiki itself, an
itemtype="http://www.mediawiki.org/microdata/wikitext/Transclusion" is
basically a formal way to extract the parameters of a template without
having to do the unreliable work of attempting to parse the WikiText
themselves. So it's still a usable improvement.
For search engines and other 3rd parties, I don't believe any of them are  
going to want to go around to every wiki and start hardcoding into their  
code things like itemtype="http://mywiki.com/wiki/Template:Event" and  
itemtype="http://yourwiki.com/wiki/Template:OurEvent" both describing an  
event they would extract. I don't think we're going to get good metadata  
for general 3rd parties without actually embedding proper formal microdata  
into templates themselves.
-- 
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
