Re: [Wikitext-l] Mapping WikiText to HTML5 DOM with Microdata

13 Feb 2012

On Mon, 13 Feb 2012 00:13:21 -0800, Gabriel Wicke <wicke(a)wikidev.net>
wrote:

...
> On 02/13/2012 03:27 AM, Daniel Friesen wrote:
>>> Microdata items can be nested, so I don't see a problem with users or
>>> templates providing a mapping to more specific schemas like those of
>>> schema.org. Clashes of user-provided itemtypes with those used for
>>> editing purposes need to be prevented in the parser, but that is
>>> doable.
>>> Consumers are free to ignore itemtypes they don't know about, which is
>>> what Google etc are doing afaik- and what also motivated them to set up
>>> schema.org in the first place.
>> Hmmm... wait now I'm confused, are we talking about a Microdata DOM
>> output that the Parser generates from WikiText. Or a completely tailored
>> one where the template itself is authored in Microdata so that it can
>> describe how a Visual Editor should edit it?
> I considered the case where users manually add a microdata item in a
> template or page. The itemtype in that case can be anything, but would
> most likely be a standard type.
>
>> Then I'm saying that I don't like
>> itemtype being abused to be the template name and itemname being abused
>> to be the template argument name and instead of the template name and
>> parameter names being abused as the schema of the template having a more
>> verbose proper set of Microdata to describe it:
> Could you elaborate why you consider one use of itemtype an abuse, while
> the other would be fine?

An itemtype is supposed to be a proper type of what the data is. Something
expected, well-known, predefined. If possible there should be only one
for some type of thing. And one should be able to query for it already
knowing what that type is, like one would with an xmlns.

itemtype="http://en.wikipedia.org/wiki/Template:Cite" is not something  
pre-defined. It practically appears dynamically out of nowhere with no  
forethought. And if someone copies the template then that exact same set  
of data has a completely different itemtype despite being the same thing.

Another point: Template:Cite is actually a good example here.

In a normal itemtype you generally stick to one name for something. You
have a citation type, and you have a "firstname" prop. And you can have
multiples of them, i.e. Arnold and Harold each marked up with that same
"firstname" prop (though in a really good type you'd likely have a
separate itemtype to group all the info of a name into one
itemprop="name" itemscope ...).
However in a template we get this:
|first=Arnold
|first2=Harold
Resulting in what you'd say would be two differently named props, "first"
and "first2", one per parameter:
Arnold
Harold

That's nothing close to a properly defined itemtype that actually allows  
3rd parties to extract data in any sane way. Nor is it something a Visual  
Editor would make use of without a wildcard hack where it examines every  
itemtype and decides that any url pointing back to the wiki is something  
it can edit. Anything that actually manages to extract data from that kind  
of thing is a hack at its very core.
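
To illustrate the difference, here is a rough Python sketch. The two markup
snippets are hypothetical reconstructions, not output of any real parser: the
element names, the http://example.org/Citation type and the helper names
(PropCollector, collect) are all assumptions made for the example. It only
shows what a generic consumer sees when it groups values by itemprop name.

```python
from html.parser import HTMLParser


class PropCollector(HTMLParser):
    """Group text content by the itemprop of the innermost open element.

    Assumes well-nested markup without void elements (illustration only).
    """

    def __init__(self):
        super().__init__()
        self.props = {}    # itemprop name -> list of text values
        self._stack = []   # one entry per open element: prop name or None

    def handle_starttag(self, tag, attrs):
        self._stack.append(dict(attrs).get("itemprop"))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack and self._stack[-1]:
            self.props.setdefault(self._stack[-1], []).append(text)


def collect(html):
    """Return {itemprop: [values]} for a markup fragment."""
    parser = PropCollector()
    parser.feed(html)
    return parser.props


# Hypothetical predefined type: one prop name, repeated.
PREDEFINED = (
    '<cite itemscope itemtype="http://example.org/Citation">'
    '<span itemprop="firstname">Arnold</span> '
    '<span itemprop="firstname">Harold</span></cite>'
)

# Hypothetical per-template type: parameter names reused as prop names.
PER_TEMPLATE = (
    '<cite itemscope itemtype="http://en.wikipedia.org/wiki/Template:Cite">'
    '<span itemprop="first">Arnold</span> '
    '<span itemprop="first2">Harold</span></cite>'
)

print(collect(PREDEFINED))
print(collect(PER_TEMPLATE))
```

With the predefined type a consumer gets one list of first names under a
single key. With the per-template type it gets "first" and "first2" as
unrelated keys and has to guess that they belong together.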

While when we use  
`itemtype="http://www.mediawiki.org/microdata/wikitext/Transclusion"` and  
`itemprop="Argument" itemscope  
itemtype="http://www.mediawiki.org/microdata/wikitext/Argument"` we have a  
predefined type. We're formally describing a transclusion of a template  
into another page, and the arguments used. The format of this is defined  
beforehand. We can add in extra data that would have been a hack before.  
Like the canonical pagename of the template. Perhaps even some metadata  
that is stored inside the template itself. For example, say SemanticForms
implemented some embedded editor form code. A template could add extra
metadata saying that the template's content should be edited using a
defined Semantic Forms form. The Visual Editor would then use that
information to embed a small area that allows Semantic Forms to be used to
edit the template inline, allowing editing of things that could
potentially be too complex for the Visual Editor to understand how to make
editable. Though
that's really just an example off the top of my head, there are probably  
other things that could use metadata from the template to improve the  
Visual Editor's ability to make templates editable as intuitively as  
possible.
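
As a minimal sketch of what such predefined markup could look like, the
Python below wraps a template's output in the generic Transclusion and
Argument types. Only the two itemtype URLs and the "Argument" prop come from
the description above. The "template", "name" and "value" props, the choice
of div/span/meta elements and the function name render_transclusion are
assumptions for illustration, not a proposed format.

```python
from html import escape

BASE = "http://www.mediawiki.org/microdata/wikitext/"


def render_transclusion(template, args, body_html=""):
    """Wrap expanded template output in generic Transclusion microdata.

    template  -- canonical page name of the template
    args      -- mapping of argument name to raw argument value
    body_html -- the already-expanded HTML output of the template
    """
    parts = [
        '<div itemscope itemtype="%sTransclusion">' % BASE,
        # Hypothetical prop carrying the canonical template page name.
        '<meta itemprop="template" content="%s">' % escape(template),
    ]
    for name, value in args.items():
        # Every argument uses the same predefined Argument type; the
        # parameter name is data, not part of the schema.
        parts.append(
            '<span itemprop="Argument" itemscope itemtype="%sArgument">'
            '<meta itemprop="name" content="%s">'
            '<meta itemprop="value" content="%s">'
            '</span>' % (BASE, escape(name), escape(value))
        )
    parts.append(body_html)
    parts.append("</div>")
    return "\n".join(parts)


print(render_transclusion("Template:Cite",
                          {"first": "Arnold", "first2": "Harold"}))
```

The point of the design is that the itemtype never changes from template to
template or wiki to wiki. Copying Template:Cite to another wiki only changes
the value of the page-name prop.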

...
>> I'm not quite sure if we're trying to describe templates in a way that
>> the VisualEditor can extract the parameters from, edit them inline (if
>> possible), or describe the output of a template in a way that can be
>> read by machines for some separate purpose.
> We are trying to address all three with the same mechanism. In
> particular, we are trying to aid the discovery of semantics associated
> with (many) template parameters for the benefit of search engines or
> projects like DBPedia and WikiData.
>
> Gabriel

For those projects like DBPedia which already hack around trying to
extract data from the parameters passed to a template, using tricks to
associate some sort of meaning with template parameters without getting
that information from the wiki itself, using an
itemtype="http://www.mediawiki.org/microdata/wikitext/Transclusion" is
basically a formal way to extract the parameters of a template without
having to do the unreliable work of attempting to parse the WikiText
themselves. So it's still a usable improvement.
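
A rough sketch of what that extraction could look like on the consumer side,
in Python. The Argument itemtype URL is the one discussed above. The "name"
and "value" props, the use of meta elements, the SAMPLE markup and the
ArgumentExtractor/extract_arguments names are assumptions for the example
only.

```python
from html.parser import HTMLParser

BASE = "http://www.mediawiki.org/microdata/wikitext/"
ARGUMENT = BASE + "Argument"


class ArgumentExtractor(HTMLParser):
    """Collect (name, value) pairs from Argument items in page HTML."""

    def __init__(self):
        super().__init__()
        self.arguments = []    # list of (name, value) tuples
        self._current = None   # props of the Argument item being read

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if attr.get("itemtype") == ARGUMENT:
            # A new Argument item starts; begin collecting its props.
            self._current = {}
        elif tag == "meta" and self._current is not None:
            prop = attr.get("itemprop")
            if prop in ("name", "value"):
                self._current[prop] = attr.get("content", "")
            if len(self._current) == 2:
                self.arguments.append(
                    (self._current["name"], self._current["value"]))
                self._current = None


def extract_arguments(html):
    """Return the template arguments found in html, in document order."""
    parser = ArgumentExtractor()
    parser.feed(html)
    return parser.arguments


# Hypothetical page fragment using the generic types.
SAMPLE = (
    '<div itemscope itemtype="' + BASE + 'Transclusion">'
    '<span itemprop="Argument" itemscope itemtype="' + ARGUMENT + '">'
    '<meta itemprop="name" content="first">'
    '<meta itemprop="value" content="Arnold"></span>'
    '<span itemprop="Argument" itemscope itemtype="' + ARGUMENT + '">'
    '<meta itemprop="name" content="first2">'
    '<meta itemprop="value" content="Harold"></span>'
    'Arnold, Harold</div>'
)

print(extract_arguments(SAMPLE))
```

No WikiText parsing is involved, and the same code works for any template on
any wiki, because it only needs to know the one Argument type in advance.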
For search engines and other 3rd parties, I don't believe any of them are  
going to want to go around to every wiki and start hardcoding into their  
code things like itemtype="http://mywiki.com/wiki/Template:Event" and  
itemtype="http://yourwiki.com/wiki/Template:OurEvent" both describing an  
event they would extract. I don't think we're going to get good metadata  
for general 3rd parties without actually embedding proper formal microdata  
into templates themselves.

-- 
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
