On Tue, 14 Feb 2012 03:21:56 -0800, Gabriel Wicke <wicke(a)wikidev.net>
wrote:
On 02/13/2012 10:28 PM, Daniel Friesen wrote:
is basically a formal way to extract the parameters of a template without
having to do the unreliable work of attempting to parse the WikiText
themselves. So it's still a usable improvement.
The main issue I have with this style of a purely structural itemtype is
the limited pragmatic value compared to its significantly increased
cost. A relatively light-weight fragment like
<div itemtype="http://en.wikipedia.org/wiki/Template:Foo" itemscope>
<span itemprop="firstname">The first name</span>
</div>
would be blown up to something like
<div
itemtype="http://www.mediawiki.org/microdata/wikitext/Transclusion&quo…
itemscope>
<meta itemprop="source" content="http://en.wikipedia.org/wiki/Template:Foo" />
<span itemprop="Argument"
itemtype="http://www.mediawiki.org/microdata/wikitext/Argument"
itemscope>
<meta itemprop="argname" content="firstname">
<span itemprop="argvalue">The first name</span>
</span>
</div>
This would increase the memory used for the DOM, slow down network
transfers and processing and make it unlikely that we could leave this
information in regular rendered pages.
I don't think we can include this stuff in general page data anyways.
Adding any level of additional implicit markup to something as absolutely
basic as {{{1|}}} could completely destroy things. CSS targeting changes,
JS targeting changes, and if the template author happens to have gone to
the effort of nicely adding Microdata of their own, we destroy it.
# Template:Movie
<div itemscope itemtype="http://schema.org/Movie">
'''Title:''' <span itemprop="name">{{{title}}}</span>
</div>
# Page content
{{Movie|title=Avatar}}
# Result
<div itemscope itemtype="http://en.wikipedia.org/wiki/Template:Movie">
<div itemscope itemtype="http://schema.org/Movie">
'''Title:''' <span itemprop="name"><span itemprop="title">Avatar</span></span>
</div>
</div>
The result is absolute nonsense. And the only real action that can be
taken to retain the ability for the Visual Editor to keep the template's
editability is to hide the schema.org metadata in another layer of
metadata describing it, resulting in the metadata the author wrote becoming
useless.
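To make the problem with the Result markup concrete, here is a minimal
sketch of a microdata walk in Python. It is a deliberate simplification of
the real extraction algorithm (text-only values, void elements skipped, no
itemref), and the class name MicrodataSketch is made up for illustration.
It only relies on the rule that an itemprop belongs to the nearest
enclosing itemscope.

```python
from html.parser import HTMLParser

VOID = {"meta", "link", "br", "img", "hr", "input"}  # skipped in this sketch


class MicrodataSketch(HTMLParser):
    """Simplified microdata walk: a property belongs to the nearest itemscope."""

    def __init__(self):
        super().__init__()
        self.items = []  # every itemscope found, in document order
        self.open = []   # one frame per currently open element

    def _scope(self):
        # Nearest enclosing item, or None when outside any itemscope.
        for frame in reversed(self.open):
            if frame["item"] is not None:
                return frame["item"]
        return None

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        a = dict(attrs)
        owner = self._scope()  # looked up before this element opens a scope
        item = None
        if "itemscope" in a:
            item = {"type": a.get("itemtype"), "props": {}}
            self.items.append(item)
        prop = a.get("itemprop") if owner is not None else None
        self.open.append({"item": item, "prop": prop, "owner": owner, "text": []})

    def handle_data(self, data):
        # Text counts towards every open property element (textContent).
        for frame in self.open:
            if frame["prop"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.open:
            return
        frame = self.open.pop()
        if frame["prop"]:
            value = frame["item"] or "".join(frame["text"]).strip()
            frame["owner"]["props"][frame["prop"]] = value


RESULT = """
<div itemscope itemtype="http://en.wikipedia.org/wiki/Template:Movie">
<div itemscope itemtype="http://schema.org/Movie">
'''Title:''' <span itemprop="name"><span itemprop="title">Avatar</span></span>
</div>
</div>
"""

parser = MicrodataSketch()
parser.feed(RESULT)
parser.close()
for item in parser.items:
    print(item["type"], item["props"])
```

Under these assumptions the Template:Movie item ends up with no properties
at all, while the schema.org/Movie item picks up a "title" property that
the template author never wrote.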
Hence, given that 3rd parties that are aware of templates and explicitly
want to extract data from their parameters can use an alternate method of
querying for the mixed DOM, that generic 3rd parties are unlikely to want
to hardcode anything to do with the unstable and nonsensical meanings of
transclusion parameters, and that we can easily destroy good valid metadata
and user styles, I don't think including this extra DOM in general page
views is a good idea anyways.
For search engines and other 3rd parties, I don't believe any of them are
going to want to go around to every wiki and start hardcoding into their
code things like
itemtype="http://mywiki.com/wiki/Template:Event"
and
itemtype="http://yourwiki.com/wiki/Template:OurEvent" both
describing an event they would extract. I don't think we're going to get
good metadata for general 3rd parties without actually embedding proper
formal microdata into templates themselves.
Unfortunately, they would have to do the same hardcoding with a global
Transclusion itemtype, as the only thing that allows an association of
vocabulary semantics (the template source URL in the meta element) still
contains the URL of the wiki. So the added complexity does not really
simplify the extraction of semantically defined data.
They have to do the hardcoding either way. I'm saying that generic 3rd
parties aren't going to do any hardcoding of domain-specific schemas at
all, whatever the syntax we use, and hence generic 3rd parties are a
completely moot point for discussing whether we use template-url as
itemtype or a formally defined itemtype.
And the goal of metadata formats like Microdata is not simple extraction,
it's having formally defined metadata which can be extracted reliably with
an intuitive and consistent format. That's not what itemprop="last2" is.
If we just wanted simply extracted data, we wouldn't be using Microdata at
all, we'd just shove everything into something simple like:
<div data-wt-transclusion="/wiki/Template:Movie">
'''Title:''' <span data-wt-param="title">Avatar</span>
</div>
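As a rough sketch of how little a template-aware consumer would need for
that simple form, the following Python reads the data-wt-* attributes from
the fragment above. The attribute names come from the fragment; the class
and function names (WtParamExtractor, extract) are hypothetical, and the
sketch assumes a single transclusion per fragment.

```python
from html.parser import HTMLParser

VOID = {"meta", "link", "br", "img", "hr", "input"}  # elements with no end tag


class WtParamExtractor(HTMLParser):
    """Collect data-wt-param text for one data-wt-transclusion fragment."""

    def __init__(self):
        super().__init__()
        self.template = None  # value of data-wt-transclusion, if seen
        self.params = {}      # param name -> text content
        self._open = []       # param name (or None) for each open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        a = dict(attrs)
        if "data-wt-transclusion" in a:
            self.template = a["data-wt-transclusion"]
        name = a.get("data-wt-param")
        if name:
            self.params[name] = ""
        self._open.append(name)

    def handle_data(self, data):
        # Append text to every parameter element that is currently open.
        for name in self._open:
            if name:
                self.params[name] += data

    def handle_endtag(self, tag):
        if tag not in VOID and self._open:
            self._open.pop()


def extract(html):
    parser = WtParamExtractor()
    parser.feed(html)
    parser.close()
    return parser.template, parser.params


FRAGMENT = """
<div data-wt-transclusion="/wiki/Template:Movie">
'''Title:''' <span data-wt-param="title">Avatar</span>
</div>
"""

print(extract(FRAGMENT))
```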
To improve this, I am all in favor of adding schema and editor-specific
information to templates. The most natural storage location for this
extra information would be directly in the documentation section of the
template it describes. This makes it easy to find and edit, and ensures
that the schema is copied along with the template. Some of this extra
information might even be usable to automatically add additional,
globally defined (schema.org or similar) itemtypes to the rendered
output, which can make the information directly available to search
engines without any manual work on their part.
I also don't think that prefix matches on the itemtype instead of a full
string match are quite as hard or hacky as you make it out to be. Search
engines already routinely perform this in their crawlers to support
schema extensions:
http://schema.org/docs/extension.html.
Those are completely different levels of wildcarding.
With schema.org they're simply saying that every
http://schema.org/Person/Subtype matched by http://schema.org/Person/* is
treated as a http://schema.org/Person type. And itemprop="email/work" is
treated the same as itemprop="email". There's still a perfectly good formal
schema there.
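A minimal sketch of that kind of fallback, assuming the consumer keeps a
set of types it already understands. The KNOWN set and the function names
base_type and base_prop are illustrative only, not anything a real crawler
is known to use.

```python
from typing import Optional, Set

# Types this hypothetical consumer has a formal schema for.
KNOWN = {"http://schema.org/Person"}


def base_type(itemtype: str, known: Set[str]) -> Optional[str]:
    """Strip trailing path segments until a known type is found."""
    candidate = itemtype
    while candidate:
        if candidate in known:
            return candidate
        if "/" not in candidate:
            return None
        candidate = candidate.rsplit("/", 1)[0]
    return None


def base_prop(itemprop: str) -> str:
    """Treat an extended property such as 'email/work' as its base property."""
    return itemprop.split("/", 1)[0]


print(base_type("http://schema.org/Person/Subtype", KNOWN))
print(base_prop("email/work"))
print(base_type("http://en.wikipedia.org/wiki/Template:Foo", KNOWN))
```

The extended type still resolves to a type with a formal definition. A
template URL resolves to nothing, because there is no schema behind the
prefix to fall back on.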
What we're saying with itemtype="{templateurl}" is that every
itemtype="http://en.wikipedia.org/wiki/*" is an itemtype="" of, well, we
don't even have a formal definition of what it is. We're just saying that
if it matches that wildcard it's a template transclusion. And there's
nothing to define what that is. We're also saying that every itemprop="*"
inside of it is a template parameter, with absolutely no formal definition
saying what kind of data goes there, how it should be treated, etc. And
we're also saying that you'll get things like itemprop="first",
itemprop="last", itemprop="first2", itemprop="last2". And you're supposed
to take "first" and "last" and combine them conceptually as one "name",
and likewise you also have to explicitly take "first2" and "last2" and
combine these conceptually, but they aren't of type "name2", they are also
of type "name".
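To illustrate the ad hoc work this pushes onto every consumer, here is a
sketch of the grouping a 3rd party would have to hardcode for this one
naming convention. The function name group_names and the sample values are
made up; only the first/last/first2/last2 parameter names come from the
discussion above.

```python
import re


def group_names(props):
    """Pair up first/last, first2/last2, ... into one record per person."""
    groups = {}
    for key, value in props.items():
        m = re.fullmatch(r"(first|last)(\d*)", key)
        if not m:
            continue  # not part of this particular naming convention
        index = int(m.group(2) or 1)  # a bare "first" counts as number 1
        groups.setdefault(index, {})[m.group(1)] = value
    return [groups[i] for i in sorted(groups)]


# Hypothetical parameter values, as they might be read from itemprop attributes.
props = {"first": "Jane", "last": "Doe", "first2": "John", "last2": "Smith"}
print(group_names(props))
```

Nothing in the markup itself says that this pairing exists. The rule lives
only in the consumer's code, and a template that numbers or names its
parameters differently needs a different rule.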
This is not Microdata, this is a mess. The only relation it has to
Microdata is the fact that Microdata's syntax is being abused as a
container for it.
It's like encoding a video with H.264 and the audio with AAC, putting them
into a Matroska container, changing the file extension to .webm, and then
saying it's WebM because the file extension says .webm and the container
format looks like WebM's container format.
A global itemtype hierarchy for templates could still be introduced along
with a central repository of generally useful and semantically annotated
templates. Something like http://mediawiki.org/md/Transclusion/Cite maybe,
with the option to subclass as
http://mediawiki.org/md/Transclusion/Cite/en.wikipedia.org
if a local extension is needed.
For the editor project, we mainly need an efficient representation of
the needed information with minimal changes to the rendered output. Any
solution that requires us to add many additional elements will simply
not work for us. The exact itemtype URL used on the other hand is easily
adjusted if a useful global hierarchy emerges.
Changing:
Foo
To:
<span itemprop="foo">Foo</span>
Is already an absolutely destructive change for anything that it would
affect.
Using:
<span itemprop="Argument"
itemtype="http://www.mediawiki.org/microdata/wikitext/Argument" itemscope>
<meta itemprop="name" content="foo">
<span itemprop="value">Foo</span>
</span>
Will not destroy things in a way any worse than the other change will.
And it's the only way you'll be able to convey something like:
<div
itemtype="http://www.mediawiki.org/microdata/wikitext/Transclusion&quo…
itemscope>
<meta itemprop="PageName" content="Template:Foo">
<meta itemprop="RawText" content="Foo#This is some discarded data">
<b>Bar:</b> <span itemprop="argument"
itemtype="http://www.mediawiki.org/microdata/wikitext/Argument"
itemscope><!--
--><meta itemprop="name" content="bar"><!--
--><meta itemprop="default" content="Baz"><!--
--><span itemprop="value">Foo</span><!--
--></span>
</div>
Which will allow the Visual Editor to restore the original WikiText and
also have intuitive cues in the editor that will make it visually restore
the default of "Baz" to the param text "Foo" when the user does something
to indicate to the Visual Editor that the user would probably want the
Visual Editor to drop the param and show the default if they had actually
known about things at the source level.
And like I said before there are probably more things that would require
extra metadata beyond what itemtype and itemprop hacks can provide which I
can't even think up right now.
Gabriel
One thing I still don't get. In WikiText a <h2>Foo</h2> (normal extra
markup omitted) can be expressed by both == Foo == and ==Foo==.
I thought one of the key goals of the Visual Editor was that it would not
get in the way of source-level editors by mucking up content, changing a
==Foo== to a == Foo == or a == Foo == to a ==Foo== when the Visual Editor
user hasn't even touched that section, like just about every previous
WYSIWYG editor has done. How is the Visual Editor supposed to do that when
the DOM we're talking about is lossy and doesn't contain any extra metadata
giving that information?
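A small sketch of what I mean, in Python. The "pre" and "post" fields are
purely hypothetical stand-ins for whatever extra metadata would have to be
carried alongside the DOM. Without them the serializer has to pick one
spelling and will rewrite the other.

```python
import re

# Leading '=' run, optional spaces, heading text, optional spaces, same '=' run.
HEADING = re.compile(r"(=+)(\s*)(.*?)(\s*)\1")


def parse_heading(line):
    m = HEADING.fullmatch(line)
    if m is None:
        raise ValueError("not a heading: %r" % line)
    return {
        "level": len(m.group(1)),
        "text": m.group(3),
        "pre": m.group(2),   # hypothetical metadata: whitespace before the text
        "post": m.group(4),  # hypothetical metadata: whitespace after the text
    }


def serialize_lossy(h):
    # Only level and text survive, so one spelling is forced on every heading.
    marks = "=" * h["level"]
    return marks + " " + h["text"] + " " + marks


def serialize_exact(h):
    # With the extra fields the original spelling can be reproduced.
    marks = "=" * h["level"]
    return marks + h["pre"] + h["text"] + h["post"] + marks


for source in ("==Foo==", "== Foo =="):
    h = parse_heading(source)
    print(repr(source), repr(serialize_lossy(h)), repr(serialize_exact(h)))
```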
--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]