Hello all
I have created a first preliminary draft of how data items from the Wikidata repository may be accessed and rendered on the client wiki, e.g. to make infoboxes.
https://meta.wikimedia.org/wiki/Wikidata/Notes/Inclusion_syntax
It would be great if you could have a look and let us know about any unclarities, omissions, or other flaws - and of course about your ideas of how to do this.
Getting this right is an important part of implementing phase 2 of the Wikidata project, and so I feel it's important to start drafting and discussing early. Having a powerful but not overly complex way to create infoboxes etc. from Wikidata items is very important for the acceptance of Wikidata on the client wikis, I believe.
Thanks, Daniel
Hi Daniel, everyone,
2012/5/22 Daniel Kinzler daniel.kinzler@wikimedia.de:
I have created a first preliminary draft of how data items from the Wikidata repository may be accessed and rendered on the client wiki, e.g. to make infoboxes.
https://meta.wikimedia.org/wiki/Wikidata/Notes/Inclusion_syntax
Great!
It would be great if you could have a look and let us know about any unclarities, omissions, or other flaws - and of course about your ideas of how to do this.
Getting this right is an important part of implementing phase 2 of the Wikidata project, and so I feel it's important to start drafting and discussing early. Having a powerful but not overly complex way to create infoboxes etc. from Wikidata items is very important for the acceptance of Wikidata on the client wikis, I believe.
I've some questions about the design proposed in this draft. I'm not sure these are actual issues, but I prefer to be sure ;-)
i. It seems to me that the proposed design implies that any access to data is done through a template transclusion, which could be fine for the given example (ie. infoboxes — though I'll raise another issue below, see ii.) but AFAIU forbids direct use of data in an article. Is this a desirable limitation? Or did I miss something?
ii. I understand that only one item can be included at once (either by using the data_item attribute or the article links). What if for some reason we want a template that accesses several items? Is it reasonable to assume that there will always be an item that links to every needed item for a given template?
iii. Articles on current wikis use templates named (eg.) “{{Infobox…”. Shouldn't it be a prerequisite for templates using Wikidata to (be able to) keep the same name, so that:
- we don't have to run bots on every single article only to prepend “#data-template:” but just have to update the template? Arguably, we would have to edit the articles anyway to remove attributes that are handled by Wikidata; just to be sure.
- people don't massively reject the transition to Wikidata because of the /visible/ syntax change.
iv. Per i., ii. and iii., wouldn't it be desirable to have some syntax to access an item without relying on template transclusion? This would enable us to:
- use data in an article without having to write a template first (solves i.);
- write templates that can get as many items as they need, either from the transcluding page or /by themselves/ (solves ii.);
- update the existing templates to Wikidata without having to edit the articles and without visible changes from the template user's perspective — except for what is handled by Wikidata, of course (solves iii.).
v. What about lua? :)
Best regards,
On 22.05.2012 15:08, Jérémie Roquet wrote:
I've some questions about the design proposed in this draft. I'm not sure these are actual issues, but I prefer to be sure ;-)
i. It seems to me that the proposed design implies that any access to data is done through a template transclusion, which could be fine for the given example (ie. infoboxes — though I'll raise another issue below, see ii.) but AFAIU forbids direct use of data in an article. Is this a desirable limitation? Or did I miss something?
I considered this a small price to pay (it's an uncommon use case, as the experience with SMW shows - people use templates for this sort of thing).
But it's easy enough to overcome this limitation, see my reply to Nikola.
ii. I understand that only one item can be included at once (either by using the data_item attribute or the article links).
Only one item can be formatted by a given template call. Multiple items can be included on the same page, using the same or different templates for formatting.
What if for some reason we want a template that accesses several items? Is it reasonable to assume that there will always be an item that links to every needed item for a given template?
In the Wikipedia use case which is the basis for our spec, yes. Because the items are per definition describing the same thing that the Wikipedia page describes.
Even for other use cases, I can hardly imagine when multiple items should be passed to a single template. Templates are made to provide uniform formatting for objects with similar properties. They are essentially views on specific types of items.
But again, this limitation is easy enough to overcome, see my reply to Nikola.
iii. Articles on current wikis use templates named (eg.) “{{Infobox…”. Shouldn't it be a prerequisite for templates using Wikidata to (be able to) keep the same name, so that:
The templates can have any name, Wikidata/Wikibase doesn't care. "Infobox" was just an example. There can be any number of such templates on the client wikis, for formatting different things in different ways.
- we don't have to run bots on every single article only to prepend “#data-template:” but just have to update the template? Arguably, we would have to edit the articles anyway to remove attributes that are handled by Wikidata; just to be sure.
Editing of every article *will* be necessary. I see no way past this. Whether that edit introduces a simpler template reference, or a parser function call, is yet to be decided.
- people don't massively reject the transition to Wikidata because of the /visible/ syntax change.
Well, there will be massive changes to the article, to wit, removal of all the infobox parameters.
Do you think putting a parser function call directly into the article would be considered far worse than a simple template call? I.e. is
{{Infobox}}
really so much better than
{{#data-template:Infobox}}
?
I would actually prefer the latter, because it is more obvious, less magic happening invisibly in the background.
But if need be, it's easy enough to hide that call: a smart Infobox template can call {{#data-template:Infobox}} itself when it is called without any parameters. So in the article, you'd just see {{Infobox}}.
The same can of course also be done by using a "top level" template for hiding the parser function (e.g. Infobox) separate from the real template (e.g. Infobox-format or whatever), which would be used with {{#data-template:Infobox-format}} by the Infobox template.
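The two-template arrangement described above can be modelled outside MediaWiki. The following is only a minimal sketch in Python, not wikitext and not any actual Wikibase code: the ITEMS table, its values, and the function names are hypothetical stand-ins, and letting locally supplied parameters override item properties is an assumption rather than something the draft specifies.

```python
# Hypothetical stand-in for the repository: item id -> properties.
ITEMS = {"q332211": {"name": "Example City", "population": "12345"}}


def infobox_format(params):
    """The 'real' template: it only formats the parameters it is given."""
    return "| {name} | population: {population} |".format(**params)


# Hypothetical template registry on the client wiki.
TEMPLATES = {"Infobox-format": infobox_format}


def data_template(template_name, item_id, **local_params):
    """Model of {{#data-template:...}}: load an item and hand its
    properties to the named template as ordinary parameters."""
    params = dict(ITEMS[item_id])
    params.update(local_params)  # assumption: local parameters win
    return TEMPLATES[template_name](params)


def infobox(item_id):
    """The 'top level' template that articles use; it hides the call above."""
    return data_template("Infobox-format", item_id)


rendered = infobox("q332211")
```

The article only ever refers to infobox, so the formatting template and the data access can change without touching the article text.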
iv. Per i., ii. and iii., wouldn't it be desirable to have some syntax to access an item without relying on template transclusion?
As I said in i., I think that's a rare use case. However, it's simple enough to provide this ability, see my reply to Nikola.
This would enable us to:
- use data in an article without having to write a template first (solves i.);
Hm, actually - this might be nice for using values from the item inline in the article text. But do we really want to encourage that?
- write templates that can get as many items as they need, either from the transcluding page or /by themselves/ (solves ii.);
Well, I don't like magic going on in the background, so I don't really like templates fetching item data by themselves.
But if need be, my suggestion to provide a {{#load-data}} function that puts an item into the current scope (be it the page or a template) would solve this too.
- update the existing templates to Wikidata without having to edit the articles and without visible changes from the template user's perspective — except for what is handled by Wikidata, of course (solves iii.).
This is always possible. Keep the current template as is, move the actual formatting logic to another template, make the original template call the {{#data-template}} function.
v. What about lua? :)
Parser functions are available for use from Lua, afaik. Once we know more about how Lua will bind into MediaWiki, we can think about nice shorthands and pretty syntax.
Thanks for your input!
-- daniel
On 22/05/12 16:28, Daniel Kinzler wrote:
Do you think putting a parser function call directly into the article would be considered far worse than a simple template call? I.e. is
{{Infobox}}
really so much better than
{{#data-template:Infobox}}
?
Of course not, however, for example, I believe a major use of Wikidata would be for citation templates, so <ref>{{cite|id=q1234}}</ref><ref>{{cite|id=q2345}}</ref> is better than <ref>{{#data-template:cite|id=q1234}}</ref><ref>{{#data-template:cite|id=q2345}}</ref>.
On 22.05.2012 16:37, Nikola Smolenski wrote:
Of course not, however, for example, I believe a major use of Wikidata would be for citation templates, so <ref>{{cite|id=q1234}}</ref><ref>{{cite|id=q2345}}</ref> is better than <ref>{{#data-template:cite|id=q1234}}</ref><ref>{{#data-template:cite|id=q2345}}</ref>.
The bibliography use case is on our minds, but not in our road map. It's beyond phase 3. So i'm reluctant to spend too much thought on it right now.
However, you can always just wrap another template around {{#data-template:cite|id={{{id}}}}}, or just make {{cite}} call {{#data-template:cite-format|id={{{id}}}}}
-- daniel
On 22/05/12 15:49, Daniel Kinzler wrote:
On 22.05.2012 16:37, Nikola Smolenski wrote:
Of course not, however, for example, I believe a major use of Wikidata would be for citation templates, so <ref>{{cite|id=q1234}}</ref><ref>{{cite|id=q2345}}</ref> is better than <ref>{{#data-template:cite|id=q1234}}</ref><ref>{{#data-template:cite|id=q2345}}</ref>.
The bibliography use case is on our minds, but not in our road map. It's beyond phase 3. So i'm reluctant to spend too much thought on it right now.
However, you can always just wrap another template around {{#data-template:cite|id={{{id}}}}}, respectively just make {{cite}} call {{#data-template:cite-format|id={{{id}}}}}
-- daniel
Which is absolutely the right way to do it. In general, I think that access to the semantic layer should always be done in this way: the extra layer of indirection is a form of implementation hiding, allowing the semantic parts of the system to be maintained and redefined without having to change the user interface and thus having to ripple edits through into thousands of articles.
-- Neil
On 22/05/12 16:49, Daniel Kinzler wrote:
On 22.05.2012 16:37, Nikola Smolenski wrote:
Of course not, however, for example, I believe a major use of Wikidata would be for citation templates, so <ref>{{cite|id=q1234}}</ref><ref>{{cite|id=q2345}}</ref> is better than <ref>{{#data-template:cite|id=q1234}}</ref><ref>{{#data-template:cite|id=q2345}}</ref>.
The bibliography use case is on our minds, but not in our road map. It's beyond phase 3. So i'm reluctant to spend too much thought on it right now.
However, you can always just wrap another template around {{#data-template:cite|id={{{id}}}}}, respectively just make {{cite}} call {{#data-template:cite-format|id={{{id}}}}}
If we assume that in practice #data-template is usually going to be wrapped into a template, what's the point of having it at all? Do you see any technical reasons for it?
On 23.05.2012 13:14, Nikola Smolenski wrote:
If we assume that in practice #data-template is usually going to be wrapped into a template, what's the point of having it at all? Do you see any technical reasons for it?
How else do you pass a complex object to a template and make its properties show up as template parameters?
-- daniel
On 23/05/12 13:19, Daniel Kinzler wrote:
On 23.05.2012 13:14, Nikola Smolenski wrote:
If we assume that in practice #data-template is usually going to be wrapped into a template, what's the point of having it at all? Do you see any technical reasons for it?
How else do you pass a complex object to a template and make its properties show up as template parameters?
I don't? The template requests what it needs when it needs it.
Perhaps both approaches could be done, and we will see what people will use in practice.
On 23 May 2012 13:19, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
On 23.05.2012 13:14, Nikola Smolenski wrote:
If we assume that in practice #data-template is usually going to be wrapped into a template, what's the point of having it at all? Do you see any technical reasons for it?
How else do you pass a complex object to a template and make its properties show up as template parameters?
I think I might have addressed that in my comment on the wiki. See there, but essentially I believe it is technically equally valid, and from a usability and community adoption standpoint far preferable, to simply support a syntax to address properties of the complex object, and have the resolver of this syntax automatically pull the entire complex Wikidata object (of which the property is a part) into a cache, so that subsequent calls to properties are returned from the cached object.
I look forward to having this analyzed by Daniel. Obviously there are some extra things that need to be added, but also other things simply go away painlessly... Can you write an advantage/disadvantage comparison on the wiki, Daniel, to be commented upon?
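The resolver behaviour proposed above (the first property access pulls the whole item into a cache, later accesses are served from it) might look roughly like this. It is a sketch in Python under stated assumptions: ItemResolver, fake_fetch and the sample item are hypothetical names for illustration, and nothing here reflects how the Wikidata client is actually implemented.

```python
class ItemResolver:
    """Resolves single properties, but always loads and caches whole items."""

    def __init__(self, fetch_item):
        self._fetch_item = fetch_item  # callable: item id -> dict of properties
        self._cache = {}               # item id -> whole item
        self.fetch_count = 0           # how often the repository was asked

    def get(self, item_id, prop):
        if item_id not in self._cache:
            # First access to this item: pull the entire object.
            self._cache[item_id] = self._fetch_item(item_id)
            self.fetch_count += 1
        return self._cache[item_id].get(prop)


def fake_fetch(item_id):
    """Hypothetical repository lookup used only for this example."""
    return {"q1": {"color": "green", "size": "large"}}[item_id]


resolver = ItemResolver(fake_fetch)
color = resolver.get("q1", "color")  # triggers the one and only fetch
size = resolver.get("q1", "size")    # answered from the cached item
```

With this shape, a template can ask for item.color and item.size independently without causing a second round trip.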
Gregor
Relaying the conversation between Gregor and me on https://meta.wikimedia.org/wiki/Talk:Wikidata/Notes/Inclusion_syntax#Various_notes:
In general, I am surprised that on the one hand you seem to be deeply modifying the template parameter calls (by providing structured parameters and methods to resolve the structure) while on the other hand you don't allow the data items to be called from within a template. Your present approach seems to force either each page that calls an infobox template to be modified (replacing the {{infoboxZZZ |foo=some value}} with a {{#data-template:infoboxZZZ |foo=some value }} call), or, which is more likely, each infoboxZZZ to be renamed to infoboxZZZ-Inner and a new infoboxZZZ to be created that calls the infoboxZZZ-Inner wrapped in the #data-template.
While this is possible to do, it seems to add some overhead in the (very likely) scenario that an infobox displays a mixture of Wikidata-stored information and page-injected information, i.e. both the wrapper and the inner, real infobox template need to pass the right parameters.
I guess that approach is taken because of caching concerns. However, given that the template parameter calls have to be overloaded for Wikidata anyway: is it possible, whenever an item.color is called as a parameter, to silently cache the entire item, so that the next call for item.size would already be in memory?
Some random notes on the text, which may or may not be useful:
The explanation is somewhat hard to follow, because the section "Including Items in an Article" requires an understanding of what the object is that is passed to a template. Normally templates do not get structured parameters passed, so this was surprising to me. You invent this newly and a new syntax. Perhaps the explanation of this general mechanism could come first.
Like other commentators, I am sceptical about using the dot for this. Both dots and hyphens are legal in the grammar for RDF property names (http://www.w3.org/TR/REC-xml/#NT-Name). Slashes or hashes are not and would be a better choice in my opinion.
"This implies that the client wiki tracks": please define "client wiki". Also I cannot follow the rest of the sentence, perhaps elaborate.
--G.Hagedorn (talk) 21:15, 22 May 2012 (UTC)
It was indeed intentional to always do item formatting via a template, since that seems to be the way people usually handle the formatting of uniform data objects. This can easily be amended by introducing a parser function that makes a data object available in the present scope, instead of passing it to a template.
As to forcing all pages using the infobox templates to be modified: technically, you don't have to do that, because you can easily wrap the call to #data-template in the original template and use some other template to do the actual formatting. But in practice, the page will have to be edited anyway. It's pointless to use data from Wikidata if we don't remove all the infobox parameters from the article pages.
re caching: all data items are cached. Twice, actually: once persistently in a local database table, and once per request, in memory.
re your text notes: thanks for the input, I'll improve that. I think I'm going to rewrite the entire proposal, now that I have gotten some feedback.
-- Duesentrieb (talk) 15:55, 23 May 2012 (UTC)
Thanks for the response Daniel!
It was indeed intentional to always do item formatting via a template, since that seems to be the way people usually handle the formatting of uniform data objects.
I have no objections about this, I only comment on making the passing of a complex object (the parts of which can then be accessed) into the template the standard way of combining wikipedia templates with wikidata.
This can easily be amended by introducing a parser function that makes a data object available in the present scope, instead of passing it to a template.
To me, the present calling convention then becomes redundant, and I believe simpler is better. Yes, each such call would need some optional parameters in rare cases (like using the non-default object) but that can easily be handled inside the template then.
As to forcing all pages using the infobox templates to be modified: technically, you don't have to do that, because you can easily wrap the call to #data-template in the original template and use some other template to do the actual formatting.
(Yes, that's what I meant with the ...-Inner templates)
But in practice, the page will have to be edited anyway. It's pointless to use data from Wikidata if we don't remove all the infobox parameters from the article pages.
True, I did not think of that. But my question would be: do you want to display to the majority of users a standard template call where some data is locally injected via a standard template parameter syntax and some data is "magically" inherited from wikidata (I believe yes) or do you want to force everyone to learn the new syntax with a yet more strange and complex function call?
If the first, I see only disadvantages in introducing the necessity for a "wrapping" function and nested templates (for which the locally injected parameters always have to be synced) at all.
Can we try to develop a scenario where the whole system works through directly accessing properties through what you call (if I understand you correctly) the "parser function that makes a data object available in the present scope, instead of passing it to a template"?
(Note: I am not sure why, if you modify the template parameter resolution syntax to allow access to structured objects, you need a parser function here instead of directly using the syntax -- is there a technical reason that it is not possible to overload template parameter syntax handling in MediaWiki outside an outer parser function?)
re caching: all data items are cached. Twice, actually: once persistently in a local database table, and once per request, in memory.
Good. So there should be no need to pass the whole data item to a nested template function at all, I think. I can guess that there are good reasons to think differently, but so far I am sticking to my view :-)
Gregor
On 23.05.2012 23:57, Gregor Hagedorn wrote:
Can we try to develop a scenario where the whole system works through directly accessing properties through what you call (if I understand you correctly) the "parser function that makes a data object available in the present scope, instead of passing it to a template.".
This seems to be everyone's preference, even though it feels kind of icky to me. Oh, well :) I'll rework the draft on that basis soon.
-- daniel
This seems to be everyone's preference, even though it feels kind of icky to me. Oh, well :) I'll rework the draft on that basis soon.
I look forward to it. Maybe it runs against some wall, but then we have a better basis for comparison.
On 22/05/12 16:28, Daniel Kinzler wrote:
What if for some reason we want a template that accesses several items? Is it reasonable to assume that there will always be an item that links to every needed item for a given template?
In the Wikipedia use case which is the basis for our spec, yes. Because the items are per definition describing the same thing that the Wikipedia page describes.
You might want in wikipedia: John Doe ended up the {{#data-value: item=123|param=Position/2012 Race|format=ordinal}} in the [[2012 Race]] out of {{#data-value:456|param=total-participants}}
Where item 123 could be {"name":"John Doe","Position/2012 Race":42} for instance, and 456 {"name":"2012 Race","country":"Germany","total-participants":1024}
Arguably, we might want to store instead John Doe position in the 2012 Race item.
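To make the example above concrete, here is a small Python model of what such a {{#data-value}} call would evaluate to. It is a sketch only: the ITEMS table copies the two example items from this message, while data_value, ordinal and the fmt argument (standing in for the format= parameter) are hypothetical helpers, not an existing API.

```python
# The two example items from the message above, keyed by item number.
ITEMS = {
    123: {"name": "John Doe", "Position/2012 Race": 42},
    456: {"name": "2012 Race", "country": "Germany", "total-participants": 1024},
}


def ordinal(n):
    """Return an English ordinal such as '1st', '2nd', '11th', '42nd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def data_value(item, param, fmt=None):
    """Model of {{#data-value:item=...|param=...|format=...}}."""
    value = ITEMS[item][param]
    if fmt == "ordinal":
        return ordinal(value)
    return str(value)


sentence = (
    "John Doe ended up the "
    + data_value(123, "Position/2012 Race", fmt="ordinal")
    + " in the 2012 Race out of "
    + data_value(456, "total-participants")
)
```

Note that the sentence draws on two different items, which is exactly the case a one-item-per-template design does not cover.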
PS: No magically loaded template parameters, please.
On 22.05.2012 17:25, Platonides wrote:
You might want in wikipedia: John Doe ended up the {{#data-value: item=123|param=Position/2012 Race|format=ordinal}} in the [[2012 Race]] out of {{#data-value:456|param=total-participants}}
It's a possible use case, but as a Wikipedian, I would like to avoid such things in the prose part of an article, to keep the wikitext readable and editable. Anyway, this kind of information either never changes, so the overhead of referencing a data item is pointless, or changes frequently, in which case a table or infobox is more useful for displaying it.
Where item 123 could be {"name":"John Doe","Position/2012 Race":42} for instance, and 456 {"name":"2012 Race","country":"Germany","total-participants":1024}
Arguably, we might want to store instead John Doe position in the 2012 Race item.
This is really a question about the data model, not the inclusion syntax.
Anyway, in my mind, this would be modeled like this:
* There's an item for John Doe, say Q23
* There's an item for the 2012 race, say Q42
* The race has a property "positions"
* "positions" is inherently multi-valued (i.e. multiple values don't indicate alternatives, but are considered to be true all at once, and multiple different values can be ascribed to a single source).
* Each entry in positions is a reference to a contestant (e.g. Q23) with their rank and perhaps time or whatever as qualifiers.
Alternatively, Q42 could even be about the race in general, and the entries in the "positions" property could be qualified with a point in time. Wikidata editors would be free to choose either solution (or your solution, or whatever).
We (me, Denny and Markus Krötsch) have discussed this kind of "qualified multi-value properties" quite extensively and came to the conclusion that using the pattern above, we can model most scenarios sanely. Another typical example would be ethnic groups of a country, as percentage of the population, by year and source.
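The "qualified multi-value property" pattern described above can be written down as a data structure. This is a minimal Python sketch and not the actual Wikidata data model: the class names, the "rank" qualifier key, and the second contestant Q99 are hypothetical and only there to show the shape.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One value of a property, plus qualifiers that refine it."""
    value: str                                   # here: id of the referenced item
    qualifiers: dict = field(default_factory=dict)


@dataclass
class Item:
    id: str
    label: str
    properties: dict = field(default_factory=dict)  # property name -> list of Statement


john = Item("Q23", "John Doe")
race = Item(
    "Q42",
    "2012 Race",
    {
        # Multi-valued: all entries hold at once, they are not alternatives.
        "positions": [
            Statement("Q23", {"rank": 42}),
            Statement("Q99", {"rank": 1}),  # hypothetical second contestant
        ]
    },
)


def rank_of(race_item, contestant_id):
    """Look up a contestant's rank among the race's 'positions' entries."""
    for statement in race_item.properties.get("positions", []):
        if statement.value == contestant_id:
            return statement.qualifiers.get("rank")
    return None


john_rank = rank_of(race, john.id)
```

The same shape would fit the ethnic-groups example: each entry references a group and carries percentage, year and source as qualifiers.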
-- daniel
On 22/05/12 13:47, Daniel Kinzler wrote:
I have created a first preliminary draft of how data items from the Wikidata repository may be accessed and rendered on the client wiki, e.g. to make infoboxes.
https://meta.wikimedia.org/wiki/Wikidata/Notes/Inclusion_syntax
It would be great if you could have a look and let us know about any unclarities, omissions, or other flaws - and of course about your ideas of how to do this.
As could perhaps be guessed, I have a lot of comments :)
Including Items in an Article:
{{#data-template:Infobox |data_item=q332211 |data_param=stuff |foo=some value |stuff.color=green }}
I see no reason for creating a new parser function for inclusion of templates. Also, this syntax would not allow a template to draw data from more than one item, which would probably be a requirement in phase3.
Rather, I would include a template normally and use a parser function within the template to access the data.
So, instead of {{{data}}} there would be {{#data:}}, instead of {{{data.color}}} there would be {{#data:color}}, instead of {{{data.color(ACME_SURVEY_2010)}}} there would be {{#data:color|ref=ACME_SURVEY_2010}} and so on.
One advantage is that commonly used syntax is always used, instead of inventing new syntax (as for the reference in this example).
Another advantage is that this would make data usable directly in article text, if that is wanted.
The parser function should be able to override itself by template parameters - I believe it is possible to do this.
Unrelated to the above,
"This will return the value of the color property, in the page's content language, as plain text."
I see a need to also select the desired content language (for example, a lot of infoboxes display the name of the topic in the content language and in the topic's language(s)). This has the potential to introduce additional problems, of course.
Formatting Functions:
{{#data-value:data.color}}
form: Specifies in what form rendered, that is, in which HTML element the value should be wrapped.
span: wrap in <span> tags, use <span> tags for parts
div: wrap in <div> tags, use <span> tags for parts
li: wrap in <li> tags, use <span> tags for parts
I don't like this at all, since it limits the number of possibilities and introduces yet another syntax parallel to HTML.
Yet, I don't see anything much better. A half-baked idea is to leave it to the client wikis to create their data display templates that could be used to format data appropriately.
(I see Jérémie's email now, and see that we came independently to some of the same conclusions :) )
{{#data-values}}:
But this is of course an even worse problem.
{{#data-link:data|the data item}}
Why not the usual interwiki syntax of [[wikidata:data|the data item]]?
On 05/22/2012 03:16 PM, Nikola Smolenski wrote:
I see no reason for creating a new parser function for inclusion of templates. Also, this syntax would not allow a template to draw data from more than one item, which would probably be a requirement in phase3.
Rather, I would include a template normally and use a parser function within the template to access the data.
So, instead of {{{data}}} there would be {{#data:}}, instead of {{{data.color}}} there would be {{#data:color}}, instead of {{{data.color(ACME_SURVEY_2010)}}} there would be {{#data:color|ref=ACME_SURVEY_2010}} and so on.
This could even be simplified further to
{{#data:{{{data_item}}}|color}}
The syntax is admittedly longer, but would work as-is with alternate parsers such as Parsoid or a generic parser function API in Lua. It would also preserve referential transparency as far as possible, which is good for finer-grained caching.
The parser function should be able to override itself by template parameters - I believe it is possible to do this.
I'd strongly recommend against any magic like this, as it makes templates even harder to understand and also harder to cache.
Gabriel
On 22.05.2012 21:27, Gabriel Wicke wrote:
So, instead of {{{data}}} there would be {{#data:}}, instead of {{{data.color}}} there would be {{#data:color}}, instead of {{{data.color(ACME_SURVEY_2010)}}} there would be {{#data:color|ref=ACME_SURVEY_2010}} and so on.
This could even be simplified further to
{{#data:{{{data_item}}}|color}}
The syntax is admittedly longer, but would work as-is with alternate parsers such as Parsoid or a generic parser function API in Lua. It would also preserve referential transparency as far as possible, which is good for finer-grained caching.
That's pretty much what I was doing with {{#data-value:data.color}}. Turning that into {{#data-value:{{{data}}}|color}} would actually be fine, though a bit ugly. Perhaps it would even be nice to turn it around: {{#property:color|{{{data}}}}}.
However, I'd still like to support the plain template parameter syntax with pseudo-parameters, e.g. {{{data.color}}} for retrieving the flat wikitext value of the property.
Do you think that's ok? Or does it introduce complications with respect to Lua, etc?
-- daniel
On 05/22/2012 09:47 PM, Daniel Kinzler wrote:
On 22.05.2012 21:27, Gabriel Wicke wrote:
So, instead of {{{data}}} there would be {{#data:}}, instead of {{{data.color}}} there would be {{#data:color}}, instead of {{{data.color(ACME_SURVEY_2010)}}} there would be {{#data:color|ref=ACME_SURVEY_2010}} and so on.
This could even be simplified further to
{{#data:{{{data_item}}}|color}}
The syntax is admittedly longer, but would work as-is with alternate parsers such as Parsoid or a generic parser function API in Lua. It would also preserve referential transparency as far as possible, which is good for finer-grained caching.
That's pretty much what I was doing with {{#data-value:data.color}}. Turning that into {{#data-value:{{{data}}}|color}} would actually be fine, though a bit ugly. Perhaps it would even be nice to turn it around: {{#property:color|{{{data}}}}}.
Sure, both are fine with me. My concern is more about using the plain parser function API without magic globals or custom preprocessor behavior.
However, I'd still like to support the plain template parameter syntax with pseudo-parameters, e.g. {{{data.color}}} for retrieving the flat wikitext value of the property.
Do you think that's ok? Or does it introduce complications with respect to Lua, etc?
{{{data.color}}} would be inaccessible to Lua templates, unless it is expanded as a parameter in a custom preprocessor frame and then passed by value to the Lua template. The custom preprocessor frame code would be specific to the current PHP preprocessor, and incompatible with other parsers.
A parser function call with an explicit item parameter, on the other hand, would work as-is through the generic API, be it from Lua or Parsoid.
Gabriel
On 22.05.2012 22:37, Gabriel Wicke wrote:
{{{data.color}}} would be inaccessible to Lua templates, unless it is expanded as a parameter in a custom preprocessor frame and then passed by value to the Lua template. The custom preprocessor frame code would be specific to the current PHP preprocessor, and incompatible with other parsers.
A parser function call with an explicit item parameter, on the other hand, would work as-is through the generic API, be it from Lua or Parsoid.
even though {{{data}}} wouldn't be text, but a complex data object that needs special handling when evaluating it to wikitext?
-- daniel
On 05/22/2012 11:52 PM, Daniel Kinzler wrote:
On 22.05.2012 22:37, Gabriel Wicke wrote:
{{{data.color}}} would be inaccessible to Lua templates, unless it is expanded as a parameter in a custom preprocessor frame and then passed by value to the Lua template. The custom preprocessor frame code would be specific to the current PHP preprocessor, and incompatible with other parsers.
A parser function call with an explicit item parameter, on the other hand, would work as-is through the generic API, be it from Lua or Parsoid.
even though {{{data}}} wouldn't be text, but a complex data object that needs special handling when evaluating it to wikitext?
{{{data}}} won't be accessible to Lua or Parsoid. {{#data:q12345|somevalue}} on the other hand would be accessible through the regular parser function API.
Gabriel
On 23.05.2012 12:53, Gabriel Wicke wrote:
On 05/22/2012 11:52 PM, Daniel Kinzler wrote:
On 22.05.2012 22:37, Gabriel Wicke wrote:
{{{data.color}}} would be inaccessible to Lua templates, unless it is expanded as a parameter in a custom preprocessor frame and then passed by value to the Lua template. The custom preprocessor frame code would be specific to the current PHP preprocessor, and incompatible with other parsers.
A parser function call with an explicit item parameter, on the other hand, would work as-is through the generic API, be it from Lua or Parsoid.
even though {{{data}}} wouldn't be text, but a complex data object that needs special handling when evaluating it to wikitext?
{{{data}}} won't be accessible to Lua or Parsoid. {{#data:q12345|somevalue}} on the other hand would be accessible through the regular parser function API.
So, you prefer a solution where the item to use is specified by id whenever a property of that item is to be accessed? In that case, I'd indeed prefer
{{#property:population|item=id/q12345}}
or just
{{#property:population}}
when using the page's "own" item.
But if the id comes from a template parameter, things get annoying:
{{#property:population|item={{{item-id|*}}}}}
(where "*" would mean "use the page's default item")
This is very ugly, especially if you have to do it 20 or 50 times (for every property).
A possible solution would be to assign local names to items:
{{#item:thingy|item=id/{{{item-id|*}}}}} {{#property:population|item=thingy}}
What do you think?
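To make the three ways of naming an item concrete, here is a minimal sketch of how an item= argument might be resolved. The resolve_item() function, the aliases table and the ids used are assumptions for illustration only; nothing like this is specified in the draft.

```python
from typing import Dict


def resolve_item(ref: str, page_item: str, aliases: Dict[str, str]) -> str:
    """Map an item= argument to a concrete item id.

    "*"          -> the page's own (default) item
    "id/q12345"  -> the explicit id q12345
    "thingy"     -> a local name registered earlier (cf. {{#item:...}})
    """
    if ref == "*":
        return page_item
    if ref.startswith("id/"):
        return ref[len("id/"):]
    if ref in aliases:
        return aliases[ref]
    raise KeyError(f"unknown item reference: {ref!r}")


# Register a local name once, then reuse it for every property access.
aliases = {"thingy": resolve_item("id/q12345", "q42", {})}
print(resolve_item("thingy", "q42", aliases))
print(resolve_item("*", "q42", aliases))
```

Note that the aliases table is exactly the kind of page-wide state that makes the output of a single call depend on what was evaluated before it.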
On 05/23/2012 01:57 PM, Daniel Kinzler wrote:
So, you prefer a solution where the item to use is specified by id whenever a property of that item is to be accessed? In that case, I'd indeed prefer
{{#property:population|item=id/q12345}}
A possible solution would be to assign local names to items:
{{#item:thingy|item=id/{{{item-id|*}}}}} {{#property:population|item=thingy}}
What do you think?
I don't really like this global variable business at all. Much of the ugliness above disappears when the id is mandatory:
{{#data:{{id}}|color}}
or (if you prefer):
{{#data:color|{{id}}}}
If it is missing, simply display an error and let the user fix it.
Cache invalidation can be precise by usage (not necessarily the entire page) and correctly handles multiple data items per article. The system is also directly compatible with Lua and Parsoid.
Gabriel
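A small sketch of what "precise by usage" could mean in practice: each explicit lookup is recorded, and a change to one property of one item only touches the pages that actually used it. The UsageTracker class, the page titles and the item ids are made up for illustration.

```python
from collections import defaultdict


class UsageTracker:
    """Records which pages used which (item, property) pair."""

    def __init__(self):
        # (item_id, prop) -> set of page titles that rendered that value
        self._pages_by_usage = defaultdict(set)

    def record(self, page: str, item_id: str, prop: str) -> None:
        self._pages_by_usage[(item_id, prop)].add(page)

    def pages_to_purge(self, item_id: str, prop: str) -> set:
        """Pages whose cached output depends on the changed value."""
        return set(self._pages_by_usage.get((item_id, prop), set()))


tracker = UsageTracker()
tracker.record("Page A", "q1", "population")
tracker.record("Page B", "q2", "population")
tracker.record("Page B", "q1", "population")  # a second item on the same page
tracker.record("Page C", "q1", "color")

print(sorted(tracker.pages_to_purge("q1", "population")))
```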
On Wed, May 23, 2012 at 9:33 AM, Gabriel Wicke wicke@wikidev.net wrote:
I don't really like this global variable business at all. Much of the ugliness above disappears when the id is mandatory:
{{#data:{{id}}|color}}
or (if you prefer):
{{#data:color|{{id}}}}
If it is missing, simply display an error and let the user fix it.
Cache invalidation can be precise by usage (not necessarily the entire page) and correctly handles multiple data items per article. The system is also directly compatible with Lua and Parsoid.
...and inside of a Lua module, the ugly things could just be stored in a local variable, couldn't they?
Hey,
Great writeup, after doing a quick read, I agree with most stuff, but have some minor remarks:
{{{data.color(ACME_SURVEY_2010)}}}
This suggests that we do not want property names with brackets in them. Sometimes an item might have 2 distinct properties with the same name (I can't think of any example right now, but I think this does occur), in which case you need to add some extra stuff to their names to distinguish them. WP often uses brackets for this when it happens with page names, so it does not seem too far-fetched that people would want to use that here as well.
{{#data-link:|action=edit|the data item}}
I really don't like providing an empty value to have it use the default. It should be possible to just omit the parameter altogether. Also, I think it's nice to have the arguments be order independent. So using parameter names for everything except the identifier might be good. Right now it's for example impossible to have action=edit or similar at its start.
Cheers
-- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. --
On 22.05.2012 15:47, Jeroen De Dauw wrote:
Hey,
Great writeup, after doing a quick read, I agree with most stuff, but have some minor remarks:
{{{data.color(ACME_SURVEY_2010)}}}
This suggests that we do not want property names with brackets in them. Sometimes an item might have 2 distinct properties with the same name (I can't think of any example right now, but I think this does occur), in which case you need to add some extra stuff to their names to distinguish them. WP often uses brackets for this when it happens with page names, so it does not seem too far-fetched that people would want to use that here as well.
Yea, as I said in my reply to Nikola: it's probably best to just drop that syntax.
{{#data-link:|action=edit|the data item}}
I really don't like providing an empty value to have it use the default. It should be possible to just omit the parameter altogether. Also, I think it's nice to have the arguments be order independent. So using parameter names for everything except the identifier might be good. Right now it's for example impossible to have action=edit or similar at its start.
Well, that would mean that the link text can't be a positional parameter, so we'd have to use {{#data-link:action=edit|text=the data item}}. A bit un-pretty, but then, this stuff will only show up in templates anyway.
So yea, agreed.
-- daniel
Relaying questions TMg posted on the draft's talk page https://meta.wikimedia.org/wiki/Talk:Wikidata/Notes/Inclusion_syntax:
Localization, syntax and more questions
1) Currently we are using the hash syntax for parser functions like {{#if: and {{#expr:. I'm not sure if it's a good idea to use the same syntax for the Wikidata stuff. As you said the output of e.g. {{#data-value: can not be used as an input for e.g. {{#if:. Isn't this confusing?
2) Maybe it would be more confusing to invent a new syntax?
3) Why does data-template use a dash but data_item and data_param are using underscores? Please use dashes everywhere. Be consistent with the HTML5 data-* attributes.
4) Why not using the same syntax inside the templates? For example, {{{data-color}}} instead of {{{data.color}}}?
5) Are we free to use localized template names and parameter names for the new infobox syntax? I consider this very, very important. Here is an example why this is so important. We need a clearly defined point where the parameter names can be translated to create a localized version of the same template.
6) Overall, I'm not sure if the new infobox syntax is meant to be used in articles or in other templates? Can we keep our existing localized templates and use the new syntax in these templates? The new syntax should allow this.
--TMg 14:23, 22 May 2012 (UTC)
Hi TMg, thanks for your input!
1) These are parser functions. And you can use their output as input for other parser functions; they are just riddled with a lot of HTML and so pretty useless as conditions, etc. But I will rephrase the relevant sentence: the output of #data-value etc. can be used as the second or third parameter to #if just fine, it just doesn't make sense to use it as the first parameter (the condition).
2) Yes :) Also much harder to implement.
3) Ok, will use slashes instead of underscores in parameter names.
4) {{{data.color}}} is a structured identifier, meaning the property "color" of the object "data". Dots (and colons) are commonly used in programming languages to denote sub-entities (parts, properties, members, etc). In contrast, data-template or data-param are not structured; they are just compound phrases. They do not denote sub-entities.
5) Template names are completely custom. It's not that you can localize them: you will have to provide your own. As to the parameters supported by the parser functions... they will use whatever mechanism exists for localizing the parameter names of parser functions. I don't know if MediaWiki supports it. If MediaWiki supports it, Wikidata/Wikibase supports it.
6) The intention is to use {{#data-template:whatever}} in the article and {{#data-value:data.foo}} in the template. If you want to hide the {{#data-template}} stuff, you can wrap another template around it: {{Whatever}} would do {{#data-template:whatever-format}}, and whatever-format would contain the actual formatting logic.
HTH -- Duesentrieb (talk) 14:44, 22 May 2012 (UTC)
Relaying further conversation with TMg on the draft's talk page https://meta.wikimedia.org/wiki/Talk:Wikidata/Notes/Inclusion_syntax:
4 I'm a software developer, I know the dot syntax. However, I'm not sure if it's appropriate here. Wikitext is no programming language, not even with all the parser functions we have. It does not even look like JavaScript or C++. Currently in template parameters like {{{Min.-max. height}}} neither dashes nor dots nor spaces have a meaning. All I say is: If you choose a character why not choose the dash? Again, this would be consistent with the HTML5 data-* attributes.
5 I want to translate {{{data.color}}} to {{{Daten.Farbe}}} (or to {{{Daten-Farbe}}} as argued above). Maybe I'm wrong and this is not important. The question is: What part of the new syntax will be visible in articles? All these parts must be translated.
6 OK. Similar to the /doc, /sandbox and /testcases subpages we will create a lot of /data-template subpages then. I think this is a good idea.
--TMg 15:30, 22 May 2012 (UTC)
4 You asked "If you choose a character why not choose the dash?" Well, dashes and underscores are often used in property identifiers. If we used them as our structuring element, they could not occur inside either the name of the parameter that references the item, or the name of any property of the item. So, the item can't have e.g. a population-density parameter and, according to your original point, shouldn't be using population_density either (well, we could use dashes as a structuring element and underscores in the names of parameters and properties, but you didn't like that and it's visually far more confusing than using dots). Anyway, I'm not desperate to use dots. I just think dashes are worse. We can use slashes, how about it :)
5 Ah, you want to translate the names of the item's properties. We are considering making this possible in the property definition in the wikidata repository. We'll have to think about restrictions for those names (allow dashes? dots? spaces?), and if and how they can be changed later (changing the localized name would break a lot of things...).
It may save us a lot of trouble to require the use of unchanging unique identifiers to the parameters, so nothing breaks when the translation is changed. We'll have to maintain a localized "visible" name anyway, so we can automatically provide labels for properties (oops, forgot to mention that in the draft).
--Duesentrieb (talk) 15:55, 22 May 2012 (UTC)
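The point about structuring characters can be checked mechanically. The sketch below is an illustration only (the candidate_splits() helper is hypothetical): it lists every way a reference could be read as "parameter, separator, property". With a dot there is a single reading as long as names contain no dots; with a dash, a property such as population-density makes the reference ambiguous.

```python
from typing import List, Tuple


def candidate_splits(ref: str, sep: str) -> List[Tuple[str, str]]:
    """All ways to read ref as '<parameter><sep><property>'."""
    parts = ref.split(sep)
    return [
        (sep.join(parts[:i]), sep.join(parts[i:]))
        for i in range(1, len(parts))
    ]


# Dot as the structuring character: one possible reading.
print(candidate_splits("data.population-density", "."))

# Dash as the structuring character: two possible readings.
print(candidate_splits("data-population-density", "-"))
```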
Hi everyone:
I have an issue related to permissions and "template syntaxes".
I read in the draft on Meta about data_item:
The data_item parameter can be used to specify the data item to include directly by id. Per default, the item associated with the present article via its article links is used. This implies that the client wiki tracks which pages use which item, to allow the relevant pages to be rerendered when the data item changes in the repository.
Is this data input available to all users? For example, if any user adds or changes this parameter, is there any problem with the link to the data parameters?
Sorry if I misunderstand this parameter, but I'm thinking of the sysops' work... only reverting vandalism and the consequences of changing some linked data :(
Regards.
On Tue, May 22, 2012 at 7:47 AM, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
Hello all
I have created a first preliminary draft of how data items from the Wikidata repository may be accessed and rendered on the client wiki, e.g. to make infoboxes.
https://meta.wikimedia.org/wiki/Wikidata/Notes/Inclusion_syntax
It would be great if you could have a look and let us know about any unclarities, omissions, or other flaws - and of course about your ideas of how to do this.
Getting this right is an important part of implementing phase 2 of the Wikidata project, and so I feel it's important to start drafting and discussing early. Having a powerful but not overly complex way to create infoboxes etc from Wikidata items is very important for the acceptance of Wikidata on the client wikis, I believe.
Thanks, Daniel
-- Daniel Kinzler, Softwarearchitekt
Wikimedia Deutschland e.V. | Eisenacher Straße 2 | 10777 Berlin http://wikimedia.de | Tel. (030) 219 158 260
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Hi Dennis
On 22.05.2012 16:58, Dennis Tobar wrote:
Hi everyone:
I have an issue related to permissions and "template syntaxes".
I read in the draft on Meta about data_item:
[...]
Is this data input available to all users? For example, if any user adds or changes this parameter, is there any problem with the link to the data parameters?
It's just a template parameter. Anyone can change it. Changing it can do whatever to the page, depending on the template and the parameters.
Sorry if I misunderstand this parameter, but I'm thinking of the sysops' work... only reverting vandalism and the consequences of changing some linked data :(
Vandalizing the data_item parameter would be the equivalent of vandalizing the language links or infobox parameters. Easy to detect and revert.
Relaying information about changes performed directly in the wikidata repository, so that vandalism there can be detected, is a much more tricky problem... but a different topic.
-- daniel
What I didn't see in the proposal was a way to get data multiple levels deep. Will this be possible?
Use case:
Q100 is data about an association football game. Maybe Q100 has team1-score="3", team2-score="2", referenced by {{{data.team1-score}}} and {{{data.team2-score}}} respectively (or whatever modifications have been made already due to previous discussion). Great.
Q100 also might have links to the teams: team1=[Q200], team2=[Q201]. Q200 has the name of the team: name="Melchester Rovers". Can we get the name of the team? {{{data.team1.name}}}? Some other syntax? Not available?
Bryan Burgers
On 22.05.2012 17:12, Bryan Burgers wrote:
What I didn't see in the proposal was a way to get data multiple levels deep. Will this be possible?
Not directly. Wikibase/Wikidata doesn't provide nested items. Only item references.
Use case:
Q100 is data about an association football game. Maybe Q100 has team1-score="3", team2-score="2", referenced by {{{data.team1-score}}} and {{{data.team2-score}}} respectively (or whatever modifications have been made already due to previous discussion). Great.
ok
Q100 also might have links to the teams: team1=[Q200], team2=[Q201]. Q200 has the name of the team: name="Melchester Rovers". Can we get the name of the team? {{{data.team1.name}}}? Some other syntax? Not available?
Not like this. One option is for the teams to be passed into the template as separate items, e.g. as data_team_1 and data_team_2. But that would require the caller to specify the teams explicitly, even though they are already in the item Q100 about the match.
It could be made nicer if the template is allowed to load additional items into its scope on its own accord, as I discussed in my reply to Nikola earlier:
{{#load-data:{{{data.team1}}}}}
Note that I don't really like the idea, but if we need it to cover real world use cases, then we can easily do this, I think.
The only difficulty here is to decide what exactly {{{data.team1}}} should return in case {{{data.team1}}} is an item reference. In the present use case, it would need to return the item id, while when used "normally" in wikitext, it would be nice if it would return a wiki link to the corresponding local wiki page (if that exists). We may then need to somehow indicate that in this context, we want the actual id, perhaps like this:
{{#load-data:{{{data.team1#id}}}}}
This is not very pretty, though. Maybe it would be cleaner to use something like this:
{{#load-data:{{#data-value:data.team1|form=raw}}}}
-- daniel
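As a sketch of what loading a referenced item amounts to, the following follows item references step by step over an in-memory table. The ITEMS layout, the helper names and the second team's name are assumptions for illustration; only Q100, Q200, Q201, the scores and "Melchester Rovers" come from the example above.

```python
# Hypothetical in-memory items. An item reference is modelled as {"item": id}.
ITEMS = {
    "q100": {
        "team1-score": "3",
        "team2-score": "2",
        "team1": {"item": "q200"},
        "team2": {"item": "q201"},
    },
    "q200": {"name": "Melchester Rovers"},
    "q201": {"name": "Example United"},  # made-up name for illustration
}


def referenced_id(item_id: str, prop: str) -> str:
    """Return the raw id behind an item-reference property (cf. form=raw)."""
    value = ITEMS[item_id][prop]
    if not (isinstance(value, dict) and "item" in value):
        raise TypeError(f"{item_id}/{prop} is not an item reference")
    return value["item"]


def lookup(item_id: str, path: list) -> str:
    """Follow item references along all but the last path element."""
    *refs, last = path
    for prop in refs:
        item_id = referenced_id(item_id, prop)  # one extra item load per hop
    return ITEMS[item_id][last]


print(lookup("q100", ["team1", "name"]))
print(lookup("q100", ["team1-score"]))
```

Each hop is a separate item load, so the page ends up depending on every item touched along the way, not just on Q100.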
The section https://meta.wikimedia.org/w/index.php?title=Wikidata/Notes/Inclusion_syntax... says "{{{data}}}...would evaluate to the label and description of the item... in the **page's content** language". But it should be noted that on some wikis it will be necessary to allow a different language to be used (e.g. on ptwiki some data is written in Portuguese from Brazil and other data is in Portuguese from Portugal, Angola, and so on). So it would be necessary to provide a way to specify the appropriate language.
Another thing: will we be able to translate the name "data" (e.g. to "dados") which appears in "{{#data-value:" "{{{data.color}}}", "data_item", "data_param" etc.? In Portuguese and Spanish there is the word "data" with the meaning of the English word "date": http://en.wiktionary.org/wiki/data#Portuguese It could be confusing not being able to translate this...
Would it be possible to create a link having "action=edit" as text? E.g.: {{#data-link:data|action=edit|action=edit}} or {{#data-link:data|action=edit|action{{=}}edit}} ?
Best regards, Helder
On 22.05.2012 17:49, Helder Wiki wrote:
The section https://meta.wikimedia.org/w/index.php?title=Wikidata/Notes/Inclusion_syntax... says "{{{data}}}...would evaluate to the label and description of the item... in the **page's content** language". But it should be noted that on some wikis it will be necessary to allow a different language to be used (e.g. on ptwiki some data is written in Portuguese from Brazil and other data is in Portuguese from Portugal, Angola, and so on). So it would be necessary to provide a way to specify the appropriate language.
Yes, of course. {{{data}}} is just a shorthand. If you want e.g. just the description in Dutch, falling back to German, you could use #data-value to get it:
{{#data-value:data.description|language=nl,de|form=plain}}
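The language=nl,de parameter describes an ordered fallback list. A minimal sketch of that selection, assuming the multilingual value is available as a mapping from language code to text (the pick_language() helper and the sample descriptions are made up):

```python
from typing import Dict, Optional, Tuple


def pick_language(values: Dict[str, str], fallback: str) -> Optional[Tuple[str, str]]:
    """Return (language, text) for the first listed language that has a value."""
    for code in fallback.split(","):
        code = code.strip()
        if code in values:
            return code, values[code]
    return None  # nothing available in any requested language


# Made-up descriptions; there is no Dutch one, so German is used.
descriptions = {
    "de": "Hauptstadt von Deutschland",
    "en": "capital of Germany",
}
print(pick_language(descriptions, "nl,de"))
```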
Another thing: will we be able to translate the name "data" (e.g. to "dados") which appears in "{{#data-value:" "{{{data.color}}}", "data_item", "data_param" etc.?
There are three different things to translate here.
1) the name of the parameter used to pass the item to the template ("data" in the example). You are free to choose it using the data_param option. The default could indeed be localized, though changing that (by editing the respective system message) is likely to break a lot of things in a very hard to understand way.
2) the name of the (well-known) parameters (options) for the proposed parser functions, like data_item and data_param for the #data-template function. Since #data-template is a parser function, it will use MediaWiki's standard mechanisms for localizing parser functions. I'm pretty sure the name of the function can be localized, and perhaps also the names of the parameters, I don't know. In any case, this is a general MediaWiki issue, not specific to Wikidata.
3) the names of item properties, e.g. "color" in the "data.color" example. These are properties defined and described on pages on the wikidata repository, and they have localized names for display. If these localized names can also be used to access the respective property, or if a stable, unique, unlocalized identifier must be used, is still up for discussion. The problem with localized identifiers is - if they change, they break a *lot* of things.
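The breakage risk in point 3 can be shown with a toy model. Here a property has a stable id ("p17", an invented identifier) and per-language labels; a lookup by label stops working as soon as the label is edited, while a lookup by id is unaffected. All names in this sketch are hypothetical.

```python
# Hypothetical property definitions: stable id -> localized display labels.
PROPERTY_LABELS = {
    "p17": {"en": "color", "de": "Farbe"},
}

# Hypothetical item data, keyed by the stable property id.
ITEM = {"p17": "red"}


def by_label(item: dict, label: str, lang: str):
    """Resolve a property through its localized label, then read the value."""
    for prop_id, labels in PROPERTY_LABELS.items():
        if labels.get(lang) == label:
            return item.get(prop_id)
    return None  # the label no longer matches any property


def by_id(item: dict, prop_id: str):
    """Read the value through the stable identifier."""
    return item.get(prop_id)


print(by_label(ITEM, "color", "en"))

# Someone changes the English label in the repository...
PROPERTY_LABELS["p17"]["en"] = "colour"

print(by_label(ITEM, "color", "en"))  # every template using "color" now gets nothing
print(by_id(ITEM, "p17"))             # the stable id still works
```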
Would it be possible to create a link having "action=edit" as text? E.g.: {{#data-link:data|action=edit|action=edit}} or {{#data-link:data|action=edit|action{{=}}edit}}
This is the general problem of passing positional parameters that contain a "=" to templates and parser functions. There are two common solutions:
{{#data-link:data|action=edit|2=action=edit}}
and
{{#data-link:data|action=edit|action&#61;edit}}
Regards, Daniel
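Why an explicit position helps can be seen from a simplified model of argument splitting. This is a sketch, not the MediaWiki preprocessor: parse_args() is a hypothetical helper that ignores nested braces and simply treats anything before the first "=" as a parameter name, with purely numeric names addressing positions.

```python
from typing import Dict, Tuple


def parse_args(raw: str) -> Tuple[Dict[int, str], Dict[str, str]]:
    """Split 'a|k=v|2=x=y' into positional and named arguments."""
    positional: Dict[int, str] = {}
    named: Dict[str, str] = {}
    next_index = 1
    for part in raw.split("|"):
        key, eq, value = part.partition("=")
        if not eq:
            # No "=" at all: an ordinary positional argument.
            positional[next_index] = part
            next_index += 1
        elif key.strip().isdigit():
            # "2=..." addresses position 2; the rest may contain "=".
            positional[int(key)] = value
        else:
            named[key.strip()] = value
    return positional, named


# The intended link text is swallowed as a second action= parameter.
print(parse_args("data|action=edit|action=edit"))

# Naming the position keeps "action=edit" as the text.
print(parse_args("data|action=edit|2=action=edit"))
```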
Relaying more conversation between me and TMg on https://meta.wikimedia.org/wiki/Talk:Wikidata/Notes/Inclusion_syntax:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
I don't really understand what all the formatting stuff is about. Why should we use
{{#data-value:data.color|form=span}}
instead of
<span>{{#data-value:data.color}}</span>
Or worse, why should we use
{{#data-value:data.color|form=td|style=text-align: right;}}
instead of the wiki table syntax?
| style="text-align: right;" | {{#data-value:data.color}}
I think you should drop "form", "style" and "class". Simpler is better. --TMg 17:26, 22 May 2012 (UTC)
For the simple access you suggest above, the plain template parameter syntax is provided, e.g. {{{data.color}}}, as in
<span>{{{data.color}}}</span>
but there are many aspects of rendering a property value that can not be readily expressed in this syntax, for example, which language or precision and unit to use for the output. Also, values often have qualifiers, such as the source, accuracy, timestamp, etc. Lastly, there needs to be a place for things like indicators for disputes, edit links, etc. The {{#data-value}} function lets you output all of these "parts" all at once (or separately, if you like), and lets you control aspects like the format of the output using parameters. If you choose to output multiple parts at once, {{#data-value}} can use its knowledge about the desired HTML form to do this nicely. E.g.
<span>{{#data-value:data.population|show=label,value,timestamp,source,indicators,edit|form=tr}}</span>
would be rendered as
<tr> <td>Population</td> <td>523,411</td> <td>2010</td> <td><a href="#src23">[1]</a>,<a href="#src23">[2]</a></td> <td><a title="disputed" href="..."><img src="..."/></a></td> <td>[<a title="edit" href="...">edit</a>]</td> </tr>
It's hard to imagine how to achieve this nicely without using parser functions. -- Duesentrieb (talk) 19:37, 22 May 2012 (UTC)
On 05/22/2012 09:39 PM, Daniel Kinzler wrote:
the {{#data-value}} function lets you output all of these "parts" all at once (or separately, if you like), and lets you control aspects like the format of the output using parameters. If you choose to output multiple parts at once, {{#data-value}} can use its knowledge about the desired HTML form to do this nicely. E.g.
<span>{{#data-value:data.population|show=label,value,timestamp,source,indicators,edit|form=tr}}</span>
would be rendered as
<tr> <td>Population</td> <td>523,411</td> <td>2010</td> <td><a href="#src23">[1]</a>,<a href="#src23">[2]</a></td> <td><a title="disputed" href="..."><img src="..."/></a></td> <td>[<a title="edit" href="...">edit</a>]</td> </tr>
It's hard to imagine how to achieve this nicely without using parser functions. -- Duesentrieb (talk) 19:37, 22 May 2012 (UTC)
An alternative might be to return JSON from the parser function and let a Lua module, some other parser function or even a new templating construct handle the formatting.
Gabriel
On 22.05.2012 22:56, Gabriel Wicke wrote:
It's hard to imagine how to achieve this nicely without using parser functions. -- Duesentrieb (talk) 19:37, 22 May 2012 (UTC)
An alternative might be to return JSON from the parser function and let a Lua module, some other parser function or even a new templating construct handle the formatting.
Well, {{{data}}} already really *is* that JSON structure. I'm trying to come up with exactly that new templating structure :)
-- daniel
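The split being discussed here (structured data out of the lookup, formatting done elsewhere) might look roughly like this. The JSON field names and the format_row() helper are assumptions for illustration; neither the draft nor the thread fixes a concrete structure.

```python
import json
from html import escape

# Hypothetical JSON payload a data lookup might return for one property.
raw = json.dumps({
    "property": "population",
    "label": "Population",
    "value": 523411,
    "timestamp": "2010",
})


def format_row(payload: str) -> str:
    """Formatting layer: turn the structured value into one HTML table row."""
    data = json.loads(payload)
    cells = [
        data["label"],
        f'{data["value"]:,}',  # thousands separator, e.g. 523,411
        data["timestamp"],
    ]
    return "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in cells) + "</tr>"


print(format_row(raw))
```

With this split the lookup stays a plain data call, and a different formatter (a wikitext table row, a span, a Lua module) can consume the same payload.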
Relaying yet another conversation between me and TMg on https://meta.wikimedia.org/wiki/Talk:Wikidata/Notes/Inclusion_syntax:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
I'm very sorry but I think this is way too complicated. I'm a software developer and I think I should be able to understand all this in seconds when I look at it. I think you should create a toolkit that is very tiny and very easy to understand. Extremely powerful tools like in your example with all the complicated parameters (even comma-separated, which I think is horrible) are way too restricted in the end and can be used only in very, very few cases. Here is how your example should work in my opinion:
|-
| Population
| {{#formatnum: {{{data.population}}} }}
| {{#time: {{{data.population.timestamp}}} }}
| {{{data.population.source}}}
| {{{data.population.indicators}}}
| [{{{data.population.edit}}} edit]
This belongs in a template. In an article we will never write {{#data-value:data.population|show=label,value,timestamp,source,indicators,edit|form=tr}}. We will write {{Population table row}} instead.
As said before, I don't understand why we should use HTML table syntax in a wiki. There is a table syntax. We know how it works. Don't force us to use another syntax, please. We have tools to format numbers, timestamps and to create references and links. We have powerful tools to create templates. We are able to use styles and classes and HTML. We don't need a new syntax to do things we already can do. This is not only confusing, it is highly counterproductive. Don't work against the template syntax, work with it.
data.population should output the unformatted population. We have tools to format numbers. data.population.timestamp should output an unformatted timestamp. We have tools to format timestamps. data.population.edit should output a URL. We have tools to create links. data.population.source should output <ref> tags. --TMg 09:38, 23 May 2012 (UTC)
Ok, so you want to handle all parts of each value by hand. Fine. It is possible for most things, as you said. But it's very tricky to do in other cases, and very redundant to have to do it over and over. Here's a few things that I can't think of a good way to do using templates:
* Unit conversion. Even if you have templates to do this, you would need a plain number as input. But data.population may not be a single value, but (e.g. in case of a dispute) a list or range of values.
* Indicators are generally complex HTML.
* The edit link would normally contain JavaScript that invokes the on-site editing interface. Only as a fallback would it actually link somewhere. And it should not be formatted as an external link, nor should it end up in the externallinks table.
* data.population.source is actually a list of sources, each of which needs a template for rendering. You are already generating complex HTML at this point.
* In the case of #data-values (plural), each value (actually, each statement, see the data model spec) for a property would be listed separately. You would need a foreach loop to do this in a template. With Lua, this will be possible in the future, but right now it isn't.
While I agree that it's not very nice to be outputting entire table rows from a parser function, I think it would be very hard to cover the above with a simpler approach. If you can think of a cleaner, nicer, yet workable way, let me know. -- Duesentrieb (talk) 10:19, 23 May 2012 (UTC)
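To illustrate why a property cannot be assumed to be one plain number, here is a sketch of the "foreach" that a template cannot express. The STATEMENTS layout, the numbers beyond 523,411 and the source strings are invented for the example.

```python
# Hypothetical data: one property with two competing statements.
STATEMENTS = {
    "population": [
        {"value": 523411, "source": "source A"},  # made-up sources
        {"value": 530000, "source": "source B"},
    ],
}


def values(prop: str) -> list:
    """All values for a property, one per statement."""
    return [s["value"] for s in STATEMENTS.get(prop, [])]


def as_range(prop: str) -> str:
    """Collapse the values into 'x' or 'x to y' for display."""
    vals = values(prop)
    if not vals:
        return ""
    low, high = min(vals), max(vals)
    return f"{low:,}" if low == high else f"{low:,} to {high:,}"


# The loop over statements that plain wikitext templates lack.
for statement in STATEMENTS["population"]:
    print(f'{statement["value"]:,} ({statement["source"]})')

print(as_range("population"))
```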
Hi all!
Thank you very much for all your input over the last week. I have now created a new draft, in which I have tried to address your concerns and suggestions:
https://meta.wikimedia.org/wiki/Wikidata/Notes/Inclusion_syntax
The major change is that items are no longer explicitly passed around as data objects, and the use of templates is not required. Instead, item properties are addressed directly, optionally also using the ID of the item to access.
Please have a look at the new draft and let me know what you think!
Thanks Daniel
Thank you Daniel, the new proposal seems to be a great aggregation of the discussion results. I like it a lot.
A few short comments:
* I would drop the section "Changing the Default item". This is syntactic sugar (meaning, everything that can be done with the #data-item function can also be done without it, especially within templates, by using template variables). But it might be tricky as, once the default item is changed within a template, it probably stays changed when we come back from the template, and this might have undesirable side effects. It basically adds a global variable to the page, and we need to be careful about how we use it. This can be avoided by simply dropping the #data-item function.
* The section "Coalescing values" is very interesting, but I think it needs a bit more work and wider discussion. AFAIK usually such values are not coalesced, but rather just listed (which would be covered by the section "Multiple values").
* The selection {{#property:population|item=id/q12345}} is ambiguous with {{#property:population|item=en/Germany}} since "id" is also a site code / language code. My first thought would be to drop the id/ in the first case, but I would prefer it if in this case we defaulted to the current site (i.e., if, on en.wp, we could use {{#property:population|item=Germany}} without the "en/"). The second suggestion thus would be to use {{#property:population|id=q12345}} in the first case.
Thanks for the great work, and keeping the discussion together.
Cheers, Denny
On 29.05.2012 12:03, Denny Vrandečić wrote:
- I would drop the Section "Changing the Default item". This is syntactic sugar
(meaning, everything that can be done with the #data-item function can also be done without, especially within templates, by using template variables).
Yes, that's correct. I still think it would be quite useful, but we could mark it as obsolete.
But it might be tricky as, once the default item is changed within a template, it probably stays changed when we come back from the template, and this might have undesirable side effects. It basically adds a global variable to the page, and we need to be careful about how we use it.
No, this is not a problem if implemented as described in the spec: the default item would be changed for the current *scope*, not the page. The scope is basically the current preprocessor frame, which is the mechanism by which "local" things like template parameters are managed. The scope of the "default item" would be the same as the scope for template parameters: the current template call, if any. After returning from the template, the default item would be back to what it was before invoking the template.
Introducing a global variable into the page would indeed be very problematic. If it's not possible to do this based on the preprocessor frame, it would indeed be better to just drop this feature.
Do you have a suggestion how this could be made clearer in the draft?
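One way to picture the scoping described above is a small Python model. This is a sketch under assumptions, not MediaWiki code: Frame, set_default_item and call_template are hypothetical names standing in for the preprocessor frame and a #data-item call, and the sketch deliberately leaves open whether nested template calls inherit the setting.

```python
class Frame:
    """Hypothetical stand-in for a preprocessor frame (one per template call)."""

    def __init__(self, page_item, parent=None):
        self.parent = parent
        self.page_item = page_item  # the item linked to the page being rendered
        self.local_item = None      # set by a #data-item call in this frame only

    def set_default_item(self, item_id):
        self.local_item = item_id

    def default_item(self):
        # Fall back to the page's own item when nothing was set locally.
        return self.local_item if self.local_item is not None else self.page_item


def call_template(caller, body):
    """Run a template body in a fresh child frame and discard that frame afterwards."""
    child = Frame(caller.page_item, parent=caller)
    return body(child)
```

Because the setting lives on the child frame, the caller's default item is untouched once the template returns, which is the property the reply above relies on.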
- The section "Coalescing values" is very interesting, but I think it needs a
bit more work and wider discussion. AFAIK usually such values are not coalesced, but rather just listed (which would be covered by the section "Multiple values").
This would be quite bad, I think. It would mean that you can't use {{#property}}... directly at all, because the behavior of this would simply be undefined in cases where there are multiple equally strong statements. You'd have to use {{#property-values}} with an appropriate "row" template for every property. Very annoying.
You are correct that coalesced values would often be lists in the sense of a plain-text, comma-separated list. This would be the default for all data types that cannot be combined in a more specific way (e.g. as ranges or using a max or min), for example for text values, sources, etc.
We definitely have to specify what {{#property}} will do if there are multiple equally strong statements associated with the property. That is, some type of coalescing needs to take place, even if it's trivial (e.g. always just use the first statement and ignore the rest).
If we don't use proper coalescing but just silently ignore additional, potentially conflicting statements, we will be hiding disputes and inconsistencies instead of making them obvious. This is especially important in places where multiple values were not expected, so it would be particularly bad to require the use of the rather awkward {{#property-values}} mechanism (or, in Lua, loops) in order to see disagreeing values.
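To make the options concrete, here is a minimal Python sketch of combining equally strong statements. It is an illustration only: the Statement class, its preferred flag and the mode names (list, range, min, max) are assumptions, not part of the draft, and the real ranking of statements is defined by the data model.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    value: float
    preferred: bool = False  # hypothetical flag; the data model defines the real ranking


def strongest(statements):
    """Keep only the strongest statements: preferred ones if any exist, else all."""
    preferred = [s for s in statements if s.preferred]
    return preferred or list(statements)


def aggregate(statements, mode="list"):
    """Combine equally strong statements into one display string."""
    values = [s.value for s in strongest(statements)]
    if not values:
        return ""
    if mode == "min":
        return str(min(values))
    if mode == "max":
        return str(max(values))
    if mode == "range" and len(values) > 1:
        return f"{min(values)}-{max(values)}"
    # Default: plain comma-separated enumeration, which works for any data type.
    return ", ".join(str(v) for v in values)
```

With the default mode, two disagreeing population figures are both shown instead of one being silently dropped, which is the behaviour argued for above.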
- The selection {{#property:population|item=id/q12345}} is ambiguous with
{{#property:population|item=en/Germany}} since "id" is also a site code / language code. My first thought would be to drop the id/ in the first case, but I would prefer it if in this case we defaulted to the current site (i.e., if, on en.wp, we could use {{#property:population|item=Germany}} without the "en/"). The second suggestion thus would be to use {{#property:population|id=q12345}} in the first case.
You are correct that this is ambiguous. This ties directly into the pending discussion about how to link to items from wikitext. In any case, the syntax to refer to items in {{#property}} calls should be the same as the one we use to refer to items using the normal interwiki syntax. That is, if we use [[wikidata:id/q12345]] we should also use {{#property|item=id/q12345}}, and if we use [[wikidata:en/Foo]] we should also use {{#property|item=en/Foo}}.
For interwiki links (and also for wiki-text links inside Wikidata, for use on discussion pages, etc) we need a single, unambiguous path-like syntax for addressing items. Using different parameters in parser functions and urls (like item vs. id) or similar tricks will not save us. We'll have to use prefixes.
So, since "id" could also be a language code, how about "item/q12345" or "qid/12345" (qid is reserved for local use by ISO 639-3) or even "/q12345" or "*/q12345"?
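As a rough illustration of the prefix idea, the following Python sketch resolves a path-like reference. Everything here is an assumption for the sake of the example: parse_item_ref is a hypothetical helper, "item" is just one of the candidate prefixes named above, and the fallback to the current site follows Denny's suggestion. Note that the sketch does not handle page titles that themselves contain a slash.

```python
def parse_item_ref(ref, current_site="en", id_prefix="item"):
    """Split a path-like item reference into a (kind, site, key) tuple."""
    prefix, sep, rest = ref.partition("/")
    if not sep:
        # No prefix at all: treat the reference as a page title on the current site.
        return ("title", current_site, ref)
    if prefix == id_prefix:
        # Reserved prefix: the rest is an item ID, independent of any site.
        return ("id", None, rest)
    # Any other prefix is read as a site code followed by a page title.
    return ("title", prefix, rest)
```

With a reserved prefix that is not a valid site code, "id/Jakarta" can be read unambiguously as a page on the Indonesian site.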
-- daniel
PS: perhaps the key parts of this discussion should go into the draft as editorial notes?
Thanks a lot Daniel for the update!
I think this is great progress. Picking on the point Denny raised:
With respect to the #data-item function: Even after reading your explanation I share some of Denny's concern. You are certainly right that at present there is a clear solution. However, is this solution defined to be stable in the light of the ongoing restructuring of MediaWiki? Is "current preprocessor frame" (which does not say anything to me) something that will always be present, even if MediaWiki might in the end use a completely new parser?
If yes, then I only see the problem of explaining the scope definition slightly better.
- The section "Coalescing values" is very interesting, but I think it needs a
bit more work and wider discussion. AFAIK usually such values are not coalesced, but rather just listed (which would be covered by the section "Multiple values").
I agree with Daniel that aggregation or enumeration functionality is a must.
I am slightly unhappy with the term Coalescing, because it seems to me misleading. E.g. the SPARQL and other programming language sense covers only part of what seems to be addressed here, see e.g.
http://www.w3.org/TR/sparql11-query/#func-coalesce http://en.wikipedia.org/wiki/Null_coalescing_operator
From what Daniel is describing, "aggregation" (or perhaps specifically enumeration) would be clearer to me.
I wonder whether it can be restricted to enumeration initially. However, I do support functionality to directly output multiple values rather than forcing complex template programming, recursion, etc.
----
My own point, slightly more generic than this discussion:
Whenever you use "data item" or "item data" (both forms exist, with "item data" seemingly being an abbreviation for "data item data" ;-) ), I start to think "what was this?" - the snak, the topic, the property values?
"Topic" seems to be an excellent replacement for data item - with the added benefit of tying into topic maps.
Gregor
PS:
1. Note: you define preferred values, but later you switch to undefined "strong", which may or may not be a synonym. I suggest to drop the notion of "strong" and continue to argue with preferred.
2. Please run a spelling checker; it is distracting to read at the moment. But content goes over form, of course!
Thanks for your comments, Gregor!
On 29.05.2012 15:48, Gregor Hagedorn wrote:
With respect to the #data-item function: Even after reading your explanation I share some of Denny's concern. You are certainly right that at present there is a clear solution. However, is this solution defined to be stable in the light of the ongoing restructuring of MediaWiki? Is "current preprocessor frame" (which does not say anything to me) something that will always be present, even if MediaWiki might in the end use a completely new parser?
If yes, then I only see the problem of explaining the scope definition slightly better.
They may not exist in the current form, but a "local scope" will always exist. Template parameter names are local to the current template "call". They have to be managed somewhere, so MediaWiki will always have some way to manage data attached to the "current call", and we can tie into that to put the "default item" into that "current" or "local" scope.
I agree with Daniel that aggregation or enumeration functionality is a must.
I am slightly unhappy with the term Coalescing, because it seems to me misleading. E.g. the SPARQL and other programming language sense covers only part of what seems to be addressed here, see e.g.
Ok, "aggregation" is probably clearer and also matches the use of that term in the context of SQL. Thanks for the suggestion!
From what Daniel is describing, "aggregation" (or perhaps specifically enumeration) would be clearer to me.
I wonder whether it can be restricted to enumeration initially. However, I do support functionality to directly output multiple values rather than forcing complex template programming, recursion, etc.
Enumeration should be the default, and can even be the only form of aggregation, though I would really like to have range and min/max too.
My own point, slightly more generic than this discussion:
Whenever you use "data item" or "item data" (both forms exist, with "item data" seemingly being an abbreviation for "data item data" ;-) ), I start to think "what was this?" - the snak, the topic, the property values?
"Topic" seems to be an excellent replacement for data item - with the added benefit of tying into topic maps.
We established the "item" terminology in the data model. The syntax spec has to be consistent with this. We could change "item" to something else, but we'd have to start in the data model.
- Note: you define preferred values, but later you switch to
undefined "strong", which may or may not be a synonym. I suggest to drop the notion of "strong" and continue to argue with preferred.
Yea, maybe that needs some explanation. I'm not using "strong" as such, I'm referring to "equally strong" statements. Two preferred statements are equally strong, but if there are no preferred statements but only two unsourced statements, they are also equally strong and would be combined (aggregated, coalesced).
- Please run a spelling checker; it is distracting to read at the moment. But content goes over form, of course!
Sorry, will do.
-- daniel
Hi Daniel,
They may not exist in the current form, but a "local scope" will always exist. Template parameter names are local to the current template "call". They have to be managed somewhere, so MediaWiki will always have some way to manage data attached to the "current call", and we can tie into that to put the "default item" into that "current" or "local" scope.
I am convinced if you can explain: the scope of this setting will always be identical to template parameters. For example, it will not be inherited if a template calls another template.
My own point, slightly more generic than this discussion:
Whenever you use "data item" or "item data" (both forms exist, with "item data" seemingly being an abbreviation for "data item data" ;-) ), I start to think "what was this?" - the snak, the topic, the property values?
"Topic" seems to be an excellent replacement for data item - with the added benefit of tying into topic maps.
We established the "item" terminology in the data model. The syntax spec has to be consistent with this. We could change "item" to something else, but we'd have to start in the data model.
Yes, I know. I just mentioned it because I found myself being confused here -- just as an example. If you think there is a chance to discuss the data model with respect to what the thing is called, please open a discussion and I will gladly comment. I do respect your constraints on what you can discuss when.
Gregor
On 29.05.2012 17:02, Gregor Hagedorn wrote:
Hi Daniel,
They may not exist in the current form, but a "local scope" will always exist. Template parameter names are local to the current template "call". They have to be managed somewhere, so MediaWiki will always have some way to manage data attached to the "current call", and we can tie into that to put the "default item" into that "current" or "local" scope.
I am convinced if you can explain: the scope of this setting will always be identical to template parameters. For example, it will not be inherited if a template calls another template.
Hehe, good catch. I have been thinking about this, and I think that in the case of default items, it would be consistent to let "sub"-templates (nested template calls) inherit the item set by a template further up. I'm not sure how this is currently handled by the preprocessor. It's something to be investigated (maybe we can have a chat with Tim about this on Thursday...)
We established the "item" terminology in the data model. The syntax spec has to be consistent with this. We could change "item" to something else, but we'd have to start in the data model.
Yes, I know. I just mentioned it because I found myself being confused here -- just as an example. If you think there is a chance to discuss the data model with respect to what the thing is called, please open a discussion and I will gladly comment. I do respect your constraints on what you can discuss when.
Please voice your concerns on the data model's talk page, https://meta.wikimedia.org/wiki/Talk:Wikidata/Data_model.
-- daniel
On 05/29/2012 05:12 PM, Daniel Kinzler wrote:
On 29.05.2012 17:02, Gregor Hagedorn wrote:
Hi Daniel,
They may not exist in the current form, but a "local scope" will always exist. Template parameter names are local to the current template "call". They have to be managed somewhere, so MediaWiki will always have some way to manage data attached to the "current call", and we can tie into that to put the "default item" into that "current" or "local" scope.
I am convinced if you can explain: the scope of this setting will always be identical to template parameters. For example, it will not be inherited if a template calls another template.
Hehe, good catch. I have been thinking about this, and I think that in the case of default items, it would be consistent to let "sub"-templates (nested template calls) inherit the item set by a template further up. I'm not sure how this is currently handled by the preprocessor. It's something to be investigated (maybe we can have a chat with Tim about this on Thursday...)
Both the PHP preprocessor and Parsoid rely on template expansions being purely functional for a single parser pass (see for example line 118ff in Preprocessor_DOM.php). This means that a full expansion subtree can be reused / cached if the parameters match.
It is possible to add scoped variables, but we'd have to carefully consider the implications for caching, implementation complexity and usability.
Gabriel
On 29.05.2012 18:06, Gabriel Wicke wrote:
On 05/29/2012 05:12 PM, Daniel Kinzler wrote:
On 29.05.2012 17:02, Gregor Hagedorn wrote:
I am convinced if you can explain: the scope of this setting will always be identical to template parameters. For example, it will not be inherited if a template calls another template.
Hehe, good catch. I have been thinking about this, and I think that in the case of default items, it would be consistent to let "sub"-templates (nested template calls) inherit the item set by a template further up. I'm not sure how this is currently handled by the preprocessor. It's something to be investigated (maybe we can have a chat with Tim about this on Thursday...)
Both the PHP preprocessor and Parsoid rely on template expansions being purely functional for a single parser pass (see for example line 118ff in Preprocessor_DOM.php). This means that a full expansion subtree can be reused / cached if the parameters match.
Thanks for providing details here!
This would be the case, since the local "variable" indicating the "local" default item would be set using a parser function call in the template, and would depend solely on the template parameters (or, trivially, use a static item id).
It is possible to add scoped variables, but we'd have to carefully consider the implications for caching, implementation complexity and usability.
Since the scoped var does not depend on any external information besides template parameters, I don't expect problems here.
-- daniel
On 05/29/2012 06:33 PM, Daniel Kinzler wrote:
Both the PHP preprocessor and Parsoid rely on template expansions being purely functional for a single parser pass (see for example line 118ff in Preprocessor_DOM.php). This means that a full expansion subtree can be reused / cached if the parameters match.
Thanks for providing details here!
This would be the case, since the local "variable" indicating the "local" default item would be set using a parser function call in the template, and would depend solely on the template parameters (or, trivially, use a static item id).
This is not true for subtemplates of such a wrapper template, unless you completely disable caching for those or handle expansion yourself.
It is possible to add scoped variables, but we'd have to carefully consider the implications for caching, implementation complexity and usability.
Since the scoped var does not depend on any external information besides template parameters, I don't expect problems here.
One problem I see is that the value of the variable in a local scope would change during execution. Parsoid expands templates and parser functions in parallel, so you would get non-deterministic behavior depending on the order of execution.
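The caching concern can be illustrated with a toy expander in Python. This is a sketch only and does not reflect how the PHP preprocessor or Parsoid actually cache expansions: make_expander, the cache key layout and the toy output format are all assumptions made for the example.

```python
def make_expander(include_context_in_key):
    """Build a toy template expander with its own expansion cache."""
    cache = {}

    def expand(template, params, default_item):
        """Expand a toy template whose output depends on the ambient default item."""
        # A purely functional expansion can be keyed on template name and parameters.
        key = (template, tuple(sorted(params.items())))
        if include_context_in_key:
            # An inherited default item is extra context, so it must join the key.
            key += (default_item,)
        if key not in cache:
            cache[key] = f"{template}({default_item})"
        return cache[key]

    return expand
```

With the context left out of the key, a second call under a different default item returns the stale first result; adding it to the key avoids that, at the cost of fewer cache hits. The sketch says nothing about the separate ordering problem that arises when expansions run in parallel.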
Gabriel
Would it not be error prone to have the item setting being inherited in a subtemplate, which, on its own, expects to refer to the default item?
Gregor
On 29.05.2012 18:58, Gregor Hagedorn wrote:
Would it not be error prone to have the item setting being inherited in a subtemplate, which, on its own, expects to refer to the default item?
Well, conceptually, I'd say that a template works on whatever was the default item for the caller - that's actually consistent, not breaking any assumptions. It would be more confusing to have it otherwise, imho.
But as Gabriel points out, this is nasty to implement, because it constitutes "context" apart from the actual parameters.
-- daniel
On 29.05.2012 18:54, Gabriel Wicke wrote:
On 05/29/2012 06:33 PM, Daniel Kinzler wrote:
Both the PHP preprocessor and Parsoid rely on template expansions being purely functional for a single parser pass (see for example line 118ff in Preprocessor_DOM.php). This means that a full expansion subtree can be reused / cached if the parameters match.
Thanks for providing details here!
This would be the case, since the local "variable" indicating the "local" default item would be set using a parser function call in the template, and would depend solely on the template parameters (or, trivially, use a static item id).
This is not true for subtemplates of such a wrapper template, unless you completely disable caching for those or handle expansion yourself.
Ah, of course. Good point, thanks!
It is possible to add scoped variables, but we'd have to carefully consider the implications for caching, implementation complexity and usability.
Since the scoped var does not depend on any external information besides template parameters, I don't expect problems here.
One problem I see is that the value of the variable in a local scope would change during execution. Parsoid expands templates and parser functions in parallel, so you would get non-deterministic behavior depending on the order of execution.
So that basically means the default item should be changed only once, and only at the beginning of a template, and this would have to be evaluated before evaluating anything else in that template. Hm, that sucks.
-- daniel
https://meta.wikimedia.org/w/index.php?title=Wikidata/Notes/Inclusion_syntax...
the language to prefer for the value, as a fallback list of language codes. Per default, the page's content language is used.
Just to be clear, will the default be the content language **and its fallbacks**?
This is still to be decided. We expect to be back with a draft of a suggestion for this answer within the next month.
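Whatever default is chosen, selecting a value from a fallback list of language codes could look roughly like the following Python sketch. It is an illustration only: pick_label and the shape of the labels mapping are hypothetical, and the example chains are made up rather than taken from any actual MediaWiki fallback configuration.

```python
def pick_label(labels, fallback_chain):
    """Return (language, text) for the first language in the chain that has a label."""
    for lang in fallback_chain:
        if lang in labels:
            return lang, labels[lang]
    # Nothing in the chain matched; the caller decides what to show instead.
    return None
```

For example, with labels available in "en" and "de", a chain of "pt-br", "pt", "en" would end up showing the English label.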
2012/5/29 Helder . helder.wiki@gmail.com
https://meta.wikimedia.org/w/index.php?title=Wikidata/Notes/Inclusion_syntax...
the language to prefer for the value, as a fallback list of language
codes. Per default, the page's content language is used. Just to be clear, will the default be the content language **and its fallbacks**?