[Commons-l] Alternate information templates Was: Image description grammar (was: the great {{information}} campaign)

Fri Sep 7 11:17:51 UTC 2007

Gregory Maxwell wrote:
> On 9/6/07, Platonides wrote:
>> Agreed. We don't need to have ONE template on ALL images; we can (and
>> should) have a number of templates, as long as they're documented. I.e. we
>> have a page listing all the "valid" templates and describing their
>> arguments. If a bot knows that Information_Louvre->source is equivalent
>> to Information->Author it can happily work with any of them being
>> present. Just keep it documented (with a working parsing implementation).
>>
>> Another example is the PD book templates. They have everything about the
>> image: "Page X from book Y, by Foo, Year, in the public domain". Here the
>> source & author values for the template would be hardcoded.
> 
> The problem that comes up is that people just constantly invent new
> templates, often with trivial differences like hard-coded sources,
> authorship, or licensing information.
If the differences are trivial, the templates should be merged.

> These are especially bad cases
> because when it's stuffed into the template it is as though it isn't
> provided at all.. until someone goes through and special-cases that
> template. 
The bots can alert us to that.

> Eventually we'll end up with 10 million images and 1 million
> templates, one for each source.. just because our uploading tools suck
> and people are abusing templates to avoid retyping source or licensing
> info. :-/
"You can't use this home-made template, as it's not listed on 
[[Commons:The_ultimate_information]]. Also, if you had gone to add it 
there, you would have found there are already 3 templates doing the same. 
Evil-bot-which-dislikes-templates is substituting it. Have a nice day."

> It's utterly unacceptable to expect any tools to keep up with that.
It's unacceptable to expect *all* tools to keep up. But a working 
framework could be provided ;)

> The way I see it, there are three possible ways for a bot to get meta
> information about an image from a template:
> 1. From the wiki text
> 2. From the rendered HTML
> 3. From some future to-be-automatically-generated
> page:template:variable_key:value data set
> 
> #1 is hard/impossible to do correctly (though it might work in many
> cases), as only the MediaWiki parser can parse this stuff correctly
> (more or less...).
> #2 is correct (since it was done by the MediaWiki parser), but slow.
> #3 IMHO is the only long-term solution. I have proposed this several
> times, on several lists. Last thing I heard, semantic wikipedia will
> take care of it. As soon as it gets installed on Commons...

You're right about #1. But we don't need a full parser, only a basic 
one. More or less like braceSubstitution, omitting all formatting (though 
maybe not ignoring wikilinks completely).
#2 helps with templates including other templates, but you need to tag 
the sections with HTML classes (couldn't we have another XML namespace 
added for this?). And it's slow.
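
To make the "basic parser" idea for #1 concrete, here is a rough sketch 
in Python. It is only an illustration under assumptions, not how 
MediaWiki does it: the function names (find_template, split_args, 
template_args) are made up for this example, it only balances {{ }} and 
[[ ]], and it ignores parser functions, <nowiki>, comments and 
{{{parameters}}}.

```python
def find_template(text, name):
    """Return the body of the first {{name|...}} call (outer braces removed), or None."""
    wanted = name.replace("_", " ").lower()
    i = text.find("{{")
    while i != -1:
        depth, j = 0, i
        while j < len(text):
            if text.startswith("{{", j):
                depth += 1
                j += 2
            elif text.startswith("}}", j):
                depth -= 1
                j += 2
                if depth == 0:
                    break
            else:
                j += 1
        if depth != 0:
            return None  # unbalanced braces: give up rather than guess
        body = text[i + 2:j - 2]
        head = body.split("|", 1)[0].strip()
        if head.replace("_", " ").lower() == wanted:
            return body
        i = text.find("{{", i + 2)  # also looks inside nested templates
    return None


def split_args(body):
    """Split a template body on '|' that is not inside {{...}} or [[...]]."""
    parts, cur, depth, k = [], [], 0, 0
    while k < len(body):
        two = body[k:k + 2]
        if two in ("{{", "[["):
            depth += 1
            cur.append(two)
            k += 2
        elif two in ("}}", "]]"):
            depth -= 1
            cur.append(two)
            k += 2
        elif body[k] == "|" and depth == 0:
            parts.append("".join(cur))
            cur = []
            k += 1
        else:
            cur.append(body[k])
            k += 1
    parts.append("".join(cur))
    return parts


def template_args(text, name):
    """Return {argument: raw wikitext value} for the first call of `name`, or None."""
    body = find_template(text, name)
    if body is None:
        return None
    args, positional = {}, 0
    for part in split_args(body)[1:]:  # part 0 is the template name
        key, sep, value = part.partition("=")
        if sep:
            args[key.strip()] = value.strip()
        else:
            positional += 1
            args[str(positional)] = part.strip()
    return args


page = "{{Information\n|Description=A [[cat|kitten]] photo\n|Source={{own}}\n|Author=[[User:Foo|Foo]]\n}}"
print(template_args(page, "Information"))
```

The values come back as raw wikitext, so wikilinks and nested templates 
are kept rather than stripped. One known gap: an unnamed argument that 
contains "=" inside a link or nested template would be misread as a 
named one.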

> Most of the fields in information are common to virtually every image.
> Why should someone have to support 40 different ways of reading the
> same three or four basic pieces of information which are common to all
> images? Why should the same basic three or four fields have a
> different presentation randomly on some images?

Ideally, that page (the list of valid templates) would be written in a 
meta-language, allowing the bots to learn what the template arguments 
are before starting to parse.
In the short term, the "translation" would be manual and hardcoded.
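
As a sketch of what such a manual, hardcoded "translation" might look 
like (Python; everything here is an assumption for illustration): 
FIELD_MAP and canonical_fields are invented names, "PD-book-example" and 
its values are placeholders, and the only mapping taken from this thread 
is Information_Louvre->source being equivalent to Information->Author.

```python
# Canonical field -> where to find it for each documented template.
# ("arg", X) reads template argument X; ("const", V) is a value that
# the template hardcodes, so it never appears in the wiki text.
FIELD_MAP = {
    "Information": {
        "author": ("arg", "Author"),
        "source": ("arg", "Source"),
    },
    "Information_Louvre": {
        "author": ("arg", "source"),  # equivalence mentioned in this thread
    },
    "PD-book-example": {  # hypothetical template with hardcoded values
        "author": ("const", "Foo"),
        "source": ("const", "Book Y"),
    },
}


def canonical_fields(template, args):
    """Translate one template's arguments into the canonical field names."""
    rules = FIELD_MAP.get(template)
    if rules is None:
        # Not on the documented list: this is what a bot would report.
        raise KeyError("undocumented template: " + template)
    fields = {}
    for field, (kind, value) in rules.items():
        fields[field] = args.get(value) if kind == "arg" else value
    return fields


print(canonical_fields("Information_Louvre", {"source": "Some Author"}))
print(canonical_fields("PD-book-example", {}))
```

A bot would then only need this one table, and an unknown template 
shows up as an error instead of silently missing data.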

The first question that comes to my mind is: what are the basic fields needed?