Gregory Maxwell wrote:
On 9/6/07, Platonides wrote:
Agree. We don't need to have ONE template on ALL images; we can (and
should) have a number of templates, as long as they're documented. I.e. we
have a page listing all the "valid" templates and describing their
arguments. If a bot knows that Information_Louvre->source is equivalent
to Information->Author, it can happily work with any of them being
present. Just keep it documented (and keep a working parsing
implementation around).
Another example is the PD book templates. They carry everything about the
image: "Page X from book Y, by Foo, from Year, in the public domain". Here
the source & author values for the template would be hardcoded.
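The equivalence table described above could be as simple as a bot-side dictionary keyed by (template, parameter). A minimal sketch in Python — the Information and Information_Louvre entries come from the example above, while the PD-book template name and its fixed values are invented purely for illustration:

```python
# Map template-specific parameter names onto a canonical set of
# fields, so a bot can treat {{Information_Louvre}} and
# {{Information}} interchangeably. This is not a real registry;
# it only illustrates the shape such a table could take.
CANONICAL_FIELDS = {
    ("Information", "Author"): "author",
    ("Information", "Source"): "source",
    ("Information_Louvre", "source"): "source",
}

# Templates like the PD-book ones can supply fixed values with no
# parameters at all (template name and values are hypothetical):
HARDCODED_VALUES = {
    "PD-OldBook-Foo": {"author": "Foo", "source": "book Y"},
}

def canonicalize(template, params):
    """Return canonical metadata fields for one parsed template call."""
    fields = dict(HARDCODED_VALUES.get(template, {}))
    for name, value in params.items():
        key = CANONICAL_FIELDS.get((template, name))
        if key:
            fields[key] = value
    return fields
```

With a table like this, a bot never needs to special-case individual templates in its logic; adding support for a new template is one dictionary entry.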
The problem that comes up is that people constantly invent new
templates, often with trivial differences like hard-coded sources,
authorship, or licensing information.
If changes are trivial, they should be merged.
These are especially bad cases, because when the information is stuffed
into the template it is as though it isn't provided at all... until
someone goes through and special-cases that template.
The bots can alert us to that.
Eventually we'll end up with 10 million images and 1 million templates,
one for each source, just because our uploading tools suck and people
are abusing templates to avoid retyping source or licensing info. :-/
"You can't use this home-made template, as it's not listed on
[[Commons:The_ultimate_information]]. Also, if you had gone to add it
there, you would have found there are already 3 templates doing the
same; Evil-bot-which-dislikes-templates is substituting it. Have a
nice day."
It's utterly unacceptable to expect any tools to keep up with that.
It's unacceptable to expect *all* tools to keep up. But a working
framework could be provided ;)
The way I see it, there are three possible ways for a bot to get meta
information about an image from a template:
1. From the wiki text
2. From the rendered HTML
3. From some future, to-be-automatically-generated
page:template:variable_key:value data set
#1 is hard/impossible to do correctly (though it might work in many
cases), as only the MediaWiki parser can parse this stuff correctly
(more or less...).
#2 is correct (since it was done by the MediaWiki parser), but slow.
#3 IMHO is the only long-term solution. I have proposed this several
times, on several lists. The last thing I heard, semantic wikipedia would
take care of it. As soon as it gets installed on Commons...
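For what it's worth, the #3 data set could be nothing fancier than a flat dump with one line per page:template:key:value tuple, regenerated whenever a description page is saved. A hypothetical fragment (the file names and values below are invented to show the shape, not real data):

```
File:Mona_Lisa.jpg : Information : Author : Leonardo da Vinci
File:Mona_Lisa.jpg : Information : Source : Musée du Louvre
File:Page_12.jpg : PD-OldBook-Foo : source : book Y
```

A bot would then only grep this dump instead of parsing anything, and the equivalence between template variants could be resolved on the generating side, once, rather than in every tool.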
You're right about #1. But we don't need a full parser, only a basic
one: more or less like braceSubstitution, omitting all formatting (though
maybe not completely ignoring wikilinks).
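Such a basic pass can be sketched in a few lines of Python. This is only a rough approximation of what a braceSubstitution-style parser would do — it deliberately ignores nested templates and parser functions, and piped wikilinks ([[A|B]]) would also defeat its naive split:

```python
import re

def parse_template(wikitext, name):
    """Very rough extraction of named parameters from the first
    {{name|...}} call found in wikitext. Handles only flat calls:
    no nested templates, no parser functions, no piped wikilinks
    inside values."""
    m = re.search(r"\{\{\s*%s\s*(\|[^{}]*)?\}\}" % re.escape(name),
                  wikitext, re.S)
    if not m or not m.group(1):
        return {}
    params = {}
    for part in m.group(1).lstrip("|").split("|"):
        if "=" in part:
            key, _, value = part.partition("=")
            # keep wikilinks as-is, just trim whitespace
            params[key.strip()] = value.strip()
    return params
```

For the many images whose description is a single flat {{Information}} call, something this crude already works; the hard cases are exactly the ones where only the real MediaWiki parser gets it right.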
#2 helps with templates including other templates, but you need to tag
the sections with HTML classes (couldn't we have another XML namespace
added for this?). And it's slow.
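Assuming the templates (or MediaWiki itself) emitted class attributes like fileinfo-source — a made-up name here, nothing emits it today — scraping the rendered HTML for #2 could look like this sketch using Python's stdlib HTMLParser:

```python
from html.parser import HTMLParser

class FieldExtractor(HTMLParser):
    """Collect the text of elements tagged with a known class,
    e.g. <td class="fileinfo-source">...</td>. The class names are
    hypothetical; the templates would have to be changed to emit
    them. Naive: nested tags inside a tagged cell cut the value
    short at the first close tag."""
    FIELDS = {"fileinfo-source": "source", "fileinfo-author": "author"}

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        for cls in dict(attrs).get("class", "").split():
            if cls in self.FIELDS:
                self._current = self.FIELDS[cls]

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        if self._current:
            self.fields[self._current] = self.fields.get(self._current, "") + data
```

The attraction is that the values are post-expansion, so hardcoded and nested templates come out already resolved; the cost is rendering every page.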
Most of the fields in Information are common to virtually every image.
Why should someone have to support 40 different ways of reading the
same three or four basic pieces of information which are common to all
images? Why should the same basic three or four fields have a
different presentation, at random, on some images?
Ideally, that page would be in a meta-language allowing the bots to
learn what the template arguments are before starting to parse.
In the short term, the "translation" would be manual and hardcoded.
The first question that comes to me is: what are the basic fields needed?