[Commons-l] [Wikitech-l] License information (was: PDF/Collection feature live on de.wikibooks)

Fri Jan 30 16:42:28 UTC 2009

On Fri, Jan 30, 2009 at 12:55 AM, Brianna Laugher
<brianna.laugher at gmail.com> wrote:
> 2009/1/30 Johannes Beigel <johannes.beigel at pediapress.com>:
>> On 29.01.2009, at 13:48, Brianna Laugher wrote:
>>  > On Wikimedia Commons a little bit of work has been done to this end:
>>  > <http://commons.wikimedia.org/wiki/Commons:Commons_API>
>>
>> We've been aware of this page and Magnus' implementation, and we think
>> it looks really good!
>>
>> The information is (AFAIK) scraped from the rendered XHTML of
>> articles. This could be done in a less error-prone way (and more
>> efficiently) if the data would be stored and accessed via database in
>> some way. Of course this would require some discussion, formal
>> decisions and code changes. But as I stated in an earlier post: I
>> think MediaWiki is so widely used by people who want to share and
>> collaborate on free content, that it's not too farfetched to build
>> some "license infrastructure" into the software itself.
>
> I agree that it makes a lot of sense. But because it would be a big
> change, I fear that unless the lead developers show great enthusiasm
> for the idea, it will take a very long time to be accepted and
> completed. Whereas building an "add-on" tool can be faster to get to
> the point of functionality.
>
> It may be a good idea to try and build the Commons API to mimic the
> MediaWiki API, imagining that in the future such information will be
> available via that. So then hopefully for now people could use the
> Commons API, and in the future switch to the MediaWiki API by just
> changing the API URL, and all their queries could stay the same.

There is a big conceptual difference between the two APIs, IMHO. The
MediaWiki API can be used to query technically defined things: Link
lists, categories, template usage and so on. A Commons API (mine or
someone else's) parses the content itself for data and relations that
are not technically defined.
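
To illustrate what "technically defined" means here, the following is a
minimal sketch of how such MediaWiki API queries can be built. The
`action=query` module and its `prop=templates`, `prop=categories` and
`prop=links` values are part of the MediaWiki API; the endpoint URL, the
helper name `build_query` and the example page title are assumptions made
for illustration. No comparable `prop` value exists for license data,
which is why a Commons API has to parse page content instead.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration; any MediaWiki installation exposes api.php.
API_URL = "https://commons.wikimedia.org/w/api.php"


def build_query(title, prop):
    """Build a query URL for one technically defined property of a page.

    `prop` is a MediaWiki API property such as "templates",
    "categories" or "links". `build_query` is a hypothetical helper.
    """
    params = {
        "action": "query",
        "prop": prop,
        "titles": title,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


# Hypothetical page title, used only to show the shape of the URL.
url = build_query("File:Example.jpg", "templates")
print(url)
```

The query above would report which templates a page uses, but not the
parameter values passed to them.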

One way would be to add some kind of license metadata per page into
the database. This is possible, but rather specific; also, it would
likely mean creating a separate interface just for that.

The better way (IMHO) is to store all used
"page:template:parameter:value" tuples in a wiki in a separate
database table, which could be queried by the MediaWiki API. This has
been suggested time and again by me and others. It would then be much
easier for a third-party API to get the relevant data for a page. The
functionality is part of Semantic MediaWiki, but would actually scale
as a project on its own ;-)
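
A minimal sketch of what such a tuple table might look like, using an
in-memory SQLite database. The table name `template_params`, its column
names, the helper functions and the sample rows are all assumptions made
for illustration; none of this reflects an actual MediaWiki schema.

```python
import sqlite3

# In-memory database standing in for the wiki's database (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per page:template:parameter:value tuple.
conn.execute(
    """
    CREATE TABLE template_params (
        page      TEXT NOT NULL,
        template  TEXT NOT NULL,
        parameter TEXT NOT NULL,
        value     TEXT NOT NULL
    )
    """
)
conn.execute(
    "CREATE INDEX idx_template ON template_params (template, parameter)"
)

# Made-up sample data, for illustration only.
rows = [
    ("File:Example.jpg", "Information", "author", "Jane Doe"),
    ("File:Example.jpg", "Information", "source", "own work"),
    ("File:Other.png", "Information", "author", "John Roe"),
]
conn.executemany("INSERT INTO template_params VALUES (?, ?, ?, ?)", rows)


def params_for(page, template):
    """Return {parameter: value} for one template on one page."""
    cur = conn.execute(
        "SELECT parameter, value FROM template_params "
        "WHERE page = ? AND template = ?",
        (page, template),
    )
    return dict(cur.fetchall())


def pages_using(template):
    """Return the sorted list of pages that use the given template."""
    cur = conn.execute(
        "SELECT DISTINCT page FROM template_params "
        "WHERE template = ? ORDER BY page",
        (template,),
    )
    return [row[0] for row in cur]


print(params_for("File:Example.jpg", "Information"))
print(pages_using("Information"))
```

With a table like this, a third-party tool could look up license-related
template parameters with a simple query instead of scraping rendered
pages. A real implementation would also have to decide how to record
templates used without parameters and nested template calls, which this
sketch ignores.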

This approach would also allow for the integration of tools like
TemplateTiger [1] directly into Wikipedia.

Magnus

[1] http://toolserver.org/~kolossos/templatetiger/tt-table4.php?template=Persondata&lang=en&where=&is=