On Fri, Jan 30, 2009 at 12:55 AM, Brianna Laugher brianna.laugher@gmail.com wrote:
2009/1/30 Johannes Beigel johannes.beigel@pediapress.com:
On 29.01.2009, at 13:48, Brianna Laugher wrote:
On Wikimedia Commons a little bit of work has been done to this end: http://commons.wikimedia.org/wiki/Commons:Commons_API
We've been aware of this page and Magnus' implementation, and we think it looks really good!
The information is (AFAIK) scraped from the rendered XHTML of articles. This could be done in a less error-prone way (and more efficiently) if the data were stored in and accessed via the database in some way. Of course this would require some discussion, formal decisions and code changes. But as I stated in an earlier post: I think MediaWiki is so widely used by people who want to share and collaborate on free content that it's not too far-fetched to build some "license infrastructure" into the software itself.
I agree that it makes a lot of sense. But because it would be a big change, I fear that unless the lead developers show great enthusiasm for the idea, it will take a very long time to be accepted and completed. Building an "add-on" tool, on the other hand, can reach a working state much faster.
It may be a good idea to try and build the Commons API to mimic the MediaWiki API, imagining that in the future such information will be available via that. So then hopefully for now people could use the Commons API, and in the future switch to the MediaWiki API by just changing the API URL, and all their queries could stay the same.
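To illustrate the "just change the API URL" idea, here is a minimal sketch. It assumes both APIs accept the same MediaWiki-style query parameters; both URLs are placeholders and build_query_url is a hypothetical helper, not part of either API.

```python
from urllib.parse import urlencode, urlsplit

# Both endpoints are placeholders (assumptions), not real service URLs.
COMMONS_API_URL = "https://example.org/commons-api.php"
MEDIAWIKI_API_URL = "https://example.org/w/api.php"


def build_query_url(api_url, params):
    """Build a request URL; only api_url differs between the two APIs."""
    # Sort the parameters so the query string is deterministic.
    return f"{api_url}?{urlencode(sorted(params.items()))}"


# A MediaWiki-style query, written once and reused for both endpoints.
query = {
    "action": "query",
    "prop": "categories",
    "titles": "File:Example.jpg",
    "format": "json",
}

before = build_query_url(COMMONS_API_URL, query)    # today: the add-on tool
after = build_query_url(MEDIAWIKI_API_URL, query)   # later: MediaWiki itself
```

If the add-on tool mimics the parameter names, the client code above would not need to change beyond the one URL constant.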
There is a big conceptual difference between the two APIs, IMHO. The MediaWiki API can be used to query technically defined things: link lists, categories, template usage and so on. A Commons API (mine or someone else's) parses the content itself for data and relations that are not technically defined.
One way would be to add some kind of license metadata per page to the database. This is possible, but rather specific; also, it would likely mean creating a separate interface just for that.
The better way (IMHO) is to store all "page:template:parameter:value" tuples used in a wiki in a separate database table, which could be queried by the MediaWiki API. This has been suggested time and again by me and others. It would then be much easier for a third-party API to get the relevant data for a page. The functionality is part of Semantic MediaWiki, but would actually scale as a project on its own ;-)
This approach would also allow for the integration of tools like TemplateTiger [1] directly into Wikipedia.
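A rough sketch of what such a tuple table could look like, using an in-memory SQLite database. The table name, column names, helper functions and sample rows are all made up for illustration; this is not an existing MediaWiki schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per (page, template, parameter, value) tuple.
conn.execute(
    """
    CREATE TABLE template_params (
        page      TEXT NOT NULL,
        template  TEXT NOT NULL,
        parameter TEXT NOT NULL,
        value     TEXT NOT NULL
    )
    """
)
conn.execute("CREATE INDEX idx_tp_page ON template_params (page, template)")
conn.execute("CREATE INDEX idx_tp_template ON template_params (template)")

# Made-up sample data, as it might be written whenever a page is saved.
conn.executemany(
    "INSERT INTO template_params VALUES (?, ?, ?, ?)",
    [
        ("File:Example.jpg", "Information", "author", "Jane Doe"),
        ("File:Example.jpg", "Information", "source", "Own work"),
        ("File:Other.png", "Information", "author", "John Roe"),
    ],
)


def template_params(conn, page, template):
    """Return {parameter: value} for one template on one page."""
    cur = conn.execute(
        "SELECT parameter, value FROM template_params"
        " WHERE page = ? AND template = ?",
        (page, template),
    )
    return dict(cur.fetchall())


def pages_using(conn, template):
    """Return the sorted list of pages that use a given template."""
    cur = conn.execute(
        "SELECT DISTINCT page FROM template_params"
        " WHERE template = ? ORDER BY page",
        (template,),
    )
    return [row[0] for row in cur.fetchall()]
```

With a table like this, a third-party API could call something like template_params(conn, "File:Example.jpg", "Information") instead of scraping the rendered page, and a TemplateTiger-style listing would reduce to a query such as pages_using(conn, "Information").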
Magnus
[1] http://toolserver.org/~kolossos/templatetiger/tt-table4.php?template=Persond...