On Sun, Mar 30, 2008 at 4:10 PM, Brianna Laugher brianna.laugher@gmail.com wrote:
Hi,
There is an interesting Firefox extension called Zemanta that works with some blogging platforms to suggest images matching the blog post you are typing. One of the sources it uses is Commons. See this post (and its comments) for a description of how it works and what it's lacking: http://brianna.modernthings.org/article/97/zemanta-wikimedia-commons-for-bloggers
In particular, "If you have an idea how to correctly capture wikipedia images attribution (something that would assure at least 50% correct coverage from 2.8M images), please help us! ;)"
Really, we can't blame people too much for not providing attribution, when we don't give that information in a standard way, or give a standard way of accessing it.
Now is as good a time as any to formally write an API that we can recommend to other people. Aside from the MediaWiki API, there are three main tasks I can think of that often need to be automated:
- identify any "problem tags" (files with deletion markers shouldn't be used or indexed by third parties)
- extract license name(s) and URL for a given file
- extract author attribution string for a given file
So I propose we put our heads together and figure out the most robust algorithm for each of these, and provide some sample code for each.
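As a starting point for discussion, here is a minimal Python sketch of the three tasks, working on the raw wikitext of a file description page. Everything in it is an assumption rather than an agreed algorithm: the function names are made up, the list of problem templates is illustrative and incomplete, only Creative Commons BY and BY-SA templates are mapped to a URL, and the author string is simply whatever follows "|Author=" on that line, with wiki markup left in.

```python
import re

# Assumed, non-exhaustive set of templates that flag a file as problematic.
PROBLEM_TEMPLATES = {
    "delete", "speedydelete", "copyvio",
    "no license", "no source", "no permission",
}

# Name of a template: the text between "{{" and the first "|" or "}}".
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")
# Only CC BY / BY-SA templates such as "cc-by-sa-3.0" are recognised here.
CC_LICENSE = re.compile(r"^cc-(by(?:-sa)?)-(\d\.\d)$")
# Value of the Author parameter, up to the end of its line.
AUTHOR_FIELD = re.compile(r"\|\s*author\s*=[ \t]*(.*)", re.IGNORECASE)


def template_names(wikitext):
    """Return the lower-cased names of all templates used in the wikitext."""
    return [m.group(1).strip().lower() for m in TEMPLATE_NAME.finditer(wikitext)]


def problem_tags(wikitext):
    """Return the problem templates found; an empty list means none were seen."""
    return [n for n in template_names(wikitext) if n in PROBLEM_TEMPLATES]


def licenses(wikitext):
    """Return (template name, license URL) pairs for recognised CC templates."""
    found = []
    for name in template_names(wikitext):
        m = CC_LICENSE.match(name)
        if m:
            url = "https://creativecommons.org/licenses/%s/%s/" % (m.group(1), m.group(2))
            found.append((name, url))
    return found


def author(wikitext):
    """Return the raw Author value, or None if it is missing or empty."""
    m = AUTHOR_FIELD.search(wikitext)
    return (m.group(1).strip() or None) if m else None


if __name__ == "__main__":
    sample = (
        "{{Information\n"
        "|Description=A test image\n"
        "|Author=[[User:Example|Example]]\n"
        "}}\n"
        "{{cc-by-sa-3.0}}\n"
        "{{delete|reason=test}}"
    )
    print(problem_tags(sample), licenses(sample), author(sample))
```

A third party would probably want to check problem_tags() first and skip the file entirely if it returns anything, before looking at license or author.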
I made a start here:
http://commons.wikimedia.org/wiki/Commons:API
Contributions and feedback welcome...
cheers, Brianna
-- They've just been waiting in a mountain for the right moment: http://modernthings.org/
Commons-l mailing list Commons-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/commons-l
I already started something some time ago at http://commons.wikimedia.org/wiki/Commons:Machine_readability. It allows you to extract all information provided by the {{Information}} template and some other templates. It's not finished yet; I'm still thinking about the easiest way to fetch license information.
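To illustrate the kind of extraction meant here (this is not the code from that page), a rough Python sketch that splits the first {{Information}} template in a page's wikitext into its parameters could look like this. The function name information_fields is hypothetical, and the parser only tracks nesting of {{...}} and [[...]] so that pipes inside links and nested templates are not treated as parameter separators.

```python
def information_fields(wikitext):
    """Return the parameters of the first {{Information}} template as a dict.

    Keys are lower-cased parameter names; values keep their wiki markup.
    """
    start = wikitext.lower().find("{{information")
    if start == -1:
        return {}

    depth = 0            # nesting level of {{...}} and [[...]]
    parts, current = [], []
    i = start
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            if depth > 1:            # keep nested markup as part of the value
                current.append(pair)
            i += 2
            continue
        if pair in ("}}", "]]"):
            depth -= 1
            if depth == 0:           # end of the Information template
                break
            current.append(pair)
            i += 2
            continue
        if wikitext[i] == "|" and depth == 1:
            # A top-level pipe separates two parameters.
            parts.append("".join(current))
            current = []
        else:
            current.append(wikitext[i])
        i += 1
    parts.append("".join(current))

    fields = {}
    for part in parts[1:]:           # parts[0] is the template name itself
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields
```

License templates that sit outside {{Information}} are not covered by this sketch, which is exactly the open question.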
Bryan