[Commons-l] Towards a Commons API

Fri Aug 17 20:33:17 UTC 2007

On 8/17/07, Magnus Manske <magnusmanske at googlemail.com> wrote:
> Like - this? ;-)
>http://commons.wikimedia.org/wiki/Image:Oorgatbrug_schematisch.jpg?withJS=MediaWiki:ChooseResolution.js

Change that to appear above the image at the top like flickr does,
with a few fewer options, and a prominent original... and take it live
site wide please. Thanks.

I get a couple of emails a month now asking for higher resolution
versions of my images, when the higher resolution image was always on
commons but the full resolution link was just not obvious enough.

> I'm working on that ;-)
>
> I could probably "screenscrape" the categories and find those that
> look like a license.

Don't bother yet. We should fix the license stuff to be a bit more
machine readable (A prefix which will only be used for approved and
acceptable licenses).

Anything else would be madness. If we're not making some effort to
keep a uniform interface on the data source side you'll be hopelessly
chasing a moving target forever.  I.e. some random user will
decide he wants to use [[Category:By-SA]] rather than
[[Category:CC-By-SA]] and you'll miss it. We just need to say
something like:

"An image isn't licensed unless there is a transclusion from the
license namespace of the form {{License:foo}} directly included in the
wikitext, and all pages in the license namespace must be commons
compatible community approved licenses which also apply
[[Category:Licensed under foo]] style categories."

This would be no real burden on users, and it would make the data much
more machine readable.  It would also solve other issues, like
preventing people from randomly creating acceptable looking but
invalid license templates (such as the, long ago fixed, "it came from
the Library of Congress, thus it's PD" template).
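
Roughly, a reader for that rule could look like the sketch below. It
is only an illustration: the License: namespace is the proposal above
and does not exist today, and the names LICENSE_RE, extract_licenses
and is_licensed, as well as the "approved" set, are made up for the
example. It only looks at the raw wikitext, so a license pulled in
through some other template is (deliberately) not counted.

```python
import re

# Hypothetical pattern: matches {{License:foo}} or {{License:foo|args}}
# written directly in the page's raw wikitext. The "License:" namespace
# is the proposed convention, not an existing one.
LICENSE_RE = re.compile(
    r"\{\{\s*License:([^|{}]+?)\s*(?:\|[^{}]*)?\}\}", re.IGNORECASE
)

def extract_licenses(wikitext):
    """Return the license names transcluded directly in the wikitext."""
    return [m.group(1).strip() for m in LICENSE_RE.finditer(wikitext)]

def is_licensed(wikitext, approved):
    """True only if a directly included license is in the approved set."""
    return any(name in approved for name in extract_licenses(wikitext))
```

With this, a page carrying only [[Category:By-SA]] simply counts as
unlicensed, which is the point: nobody has to guess at category names.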

If we only use categories for license integration we can't stop people
from just inventing new ones at will.

If we use a template applied directly to the page, then we could skip
the category entirely, but keeping it is harmless, and can make
integration with simple category based search tools simpler.

The directly applied template approach is a real boon for machine
reading/editing. It's what enwp did with its Non-free templates,
and I can point you to a half dozen bot authors who were
thankful for the change.

> Determining the author might prove harder. There's the upload log
> (latest uploader if multiple?) and possible screenscraping of the
> Information template. Tricky.
>
> Help is welcome! :-)

Scrape the text out of the author field.
If someone has done anything which is at all hard to read,
like including a template in its value, or using a wrapper rather than
Information directly: just fail to extract the data.

We can go clean it up. Anything else is futile.
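
As a rough sketch of the "scrape it or give up" approach (not a
finished parser): the code below assumes the usual one-field-per-line
layout of the Information template and a field literally named
"Author". The function name extract_author and the two patterns are
made up for the example. Anything awkward returns None so that the
page can be flagged for cleanup instead of guessed at.

```python
import re

# Only a direct {{Information|...}} call counts; wrappers are ignored.
INFO_START = re.compile(r"\{\{\s*Information\s*\|", re.IGNORECASE)
# Assumes one field per line, e.g. "|Author=Jane Example".
AUTHOR_LINE = re.compile(
    r"^\s*\|\s*Author\s*=(.*)$", re.IGNORECASE | re.MULTILINE
)

def extract_author(wikitext):
    """Return the plain-text author value, or None if it is hard to read."""
    start = INFO_START.search(wikitext)
    if start is None:
        return None  # no direct Information template (maybe a wrapper)
    match = AUTHOR_LINE.search(wikitext, start.end())
    if match is None:
        return None  # no author field at all
    value = match.group(1).strip()
    if not value or "{{" in value or "}}" in value:
        return None  # empty, or template braces on the line: give up
    return value
```

Returning None rather than a best guess is the design choice here: a
missing value is easy to detect and fix by hand, a wrong one is not.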

Next thing we need is "commons lint".  Basically a script that checks
an image page for errors (like an inability to extract the author or
license data, malformed geocoding, and other machine detectable
problems).
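
Something like the following is what I have in mind, as a minimal
sketch only. The three checks are simplified stand-ins: the license
check assumes the proposed {{License:foo}} convention, the author
check assumes a one-line "|Author=" field, and the geocoding check
assumes coordinates are given as {{Location dec|lat|lon}}. All
function names and messages are made up for the example.

```python
import re

def check_license(text):
    # Assumes the proposed {{License:foo}} convention.
    if not re.search(r"\{\{\s*License:[^|{}]+", text, re.IGNORECASE):
        return "no {{License:...}} transclusion found"
    return None

def check_author(text):
    m = re.search(r"^\s*\|\s*Author\s*=(.*)$", text,
                  re.IGNORECASE | re.MULTILINE)
    if m is None or not m.group(1).strip() or "{{" in m.group(1):
        return "author field missing, empty, or not plain text"
    return None

def check_geocoding(text):
    # Assumes {{Location dec|lat|lon}}; a page without it is fine.
    m = re.search(r"\{\{\s*Location dec\s*\|([^|}]*)\|([^|}]*)", text,
                  re.IGNORECASE)
    if m is None:
        return None
    try:
        lat, lon = float(m.group(1)), float(m.group(2))
    except ValueError:
        return "geocoding is not numeric"
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return "geocoding out of range"
    return None

CHECKS = [check_license, check_author, check_geocoding]

def lint(text):
    """Run every check; return the problem messages (empty means clean)."""
    results = (check(text) for check in CHECKS)
    return [msg for msg in results if msg is not None]
```

New checks are just more functions appended to CHECKS, so the list of
machine detectable problems can grow without touching lint() itself.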

We should invoke that script using javascript when someone views the
image page on commons. So if you view an image with problems you'll
get some kind of red flag.

We could even go so far as hooking the edit page, so that page save
triggers the check script first and interrupts saving if the check
fails... or at least yells at you a lot.  I wouldn't want to do
that until it was well tested in advisory mode, however.