On 8/17/07, Magnus Manske magnusmanske@googlemail.com wrote:
Like - this? ;-) http://commons.wikimedia.org/wiki/Image:Oorgatbrug_schematisch.jpg?withJS=Me...
Change that to appear above the image at the top like flickr does, with a few fewer options and a prominent original. ... and take it live site wide please. Thanks.
I get a couple of emails a month now asking for higher resolution versions of my images, when the higher resolution image was on commons all along but the full resolution link was just not obvious enough.
I'm working on that ;-)
I could probably "screenscrape" the categories and find those that look like a license.
Don't bother yet. We should fix the license stuff to be a bit more machine readable (A prefix which will only be used for approved and acceptable licenses).
Anything else would be madness. If we're not making some effort to keep a uniform interface on the data source side you'll be hopelessly chasing a moving target forever. I.e. some random user will decide he wants to use [[Category:By-SA]] rather than [[Category:CC-By-SA]] and you'll miss it. We just need to say something like:
"An image isn't licensed unless there is a transclusion from the license namespace of the form {{License:foo}} directly included in the wikitext, and all pages in the license namespace must be commons compatible community approved licenses which also apply [[Category:Licensed under foo]] style categories."
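To show how little a bot would need under that rule, here is a rough Python sketch. It assumes the proposed (not yet existing) License: namespace and the {{License:foo}} form quoted above; the function names are made up for illustration.

```python
import re

# Assumption: licenses are transcluded as {{License:foo}} or
# {{License:foo|params}}, per the rule proposed above. The "License:"
# namespace is a proposal, not something that exists today.
LICENSE_RE = re.compile(
    r"\{\{\s*License:([^|{}]+?)\s*(?:\|[^{}]*)?\}\}", re.IGNORECASE
)


def find_licenses(wikitext):
    """Return the names of all License: templates transcluded in the wikitext."""
    return [m.group(1).strip() for m in LICENSE_RE.finditer(wikitext)]


def is_licensed(wikitext):
    """Under the proposed rule, no License: transclusion means no license."""
    return bool(find_licenses(wikitext))
```

A category such as [[Category:By-SA]] on its own is deliberately ignored here; only the transclusion counts.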
This would be no real burden on users, and it would make the data much more machine readable. It would also solve other issues, like preventing people from randomly creating acceptable looking but invalid license templates (such as the, long ago fixed, "it came from the Library of Congress, thus it's PD" template).
If we only use categories for license integration we can't stop people from just inventing new ones at will.
If we use a template applied directly to the page, then we could skip the category entirely, but keeping it is harmless, and can make integration with simple category based search tools simpler.
The directly applied template approach is a real boon for machine reading/editing. It's what enwp did with its Non-free templates, and I can point you to a half dozen bot authors who were thankful for the change.
Determining the author might prove harder. There's the upload log (latest uploader if multiple?) and possible screenscraping of the Information template. Tricky.
Help is welcome! :-)
Scrape the text out of the author field. If someone has done anything which is at all hard to read, like including a template in its value, or used a wrapper rather than Information directly: just fail to extract the data.
We can go clean it up. Anything else is futile.
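Something like the following Python sketch is what I mean by failing instead of guessing. The ExtractionError class and extract_author function are hypothetical names, and the parsing assumes the usual "|Author=" parameter of the Information template; treat it as an illustration, not a finished scraper.

```python
import re


class ExtractionError(Exception):
    """Raised when the author cannot be read without guessing."""


def extract_author(wikitext):
    # Only accept {{Information|...}} used directly, not a wrapper template.
    start = re.search(r"\{\{\s*Information\s*\|", wikitext, re.IGNORECASE)
    if start is None:
        raise ExtractionError("no direct {{Information}} template")

    # Take the Author value up to the next parameter line or the closing }}.
    field = re.search(
        r"\|\s*Author\s*=[ \t]*(.*?)\s*(?=\n\s*\||\}\})",
        wikitext[start.end() - 1:],
        re.IGNORECASE | re.DOTALL,
    )
    if field is None:
        raise ExtractionError("no Author field")

    value = field.group(1)
    if not value:
        raise ExtractionError("empty Author field")
    if "{" in value or "}" in value:
        raise ExtractionError("template inside Author field")
    # Pipes are fine inside [[links]], but anywhere else they mean we ran
    # into the next parameter, so give up rather than guess.
    if "|" in re.sub(r"\[\[[^\[\]]*\]\]", "", value):
        raise ExtractionError("could not delimit Author field")
    return value
```

Every page that raises here goes on a cleanup list for a human to fix.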
Next thing we need is "commons lint". Basically a script that checks an image page for errors (like an inability to extract the author or license data, malformed geocoding, and other machine detectable problems).
We should invoke that script using javascript when someone views the image page on commons. So if you view an image with problems you'll get some kind of red flag.
We could even go so far as hooking the edit page, so that page save triggers the check script first and interrupts saving if the check fails... or at least yells at you a lot. ... I wouldn't want to do that until it was well tested in advisory mode, however.
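For concreteness, here is a rough Python sketch of the check script with an advisory switch. All the names are hypothetical, the license check assumes the proposed {{License:foo}} form, and the geocoding check assumes a "{{Location dec|lat|lon}}" style template purely for illustration.

```python
import re


def check_license(wikitext):
    # Assumes the proposed {{License:foo}} transclusion form.
    if not re.search(r"\{\{\s*License:[^|{}]+", wikitext, re.IGNORECASE):
        return "no {{License:...}} transclusion found"
    return None


def check_author(wikitext):
    # The value must start with something other than a template or a pipe.
    if not re.search(r"\|\s*Author\s*=[ \t]*[^\s|{}]", wikitext, re.IGNORECASE):
        return "author field missing, empty, or not plain text"
    return None


def check_geocoding(wikitext):
    # Assumed format for illustration: {{Location dec|lat|lon}}.
    m = re.search(r"\{\{\s*Location dec\s*\|([^{}]*)\}\}", wikitext, re.IGNORECASE)
    if m is None:
        return None  # geocoding is optional
    parts = m.group(1).split("|")
    try:
        lat, lon = float(parts[0]), float(parts[1])
    except (IndexError, ValueError):
        return "malformed geocoding"
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return "geocoding out of range"
    return None


CHECKS = [check_license, check_author, check_geocoding]


def lint(wikitext):
    """Run every check and return the list of problems found."""
    results = (check(wikitext) for check in CHECKS)
    return [problem for problem in results if problem is not None]


def may_save(wikitext, advisory=True):
    """Return (allowed, problems); advisory mode reports but never blocks."""
    problems = lint(wikitext)
    return (advisory or not problems), problems
```

With advisory=True the save always goes through and the problems are only reported; flipping it to False is the "interrupt saving" variant.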