Re: [Commons-l] Blog entry about finding free images

13 Oct 2011

Sorry, just want to clarify:

It would be easy to get images into Google Image Search.

It would be hard to get correct licenses into Google Image Search, given 
the current situation. We'd need to do some serious rethinking on our end.

On 10/13/11 1:58 PM, Neil Kandalgaonkar wrote:
...
  On 10/13/11 1:35 PM, Rayson Ho wrote:
  On Thu, Oct 13, 2011 at 3:56 PM, Neil Kandalgaonkar <neilk(a)wikimedia.org> wrote:
  What exactly needs to be done?
 1) Figure out some scheme whereby the actual license is available in the
 database, not merely expressed in human-readable HTML. This is hard, so
 it's where I gave up. Timo (Krinkle) and Roan Kattouw were working on
 this for a bit but they were pulled off to do other things. To do this
 right we'd create a new namespace like License: and then connect that to
 some database entity. We could then connect those to existing license
 templates.

 2) Once that's done, from a daily cronjob or something, generate
 Sitemaps (a summary of all our content) compatible with the Google Image
 Search extended syntax.

 http://www.google.com/support/webmasters/bin/answer.py?answer=178636

 A less organized dump from my brain here:
 http://www.mediawiki.org/wiki/User:NeilK/Sitemaps
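
 As a rough illustration of step 1, here is a minimal sketch of what
 "license as a database entity" could look like. Everything in it is an
 assumption made for illustration: the License record, the
 LICENSES_BY_TEMPLATE mapping, and the license_for_templates helper are
 hypothetical names, not existing MediaWiki code, and a real scheme
 would live in the database rather than in a Python dict.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class License:
    """Hypothetical machine-readable license record."""
    key: str  # e.g. a page title in the proposed License: namespace
    url: str  # canonical URL of the license text


# Hypothetical mapping from existing license templates to license records.
LICENSES_BY_TEMPLATE = {
    "Cc-by-sa-3.0": License(
        "CC-BY-SA-3.0", "https://creativecommons.org/licenses/by-sa/3.0/"
    ),
    "Cc-by-3.0": License(
        "CC-BY-3.0", "https://creativecommons.org/licenses/by/3.0/"
    ),
    "Cc-zero": License(
        "CC0-1.0", "https://creativecommons.org/publicdomain/zero/1.0/"
    ),
}


def license_for_templates(template_names: Iterable[str]) -> Optional[License]:
    """Return the first known license among the templates used on a file page."""
    for name in template_names:
        record = LICENSES_BY_TEMPLATE.get(name)
        if record is not None:
            return record
    return None
```

 The point of the lookup is that it depends only on which template a
 file page uses, not on the HTML that template happens to render.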

  Can't Google just parse the licensing section to decide if the image
  is under CC or not?
 Google *can* do any number of things. However, they will probably not do
 any custom development work for Commons.

 By 2011 standards, Commons is a relatively small image repository.
 Flickr has billions of images, and it's not even the most popular photo
 host. Facebook, although inaccessible to Google, adds several billion
 images to its repository *per week*.

 Commons may have some of the "best of the web" images for illustration
 purposes, so it is a high-value thing for Google to crawl. So yeah, it's
 enough for them to assign a guy to talk to me every few months or so.
 But not enough that they will assign developers. They wouldn't have even
 bothered pinging us if that were the case.

 Commons has no real way to communicate licenses to Google. Templates
 create human-readable HTML, not machine-parseable legal information. If
 someone edited the CC master template tomorrow to look a bit prettier,
 anything that was trying to parse licenses from HTML would break.

 Google has a standard for us to tell them the license, in the extended
 Sitemap syntax for images, linked to above. That's what we should do,
 because it would make that information available to Google, and
 potentially to any other search engines that can read that standard.
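
 For a sense of what the cronjob would emit, here is a minimal sketch
 that builds a Sitemap with the image extension, including a license
 element per image. The build_image_sitemap function and its input
 tuples are hypothetical, and the URLs in the usage note are
 placeholders. It assumes the license URL for each file is already
 available from the database, which is exactly the part that does not
 exist yet.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(entries):
    """Build a Sitemap XML string.

    entries: iterable of (page_url, image_url, license_url) tuples
    (a hypothetical input shape chosen for this sketch).
    """
    # Serialize with a default namespace for Sitemap elements and an
    # "image:" prefix for the image extension elements.
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_url, license_url in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
        ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
        ET.SubElement(image, f"{{{IMAGE_NS}}}license").text = license_url
    return ET.tostring(urlset, encoding="unicode")
```

 Calling it with one tuple such as
 ("https://example.org/wiki/File:Example.jpg",
 "https://example.org/images/Example.jpg",
 "https://creativecommons.org/licenses/by-sa/3.0/") yields one url
 entry whose image element carries both the image location and the
 license URL.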

  If Google can find more of my 400+ images, then they can be used by
  others more often... that would certainly make me work harder to take
  more photos & upload more to Commons!
 Hell yes!

-- 
Neil Kandalgaonkar (|  <neilk(a)wikimedia.org>
