I've made some attempt to map images on Wikimedia Commons to
distinct concepts from DBpedia; see
http://ookaboo.com/
This could be useful for forming a training set, but I haven't
yet gotten around to releasing a public dump of the data. I have
about 1 million things classified and could certainly extend the
strategies used to get more.
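Just to make the image-to-concept mapping concrete, here's a rough
sketch of pulling (concept, image) pairs from DBpedia's public
SPARQL endpoint. This is not the pipeline behind Ookaboo; the
endpoint URL, the example concept URI, and the reliance on
foaf:depiction are just my assumptions for illustration.

    # Sketch: fetch image URLs that DBpedia links to a concept,
    # which could seed a (concept, image) training set.
    from SPARQLWrapper import SPARQLWrapper, JSON

    ENDPOINT = "https://dbpedia.org/sparql"

    def images_for_concept(concept_uri, limit=100):
        """Return depiction URLs DBpedia associates with concept_uri."""
        sparql = SPARQLWrapper(ENDPOINT)
        sparql.setQuery(f"""
            PREFIX foaf: <http://xmlns.com/foaf/0.1/>
            SELECT DISTINCT ?img WHERE {{
                <{concept_uri}> foaf:depiction ?img .
            }} LIMIT {limit}
        """)
        sparql.setReturnFormat(JSON)
        results = sparql.query().convert()
        return [row["img"]["value"]
                for row in results["results"]["bindings"]]

    # e.g. images_for_concept("http://dbpedia.org/resource/Cat")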
Unless there's been a truly unprecedented breakthrough, I'd
think that the application of machine vision to Wikimedia Commons
faces the problem of getting enough training data. If you have
thousands or tens of thousands of photos labeled 'cat' or 'not
cat', or 'member of plant species X' or 'not member of plant
species X', you can train a classifier to make the distinction.
However, if you've got two or three bad photos of a particular
plant (which is what you have most of the time on Commons), you
don't have enough training data to generalize.
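To put the 'cat' / 'not cat' example in concrete terms, here's a
minimal sketch of a binary classifier on downscaled raw pixels
with scikit-learn. The directory layout (data/cat/, data/not_cat/)
is hypothetical; with only two or three images per class the
cross-validation score is meaningless noise, which is exactly the
problem.

    # Sketch: binary image classifier on downscaled raw pixels.
    # Assumes (hypothetical) folders data/cat/*.jpg, data/not_cat/*.jpg.
    import glob
    import numpy as np
    from PIL import Image
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def load(pattern, label, size=(64, 64)):
        xs, ys = [], []
        for path in glob.glob(pattern):
            img = Image.open(path).convert("RGB").resize(size)
            xs.append(np.asarray(img, dtype=np.float32).ravel() / 255.0)
            ys.append(label)
        return xs, ys

    cat_x, cat_y = load("data/cat/*.jpg", 1)
    other_x, other_y = load("data/not_cat/*.jpg", 0)
    X = np.array(cat_x + other_x)
    y = np.array(cat_y + other_y)

    # With thousands of examples per class this estimate means
    # something; with a handful it tells you nothing.
    clf = LogisticRegression(max_iter=1000)
    print(cross_val_score(clf, X, y, cv=3).mean())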
If you've got a specific mission, say recognizing genitals, I
think you can make progress, but to attack the general problem you
need to go big with your training sets.