I've made some attempt to map images on Wikimedia Commons to
distinct concepts from DBpedia; see
http://ookaboo.com/
This could be useful for forming a training set, but I haven't
yet gotten around to releasing a public dump of the data. I have
about 1 million things classified and could certainly extend the
strategies used to get more.
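Just to make the image-to-concept mapping concrete, here's a rough
sketch of pulling (concept, image) pairs from DBpedia's public
SPARQL endpoint. This is not the pipeline behind Ookaboo; the
endpoint URL, the example concept URI, and the reliance on
foaf:depiction are just my assumptions for illustration.

    # Sketch: fetch image URLs that DBpedia links to a concept,
    # which could seed a (concept, image) training set.
    from SPARQLWrapper import SPARQLWrapper, JSON

    ENDPOINT = "https://dbpedia.org/sparql"

    def images_for_concept(concept_uri, limit=100):
        """Return depiction URLs DBpedia associates with concept_uri."""
        sparql = SPARQLWrapper(ENDPOINT)
        sparql.setQuery(f"""
            PREFIX foaf: <http://xmlns.com/foaf/0.1/>
            SELECT DISTINCT ?img WHERE {{
                <{concept_uri}> foaf:depiction ?img .
            }} LIMIT {limit}
        """)
        sparql.setReturnFormat(JSON)
        results = sparql.query().convert()
        return [row["img"]["value"]
                for row in results["results"]["bindings"]]

    # e.g. images_for_concept("http://dbpedia.org/resource/Cat")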
Unless there's been a truly unprecedented breakthrough, I'd
think that the application of machine vision to Wikimedia Commons
faces the problem of getting enough training data. If you have
thousands or tens of thousands of photos labeled 'cat' or 'not
cat', or 'member of plant species X' or 'not member of plant
species X', you can train a classifier to make the distinction.
However, if you've got two or three bad photos of a particular
plant (which is what you have most of the time on Commons), you
don't have enough training data to generalize.
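To put the 'cat' / 'not cat' example in concrete terms, here's a
minimal sketch of a binary classifier on downscaled raw pixels
with scikit-learn. The directory layout (data/cat/, data/not_cat/)
is hypothetical; with only two or three images per class the
cross-validation score is meaningless noise, which is exactly the
problem.

    # Sketch: binary image classifier on downscaled raw pixels.
    # Assumes (hypothetical) folders data/cat/*.jpg, data/not_cat/*.jpg.
    import glob
    import numpy as np
    from PIL import Image
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def load(pattern, label, size=(64, 64)):
        xs, ys = [], []
        for path in glob.glob(pattern):
            img = Image.open(path).convert("RGB").resize(size)
            xs.append(np.asarray(img, dtype=np.float32).ravel() / 255.0)
            ys.append(label)
        return xs, ys

    cat_x, cat_y = load("data/cat/*.jpg", 1)
    other_x, other_y = load("data/not_cat/*.jpg", 0)
    X = np.array(cat_x + other_x)
    y = np.array(cat_y + other_y)

    # With thousands of examples per class this estimate means
    # something; with a handful it tells you nothing.
    clf = LogisticRegression(max_iter=1000)
    print(cross_val_score(clf, X, y, cv=3).mean())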
If you've got a specific mission, say recognizing genitals, I
think you can make progress, but to attack the general problem you
need to go big with your training sets.