Automated identification of images on commons

とある白い猫

20 Sep 2011

10:01 a.m.

Hi,

I am attending ImageCLEF, and they seem to have a pretty reliable way to identify the content of images automatically. For instance, they can tell whether an image shows a specific plant species based on the leaf type, or whether the image contains a UK flag, and so on.

I am thinking this could even be used to detect image vandalism, for example to quickly catch genitalia uploads and tag them for review.

Any thoughts on the matter?

-- とある白い猫 (To Aru Shiroi Neko)

John Vandenberg

20 Sep

11:10 a.m.

It would be great to have automatic categorisation of new uploads.

-- John Vandenberg

とある白い猫

12:05 p.m.

Indeed. I was thinking of reviewing the existing 11 million files as well, as we currently have far too many images for human-only review, IMHO. Is there anything you'd want me to ask at the conference?

-- とある白い猫 (To Aru Shiroi Neko)

Commons-l mailing list Commons-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/commons-l

Federico Leva (Nemo)

1:02 p.m.

とある白い猫, 20/09/2011 14:05:

...

Yes: is it free software, and is there a way to use Commons the other way round, i.e. as a tool to build (or improve) such software, using the already categorised images?

Nemo

とある白い猫

1:36 p.m.

The conference has no such tool yet, at least nothing we can use tomorrow, but they are able to identify pretty accurately what the images are. I am going to ask whether they would be interested in providing Commons with such a service. The relevant website for Commons is http://www.imageclef.org/2011/Wikipedia but http://www.imageclef.org/2011/Plants is also interesting (even though it has had nothing to do with Commons so far). It could very well be used for Commons and Wikispecies alike.

-- とある白い猫 (To Aru Shiroi Neko)

Paul Houle

26 Sep

4:43 p.m.

On 9/20/2011 9:36 AM, とある白い猫 wrote:

...

I've made some attempt to map images on Wikimedia Commons to distinct concepts from DBpedia; see

http://ookaboo.com/

This could be useful for forming a training set, but I haven't yet got around to releasing a public dump of the data. I have about 1 million things classified and could certainly extend the strategies used to get more.

Unless there's been a really unprecedented breakthrough, I'd think that the application of machine vision to Wikimedia faces the problem of getting enough training data. If you had thousands or tens of thousands of photos that were labeled 'cat' or 'not cat', or 'member of plant species X' or 'not member of plant species X', you could train a classifier to make the distinction. However, if you've got two or three bad photos of a particular plant (which is what you have most of the time in Commons), you don't have enough training data to generalize.

If you've got a specific mission, say genitals recognition, I think you can make progress, but to attack the general problem you need to go big with your training sets.

Andre Engels

5:32 p.m.

On Mon, Sep 26, 2011 at 6:43 PM, Paul Houle paul@ontology2.com wrote:

...

Every small category is a part of a big category. A system such as this will not be able to specify plant species, but it might well be able to find pictures of plants. If it then gives a list of plant pictures that are not in some plant category, animal pictures that are not in an animal category, buildings that are not in a regional building category, maps that are not in a "map of" category, paintings that are not in a painter category, famous people who are not in a people category, et cetera, it could deliver those to volunteers to classify further.

-- André Engels, andreengels@gmail.com
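The triage idea in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Upload` record, the `predicted_class` and `confidence` fields, and the `EXPECTED_KEYWORD` table are hypothetical, and no real classifier or Commons API is called. Matching a keyword against category names is a deliberate simplification; a real tool would presumably walk the category tree instead.

```python
from dataclasses import dataclass, field


@dataclass
class Upload:
    title: str
    predicted_class: str      # broad label from a hypothetical classifier
    confidence: float         # hypothetical classifier score in [0, 1]
    categories: set = field(default_factory=set)


# Assumption: a broad class is "covered" if any category name contains this keyword.
EXPECTED_KEYWORD = {"plant": "plant", "map": "map", "building": "building"}


def needs_review(upload, min_confidence=0.8):
    """True if the classifier is fairly sure of a broad class that no category reflects."""
    keyword = EXPECTED_KEYWORD.get(upload.predicted_class)
    if keyword is None or upload.confidence < min_confidence:
        return False
    return not any(keyword in name.lower() for name in upload.categories)


def review_queue(uploads, min_confidence=0.8):
    """Group the titles needing human categorisation by predicted broad class."""
    queue = {}
    for upload in uploads:
        if needs_review(upload, min_confidence):
            queue.setdefault(upload.predicted_class, []).append(upload.title)
    return queue
```

Each list in the result could then be handed to volunteers who know that subject area, which keeps the classifier in the role of a pre-sorter rather than a final authority.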

とある白い猫

5:40 p.m.

AI's capabilities are limited. They can with great accuracy identify leaf types. A specialist could work onward from that. Consider this applied to completely uncategorized images.

Mind you, I am at the stage of proposing a task for ImageCLEF, which is why I have asked about this here. The examples I mentioned are meant to give an idea of what ImageCLEF contributors can currently do. What more would the community want as a goal of automated tagging?

-- とある白い猫 (To Aru Shiroi Neko)

Paul Houle

6:39 p.m.

On 9/26/2011 1:32 PM, Andre Engels wrote:

...

The ImageCLEF people are getting such great results because they're working in a specific and limited domain:

http://www.imageclef.org/2011/Plants

The goal is to identify plants by looking at their leaves. An obvious application is to build this into a mobile device -- you could snap a picture of a leaf on a plant or remove a leaf and photograph it against a good surface and it tells you what sort of plant you're looking at. This would be a great tool for any gardener's toolbox, anywhere on earth.

I'm looking forward to seeing people solve more problems like this.

とある白い猫

27 Sep

12:06 a.m.

Certainly, it is their first attempt at plant identification, and they are dealing with very few plant species as well. Such phone apps could eventually be very useful: for example, you point your phone at... anything, and it links to the Wikipedia article, in the language setting of your phone! Consider the progress such labs could make in the next few decades... To them we are just another image repository, but unlike other image repositories we are seeking to tag images in a less specialized manner. For instance, an image repository for birds would generally avoid tagging birds by color and would instead tag them by species or some other scientific categorization; Commons, on the other hand, wouldn't mind such categorization.

As I said before, people shouldn't expect perfect species identification overnight, but perhaps this could be the result of research over the next few decades. They are currently considering dropping Commons because they do not see any use for it; we could show them potential fields of research using Commons by basically telling them about our problems. For instance, dealing with images we frequently delete (over copyright, trolling, etc.) could easily be a task for them.

-- とある白い猫 (To Aru Shiroi Neko)

Gnangarra

1:16 a.m.

Commons works to have all plant photos identified by species, and these link to WP articles; they would find Commons a better fit as the database for their software... From our POV, a focus on copyright issues would be a more valuable tool.

-- GN. Photo Gallery: http://gnangarra.redbubble.com Gn. Blogg: http://gnangarra.wordpress.com

とある白い猫

1:18 p.m.

I am fully aware of how complicated botany is. But it is much easier for a botanist to make sense of it if, for example, they get a list of leaf images whose features have been pre-identified rather than an unsorted list of leaves (or even unsorted lists of all kinds of images, most of which are unrelated to botany). This is to narrow down their work, not to replace them altogether.

For a learning set they need no fewer than 6 images per type, and Wikispecies often has only one...

-- とある白い猫 (To Aru Shiroi Neko)

Paul Houle

1:55 p.m.

On 9/26/2011 8:06 PM, とある白い猫 wrote:

...

I was talking with my wife about this, who is a bit of a botanist. Tree leaves are often easy, but a wider identification of plants requires inspection of the flowers and other details. And it can be hard; I've seen professors absolutely baffled by weeds with a dominating presence in the environment. My wife and I spent a week in the Dominican Republic, where we found 30 different species of weeds, all of which are widespread across the tropical world... We were able to identify fewer than half of them and did worse with the grasses.

Using machine vision or not, the world could use the mobile app that has an expert system that helps people identify plants -- I'd think that databases like DBpedia would be a good place to start.

Now, if you want to use Commons images in a machine vision project, you need to find something specific for which enough information is available. You're going to need about 10,000 images for training and test purposes. That can be anything from one big class to maybe 100 classes with 100 examples each. You can probably extract something like this out of Commons using semantic information from DBpedia and Freebase.
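As a rough sketch of the sizing argument above (for instance 100 classes with 100 examples each), the function below keeps only the classes that have enough labelled images and splits a fixed-size sample of each into training and test sets. The `(title, label)` pairs, the function name and the parameter defaults are all hypothetical; how the labels would actually be derived from DBpedia or Freebase is not modelled here.

```python
import random
from collections import defaultdict


def build_dataset(labelled, min_per_class=100, max_classes=100,
                  test_fraction=0.2, seed=0):
    """Return (train, test) lists of (title, label) pairs, balanced per class."""
    by_class = defaultdict(list)
    for title, label in labelled:
        by_class[label].append(title)

    # Keep classes with enough examples, largest first (ties broken by name).
    usable = sorted(
        (label for label, titles in by_class.items() if len(titles) >= min_per_class),
        key=lambda label: (-len(by_class[label]), label),
    )[:max_classes]

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_test = int(min_per_class * test_fraction)
    train, test = [], []
    for label in usable:
        sample = rng.sample(by_class[label], min_per_class)
        test += [(title, label) for title in sample[:n_test]]
        train += [(title, label) for title in sample[n_test:]]
    return train, test
```

Classes with only two or three photos simply drop out, which is the practical form of the training-data problem raised earlier in the thread.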

Gnangarra

12:53 a.m.

On 27 September 2011 02:39, Paul Houle paul@ontology2.com wrote:

...

I'd be cautious of species identification based solely on leaf structure. I don't doubt the skills of the people doing the work, nor that they are being meticulous, but any botanist will tell you that species identification relies on multiple factors, including plant structure, flowering details, etc. As an example, just look at the 350+ species of Grevillea (http://en.wikipedia.org/wiki/Grevillea). I'd be very averse to using such a tool for identification of species in an encyclopaedic context. It may work at genus level, but even there we'd be finding false positives.

-- GN. Photo Gallery: http://gnangarra.redbubble.com Gn. Blogg: http://gnangarra.wordpress.com

とある白い猫

26 Sep

6:02 p.m.

What I observed people do at ImageCLEF was almost magic. I'll look through your training set; I was, however, thinking of using existing galleries as a means to identify content.

-- とある白い猫 (To Aru Shiroi Neko)
