Finding images

Lars Aronsson

18 Jun 2014

9:13 a.m.

Why is it still, now in 2014, so hard to find images? We have categories and descriptions, but we also know they don't describe all that we want to find in an image. If I need an image with a bicycle and some red flowers, I can only go to the category:bicycles and hope that I'm lucky when browsing through the first 700 images there. Most likely, the category will be subdivided by country or in some other useless way that will make my search harder.

Where is science? Google was created in 1998, based on its Pagerank algorithm for web pages filled with words and links. That was 14 years ago. But what algorithms are there for finding images?

-- Lars Aronsson (lars@aronsson.se) Aronsson Datateknik - http://aronsson.se

Tim Starling

18 Jun

9:46 a.m.

On 18/06/14 11:13, Lars Aronsson wrote:

...

Why is it still, now in 2014, so hard to find images? We have categories and descriptions, but we also know they don't describe all that we want to find in an image. If I need an image with a bicycle and some red flowers, I can only go to the category:bicycles and hope that I'm lucky when browsing through the first 700 images there. Most likely, the category will be subdivided by country or in some other useless way that will make my search harder.

Where is science? Google was created in 1998, based on its Pagerank algorithm for web pages filled with words and links. That was 14 years ago. But what algorithms are there for finding images?

How do the commercial stock agencies do it? They have a much more similar problem to Commons than Google does.

-- Tim Starling

Gerard Meijssen

10:16 a.m.

Hoi, Now that ONLY indicates that stock agencies have a similar problem to Commons; it does not help in finding images or indicate a path we could take to improve things.

Once images gain tags as part of the Wikidatification of multimedia files, we will at least have a way to add multilingual support, and that does improve on what we have today. Thanks, GerardM

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Rayson Ho

10:37 a.m.

On Tue, Jun 17, 2014 at 9:13 PM, Lars Aronsson lars@aronsson.se wrote:

...

Why is it still, now in 2014, so hard to find images? We have categories and descriptions, but we also know they don't describe all that we want to find in an image. If I need an image with a bicycle and some red flowers, I can only go to the category:bicycles and hope that I'm lucky when browsing through the first 700 images there. Most likely, the category will be subdivided by country or in some other useless way that will make my search harder.

Four years ago I requested the "Wikimedia Category Flattening" feature:

http://marc.info/?l=wikitech-l&m=126525308906767

Fast forward to 2014: I have uploaded an additional 1000 high-resolution files to Wikimedia (over 95% of my photos are released into the public domain -- it's more "free" than the iStock editorial license), and that feature is still not done. IMO, a better search function for Wikimedia Commons would be way more useful than the WYSIWYG editor for Wikipedia!

Rayson

================================================== Open Grid Scheduler - The Official Open Source Grid Engine http://gridscheduler.sourceforge.net/ http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html

MZMcBride

12:14 p.m.

Lars Aronsson wrote:

...

Why is it still, now in 2014, so hard to find images? We have categories and descriptions, but we also know they don't describe all that we want to find in an image. If I need an image with a bicycle and some red flowers, I can only go to the category:bicycles and hope that I'm lucky when browsing through the first 700 images there. Most likely, the category will be subdivided by country or in some other useless way that will make my search harder.

Where is science? Google was created in 1998, based on its Pagerank algorithm for web pages filled with words and links. That was 14 years ago. But what algorithms are there for finding images?

Hi.

Have you tried Special:Search? :-)

There's a very nice category of red flowers: https://commons.wikimedia.org/wiki/Category:Red_flowers.

If you search for 'incategory:"Red flowers"', you can find pictures in only that category. If you search for 'incategory:"Red flowers" incategory:"Bicycles"', you can see the intersection of these two categories. (No results currently, alas.) Try a search such as 'incategory:"Red flowers" incategory:"Cosmos atrosanguineus"' to see the search actually work (it should return one result currently, 'File:Cosmos atrosanguineus "Choco Mocha".jpg').
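The category intersection described above can also be scripted. The following is a minimal sketch, assuming the standard MediaWiki search API (`action=query`, `list=search`) on Commons; the helper names `incategory_query` and `search_url` are invented for illustration, and the code only builds the request URL rather than sending it.

```python
from urllib.parse import urlencode

# Public API endpoint of Wikimedia Commons.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def incategory_query(categories):
    """Join category names into an intersection query for Special:Search."""
    return " ".join(f'incategory:"{name}"' for name in categories)


def search_url(categories, limit=20):
    """Build (but do not send) an API request for files in all given categories."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": incategory_query(categories),
        "srnamespace": 6,  # 6 is the File: namespace
        "srlimit": limit,
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(incategory_query(["Red flowers", "Bicycles"]))
    print(search_url(["Red flowers", "Cosmos atrosanguineus"]))
```

Fetching the printed URL should return JSON whose `query.search` list holds the matching file pages, though what it contains depends entirely on how the files happen to be categorized at the time.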

Hope that helps.

MZMcBride

Lars Aronsson

1:39 p.m.

On 06/18/2014 06:14 AM, MZMcBride wrote:

...

If you search for 'incategory:"Red flowers"', you can find pictures in only that category. If you search for 'incategory:"Red flowers" incategory:"Bicycles"', you can see the intersection of these two categories. (No results currently, alas.)

This requires that the interesting images have been categorized as having red flowers. I could just as well hope that the description text mentions red flowers, and do a full text search. Both will fail, because this detailed level of categorization/description is lacking.

Even though this picture is categorized as "fruit vendors", it isn't categorized as apples, bananas, cherries, peaches, and pears, or paper crates, or string, or mostly shadow with a little sunshine on a sidewalk. With 21 million files, how can we reach that level of detail in documentation? https://commons.wikimedia.org/wiki/File:Still_Life_with_Fruit-Laden_Bike_-_M...

Here's a bicycle with red flowers, now categorized, https://commons.wikimedia.org/wiki/File:Xe_%C4%91%E1%BA%A1p_ch%E1%BB%9F_h%C3...

-- Lars Aronsson (lars@aronsson.se) Aronsson Datateknik - http://aronsson.se

MZMcBride

1:54 p.m.

Lars Aronsson wrote:

...

On 06/18/2014 06:14 AM, MZMcBride wrote:

...
If you search for 'incategory:"Red flowers"', you can find pictures in only that category. If you search for 'incategory:"Red flowers" incategory:"Bicycles"', you can see the intersection of these two categories. (No results currently, alas.)

This requires that the interesting images have been categorized as having red flowers. I could just as well hope that the description text mentions red flowers, and do a full text search. Both will fail, because this detailed level of categorization/description is lacking.

This doesn't sound like a technical problem to me... can't you just add the relevant categories? It's a wiki, after all.

Perhaps you're hoping for automatic image recognition? I don't think computing, as a science, is there yet. I think I read something about Google and videos of cats, but even the billionaires can't solve this problem, yet. Sorry.

...

Even though this picture is categorized as "fruit vendors", it isn't categorized as apples, bananas, cherries, peaches, and pears, or paper crates, or string, or mostly shadow with a little sunshine on a sidewalk. With 21 million files, how can we reach that level of detail in documentation? [...]

Click edit. Actually, Commons has HotCat enabled, so you can just click the (+) link, I imagine. What's the issue?

...

Here's a bicycle with red flowers, now categorized, [...]

Cool, thanks for that.

MZMcBride

Pine W

2:14 p.m.

Machine vision is definitely getting better with time. We have computer-driven airplanes, computer-driven cars, and computer-driven spacecraft. The computers need us less and less as hardware and software improve. I think it may be less than a decade before machine vision is good enough to categorize most objects in photographs.

Pine

Kristian Kankainen

3:12 p.m.

Hello!

I think, if one is clever enough, some categorization could be automated already.

Searching for pictures based on meta-data is called "Concept Based Image Retrieval"; searching based on the content of the image, as recognized by machine vision, is called "Content Based Image Retrieval".

What I understood of Lars' request is an automated way of finding the "superfluous" concepts or meta-data for pictures based on their content. Of course recognizing an image's content is very hard (and subjective), but I think it would be possible for many of these "superfluous" categories, such as "winter landscape", "summer beach" and perhaps also "red flowers" and "bicycle".

There exist today many open source "Content Based Image Retrieval" systems. As I understand it, they basically work like this: you give them a picture, and they find the "matching" pictures, each accompanied by a score. Now suppose we give such a system a picture with known content (a picture from Commons with good meta-data); we could then, with some degree of trust, find pictures that should share its categories. I am not sure whether this kind of automated reverse meta-data labelling should be done for only one category at a time, or whether some kind of "category bundles" would work better. Probably adjectives and items should be compounded (e.g. "red flowers").
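The labelling idea above can be sketched in a few lines. This is only an illustration under loud assumptions: the three-number "colour histogram" vectors stand in for whatever features a real CBIR engine extracts, cosine similarity stands in for its matching score, and the function name `suggest_categories` and all sample data are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_categories(query_vec, labelled, k=2, min_score=0.3):
    """Suggest categories for an unlabelled image.

    labelled is a list of (feature_vector, set_of_categories) pairs taken
    from well-described images. The k most similar images vote for their
    categories, weighted by similarity; votes are normalised to 0..1.
    """
    scored = sorted(
        ((cosine(query_vec, vec), cats) for vec, cats in labelled),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(score for score, _ in scored)
    if not total:
        return []
    votes = defaultdict(float)
    for score, cats in scored:
        for cat in cats:
            votes[cat] += score
    ranked = [(cat, vote / total) for cat, vote in votes.items()]
    ranked = [(cat, s) for cat, s in ranked if s >= min_score]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical feature vectors: fraction of red, green and blue pixels.
LABELLED = [
    ([0.80, 0.15, 0.05], {"Red flowers"}),
    ([0.70, 0.20, 0.10], {"Red flowers", "Bicycles"}),
    ([0.10, 0.20, 0.70], {"Summer beach"}),
    ([0.20, 0.70, 0.10], {"Meadows"}),
]

if __name__ == "__main__":
    for category, trust in suggest_categories([0.75, 0.20, 0.05], LABELLED):
        print(f"{category}: {trust:.2f}")
```

The normalised vote plays the role of the "degree of trust": a category shared by all near neighbours scores close to 1, while one carried by a single neighbour scores lower and could be queued for human review instead of being applied automatically.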

Relevant articles and links from Wikipedia:

# https://en.wikipedia.org/wiki/Image_retrieval
# https://en.wikipedia.org/wiki/Content-based_image_retrieval
# https://en.wikipedia.org/wiki/List_of_CBIR_engines#CBIR_research_projects.2F...

Best wishes Kristian Kankainen

Pine W

3:20 p.m.

That sounds like a good idea for an experiment. Any volunteers?

Pine

Kristian Kankainen

3:31 p.m.

I only have time for brainstorming and that kind of advising, if any volunteers need such help.

Kristian

Federico Leva (Nemo)

4:40 p.m.

Kristian, it's not impossible to find a mentee for such an idea. Please edit: https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Categor... .

While we wait for The Perfect Wikidata Heaven Solution (TM), it's worth experimenting with.

Nemo

Gerard Meijssen

4:54 p.m.

Hoi, The "perfect Wikidata heaven solution" will be as imperfect as the "perfect Wikipedia heaven solution". Let's stay down to earth and go for something that works most of the time and does not take forever to realise. Thanks, GerardM

Gerard Meijssen

4:52 p.m.

Hoi, As long as our categories are English, they are useless for all of those who do not speak English. Even so, as long as the current technology is used for those categories it is a trial to find images at all. Many people have given up. Thanks, GerardM

Kristian Kankainen

6:25 p.m.

Could not the categories' language links be useful here? Otherwise BabelNet[1] has set up different ways to connect concepts in different languages into a semantic network. They call it a multilingual encyclopedic dictionary and compile it by combining data from the Wikipedia(s) and WordNet. It's quite clever but still easy. This is still English-centric -- as in having English in the centre of a hub-and-spoke modelled dictionary -- but it does make it _translatable_, which I think is enough for this feature.

Kristian Kankainen

[1] http://babelnet.org/
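The hub-and-spoke model mentioned above can be illustrated with a toy dictionary. This is a sketch only: the `SPOKES` table, its handful of entries and the function names are invented for illustration and have nothing to do with BabelNet's actual data or API.

```python
# Hypothetical spoke tables: each language maps its own terms to the
# English hub term. Real data would come from language links or a
# resource such as BabelNet.
SPOKES = {
    "sv": {"cykel": "bicycle", "röda blommor": "red flowers"},
    "et": {"jalgratas": "bicycle", "punased lilled": "red flowers"},
    "nl": {"fiets": "bicycle", "rode bloemen": "red flowers"},
}


def to_hub(term, lang):
    """Map a term in the given language to the English hub term, or None."""
    term = term.lower()
    if lang == "en":
        return term
    return SPOKES.get(lang, {}).get(term)


def from_hub(hub_term, lang):
    """Map an English hub term out to the given language, or None."""
    if lang == "en":
        return hub_term
    for local, hub in SPOKES.get(lang, {}).items():
        if hub == hub_term:
            return local
    return None


def translate(term, source, target):
    """Translate between any two languages by going through the hub."""
    hub = to_hub(term, source)
    return None if hub is None else from_hub(hub, target)


if __name__ == "__main__":
    print(translate("cykel", "sv", "nl"))
    print(translate("punased lilled", "et", "en"))
```

Every language needs only one table, to and from English, rather than one per language pair, which is why an English-named category system can still be searched in other languages once the spokes exist.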

Gerard Meijssen

7:51 p.m.

Hoi, My understanding of English is adequate. I find it however really hard to find the pictures I seek. I have given up using Commons for illustrations on my blog for instance. Thanks, GerardM

Brian Wolff

19 Jun

2:28 a.m.

On 6/18/14, Kristian Kankainen kristian@eki.ee wrote:

...

Interesting. Some demo links that I found:

* http://demo-itec.uni-klu.ac.at/liredemo/
* http://image.mdx.ac.uk/time/demo.php
* http://mi-file.isti.cnr.it:8765/CophirSearch/
* http://orpheus.ee.duth.gr/anaktisi/ (not free)
* http://youtu.be/2eaGwk4Xhks

I suppose one integration pathway would be, you do a normal search, and then from there you can say, find images similar to this search result.

Of course if I do https://commons.wikimedia.org/w/index.php?title=Special:Search&search=bi... , the first result is relevant. But if I plug https://upload.wikimedia.org/wikipedia/commons/thumb/f/f7/2009_windowboxes_B... into http://demo-itec.uni-klu.ac.at/liredemo/ , the results aren't really that relevant.

--bawolff

Nikolas Everett

2:40 a.m.

On Jun 18, 2014 2:28 PM, "Brian Wolff" bawolff@gmail.com wrote:

...

On 6/18/14, Kristian Kankainen kristian@eki.ee wrote:

...
Hello!

I think, if one is clever enough, some categorization could be automated allready.

Searching for pictures based on meta-data is called "Concept Based Image Retrieval", searching based on the machine vision recognized content of the image is called "Content Based Image Retrieval".

What I understood of Lars' request, is an automated way of finding the "superfluous" concepts or meta-data for pictures based on their content. Of course recognizing an images content is very hard (and subjective), but I think it would be possible for many of these "superfluous" categories, such as "winter landscape", "summer beach" and perhaps also "red flowers" and "bicycle".

There exist today many open source "Content Based Image Retrieval" systems, that I understand basically works in the way that you give them a picture, and they find you the "matching" pictures accompanied with a score. Now suppose we show a picture with known content (pictures from Commons with good meta-data), then we could to a degree of trust find pictures with overlapping categories. I am not sure whether this kind of automated reverse meta-data labelling should be done for only one category per time, or if some kind of "category bundles" work better. Probably adjectives and items should be compounded (eg "red flowers").

Relevant articles and links from Wikipedia: # https://en.wikipedia.org/wiki/Image_retrieval # https://en.wikipedia.org/wiki/Content-based_image_retrieval #

https://en.wikipedia.org/wiki/List_of_CBIR_engines#CBIR_research_projects.2F...

...

...
Best wishes Kristian Kankainen

18.06.2014 09:14, Pine W kirjutas:

...
Machine vision is definitely getting better with time. We have computer-driven airplanes, computer-driven cars, and computer-driven spacecraft. The computers need us less and less as hardware and software improve. I think it may be less than a decade before machine vision is good enough to categorize most objects in photographs.

Pine _______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Interesting. Some demo links that I found:

http://demo-itec.uni-klu.ac.at/liredemo/

Lire has been on my list of things to look at for a while now. It's nice because it could integrate reasonably easily into Cirrus, since it is built on Lucene.

I can't promise anything quick but I'll look into the others as well.

Nik

krinklemail＠gmail.com

12:22 p.m.

On 18 Jun 2014, at 06:14, MZMcBride z@mzmcbride.com wrote:

...

Lars Aronsson wrote:

...
Why is it still, now in 2014, so hard to find images? We have categories and descriptions, but we also know they don't describe all that we want to find in an image. If I need an image with a bicycle and some red flowers, I can only go to the category:bicycles and hope that I'm lucky when browsing through the first 700 images there. Most likely, the category will be subdivided by country or in some other useless way that will make my search harder.

Where is science? Google was created in 1998, based on its Pagerank algorithm for web pages filled with words and links. That was 14 years ago. But what algorithms are there for finding images?

Hi.

Have you tried Special:Search? :-)

There's a very nice category of red flowers: https://commons.wikimedia.org/wiki/Category:Red_flowers.

If you search for 'incategory:"Red flowers"', you can find pictures in only that category. If you search for 'incategory:"Red flowers" incategory:"Bicycles"', you can see the intersection of these two categories. (No results currently, alas.) Try a search such as 'incategory:"Red flowers" incategory:"Cosmos atrosanguineus"' to see the search actually work (it should return one result currently, 'File:Cosmos atrosanguineus "Choco Mocha".jpg').
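For anyone who wants to script this, here is a minimal sketch that builds the same kind of intersection query and turns it into a MediaWiki search API URL. The helper names are made up for the example; the parameters (action=query, list=search, srsearch, srnamespace, srlimit) are standard API parameters, and the sketch assumes the wiki's search backend accepts the incategory: keyword through the API the same way Special:Search does. It only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def incategory_query(categories):
    """Join several incategory: terms; the search treats them as an intersection."""
    return " ".join(f'incategory:"{name}"' for name in categories)


def search_url(categories, limit=20):
    """Build a search API URL restricted to the File: namespace (namespace 6)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": incategory_query(categories),
        "srnamespace": 6,
        "srlimit": limit,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


query = incategory_query(["Red flowers", "Bicycles"])
url = search_url(["Red flowers", "Bicycles"])
print(query)
print(url)
```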

Hope that helps.

While having category intersection is definitely a huge plus now, for it to work really well we need it to traverse the category tree up and down. Does it do that right now?

Especially because Commons has a policy against over-categorisation (which makes sense), and because we subcategorise so insanely much (which might not always make so much sense, but oh well), you really need it to traverse categories recursively to get anything useful.

That way you can search for category "Flowers" or "Red" and still get those from "Red flowers".

And similarly with how bicycles are categorised: you want those from "Bicycles facing left" or "Bicycles in Vietnam" to be included when looking for "Bicycles".
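To make the point concrete, here is a small sketch of a recursive (downward) intersection over a toy category graph. The category names come from the examples above; the file names, the dictionaries and the function names are hypothetical, and this is not how any existing tool implements it.

```python
def descendants(subcats, root):
    """Return root plus every category reachable below it (iterative, cycle-safe)."""
    seen = {root}
    stack = [root]
    while stack:
        category = stack.pop()
        for child in subcats.get(category, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def files_under(subcats, members, root):
    """Collect the files of root and of all its subcategories."""
    files = set()
    for category in descendants(subcats, root):
        files |= members.get(category, set())
    return files


def intersect(subcats, members, *roots):
    """Files that appear somewhere under every one of the given root categories."""
    return set.intersection(*(files_under(subcats, members, r) for r in roots))


# Toy data: parent category -> subcategories, category -> files directly in it.
subcats = {
    "Flowers": ["Red flowers"],
    "Red": ["Red flowers"],
    "Bicycles": ["Bicycles in Vietnam", "Bicycles facing left"],
}
members = {
    "Red flowers": {"File:A.jpg", "File:B.jpg"},
    "Bicycles in Vietnam": {"File:B.jpg"},
    "Bicycles facing left": {"File:C.jpg"},
}

direct = members.get("Flowers", set()) & members.get("Bicycles", set())
recursive = intersect(subcats, members, "Flowers", "Bicycles")
print(direct, recursive)
```

The direct intersection is empty because nothing sits in the top-level categories themselves; only the recursive version finds the file that lives in "Red flowers" and "Bicycles in Vietnam".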

-- Krinkle

Brian Wolff

1:41 p.m.

...

While having category intersection is definitely a huge plus now, for it to work really well we need it to traverse the category tree up and down. Does it do that right now?

Especially because Commons has a policy against over-categorisation (which makes sense), and because we subcategorise so insanely much (which might not always make so much sense, but oh well), you really need it to traverse categories recursively to get anything useful.

That way you can search for category "Flowers" or "Red" and still get those from "Red flowers".

And similarly with how bicycles are categorised: you want those from "Bicycles facing left" or "Bicycles in Vietnam" to be included when looking for "Bicycles".

-- Krinkle

Well, not in core, but with help from Tool Labs...

https://commons.wikimedia.org/wiki/Category:Bicycles?fastcci=%7B%22c1%22%3A3...

--bawolff

Magnus Manske

18 Jun

11:13 p.m.

Well, you can use this:

http://tools.wmflabs.org/catscan2/quick_intersection.php?lang=commons&pr...

which will give you one image. It has a bicycle and red flowers.

However, if you push the category depth past 7 you get false positives, because the category tree is horribly broken on Commons.
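A toy illustration of why a depth limit matters, assuming a hypothetical category chain in which a loosely related subcategory link eventually leads out of the topic. The tree, the file names and the function name are invented for the example, and the chain here goes wrong after depth 2 rather than 7.

```python
from collections import deque


def files_within_depth(subcats, members, root, max_depth):
    """Breadth-first walk that stops descending once max_depth is reached."""
    seen = {root}
    queue = deque([(root, 0)])
    files = set()
    while queue:
        category, depth = queue.popleft()
        files |= members.get(category, set())
        if depth < max_depth:
            for child in subcats.get(category, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append((child, depth + 1))
    return files


# Hypothetical chain: each link looks plausible, but the end is off-topic.
subcats = {
    "Bicycles": ["Bicycle parts"],
    "Bicycle parts": ["Bicycle bells"],
    "Bicycle bells": ["Bells"],
    "Bells": ["Church bells"],
}
members = {
    "Bicycles": {"File:Bike.jpg"},
    "Bicycle bells": {"File:Bell on handlebar.jpg"},
    "Church bells": {"File:Cathedral bell.jpg"},
}

shallow = files_within_depth(subcats, members, "Bicycles", 2)
deep = files_within_depth(subcats, members, "Bicycles", 4)
print(sorted(shallow))
print(sorted(deep))
```

With the shallow limit every hit is still about bicycles; the deeper walk also returns the cathedral bell, which is the kind of false positive described above.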

Cheers, Magnus

On Wed, Jun 18, 2014 at 2:13 AM, Lars Aronsson lars@aronsson.se wrote:

...

Why is it still, now in 2014, so hard to find images? We have categories and descriptions, but we also know they don't describe all that we want to find in an image. If I need an image with a bicycle and some red flowers, I can only go to the category:bicycles and hope that I'm lucky when browsing through the first 700 images there. Most likely, the category will be subdivided by country or in some other useless way that will make my search harder.

Where is science? Google was created in 1998, based on its Pagerank algorithm for web pages filled with words and links. That was 14 years ago. But what algorithms are there for finding images?

-- Lars Aronsson (lars@aronsson.se) Aronsson Datateknik - http://aronsson.se

wikitech-l@lists.wikimedia.org

20 comments

12 participants

participants (12)

Brian Wolff
Federico Leva (Nemo)
Gerard Meijssen
krinklemail＠gmail.com
Kristian Kankainen
Lars Aronsson
Magnus Manske
MZMcBride
Nikolas Everett
Pine W
Rayson Ho
Tim Starling