PanImages Image Search Tool Speaks Hundreds Of Languages http://www.lockergnome.com/nexus/news/2007/09/12/panimages-image-search-tool...
quote: PanImages' powerful brains were created by scanning more than 350 machine-readable online dictionaries. Some of these were "wiktionaries," online multilingual dictionaries written by volunteers. The PanImages software scans these dictionaries and uses an algorithm to check the accuracy of the results. It then assembles the results in a matrix that allows translation in combinations that may never have been attempted — for instance, from Gujarati to Lithuanian.
The actual search engine is here: http://www.panimages.org/index.jsp?displang=eng
And the research paper detailing the algorithm and method is here: http://turing.cs.washington.edu/papers/EtzioniMTSummit07.pdf
An idea to use Wiktionary or interwiki links to improve image search for Commons has long been kicked around. Maybe we could collaborate with them to improve the Mayflower search engine for Commons? (Or else ask them to index upload.wikimedia.org and pay attention to license metadata :)) After all, we supplied them with all this useful data for free.
cheers, Brianna
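As a rough illustration of the "matrix" idea in the quoted passage, the sketch below chains bilingual dictionary entries through shared words to reach a language pair that no single dictionary covers. It is not the PanImages algorithm (the linked paper describes that, including how accuracy is checked). The `ENTRIES` data and the `build_graph` and `translate` functions are made up for this example, and naive chaining like this will happily follow the wrong sense of an ambiguous word.

```python
from collections import defaultdict, deque

# Hypothetical entries: pairs of (language, word) that some dictionary
# lists as translations of each other. Illustrative data only.
ENTRIES = [
    (("guj", "ઝાડ"), ("eng", "tree")),
    (("eng", "tree"), ("fra", "arbre")),
    (("fra", "arbre"), ("lit", "medis")),
]

def build_graph(entries):
    """Build an undirected graph whose nodes are (language, word) pairs."""
    graph = defaultdict(set)
    for a, b in entries:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def translate(graph, source, target_lang, max_hops=3):
    """Breadth-first search from source; collect target_lang words within max_hops."""
    seen = {source}
    queue = deque([(source, 0)])
    found = set()
    while queue:
        node, hops = queue.popleft()
        if node[0] == target_lang and node != source:
            found.add(node[1])
            continue
        if hops == max_hops:
            continue
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return found

graph = build_graph(ENTRIES)
print(translate(graph, ("guj", "ઝાડ"), "lit"))
```

Limiting `max_hops` is one crude way to keep long chains from drifting in meaning; a real system would need something better than a hop count.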
I think this is a really good use of cross-lingual retrieval technology, because images are language neutral. If the system returns an image of a tree found on an Arabic site, I can still understand the image and tell that it's a tree. In contrast, if I search for *text* about trees and the system returns an Arabic page on trees, I can't make use of it unless I can at least read Arabic.
One comment about the user interface of your prototype. It's too focused on the query words, and not enough on the images. Right now, to find images, I have to:
A) Type a query (e.g. "tree")
B) Tell the system my query is in English
C) Select the meaning of the word I mean (e.g. "a large woody plant")
D) Select the language I want to search in (e.g. Arabic)
Only after I have done these things do I get to finally see images.
I would suggest the following changes:
Have the query language set to English by default (I'm French, so I'm allowed to say that ;-), and have a pick list where people can change that (and store that preference in a cookie).
Skip C) and D) altogether, and search for images that match all meanings and all translations in all languages. Maybe that doesn't work well and produces too many false matches?
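The second suggestion amounts to taking the union of every translation of every sense, rather than asking the user to pick one. A minimal sketch, assuming a hypothetical `SENSES` table that maps a query word to its senses and each sense to translations per language (none of this reflects how PanImages stores its data):

```python
# Hypothetical sense inventory: word -> list of (gloss, {language: [translations]}).
SENSES = {
    "tree": [
        ("a large woody plant", {"fra": ["arbre"], "deu": ["Baum"]}),
        ("a branching data structure", {"fra": ["arbre"], "deu": ["Baum", "Baumstruktur"]}),
    ],
}

def expand_query(word, senses=SENSES):
    """Return the query word plus every translation of every sense, without duplicates."""
    terms = [word]
    for _gloss, translations in senses.get(word, []):
        for lang in sorted(translations):
            for term in translations[lang]:
                if term not in terms:
                    terms.append(term)
    return terms

print(expand_query("tree"))
```

The data-structure sense shows where false matches could come from: its translations are searched even when the user only wanted the plant.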
----
Alain Désilets, National Research Council of Canada
Chair, WikiSym 2007

2007 International Symposium on Wikis
Wikis at Work in the World: Open, Organic, Participatory Media for the 21st Century
http://www.wikisym.org/ws2007/
-----Original Message-----
From: wiki-research-l-bounces@lists.wikimedia.org [mailto:wiki-research-l-bounces@lists.wikimedia.org] On Behalf Of Brianna Laugher
Sent: September 12, 2007 10:22 PM
To: commons-l; wiktionary-l@lists.wikimedia.org; Wikimedia Foundation Mailing List; wiki-research-l@lists.wikimedia.org
Subject: [Wiki-research-l] PanImages: cross-lingual image search created using 'Wiktionaries'
PanImages Image Search Tool Speaks Hundreds Of Languages http://www.lockergnome.com/nexus/news/2007/09/12/panimages-image-search-tool-speaks-hundreds-of-languages/
quote: PanImages' powerful brains were created by scanning more than 350 machine-readable online dictionaries. Some of these were "wiktionaries," online multilingual dictionaries written by volunteers. The PanImages software scans these dictionaries and uses an algorithm to check the accuracy of the results. It then assembles the results in a matrix that allows translation in combinations that may never have been attempted - for instance, from Gujarati to Lithuanian.
The actual search engine is here: http://www.panimages.org/index.jsp?displang=eng
And the research paper detailing the algorithm and method is here: http://turing.cs.washington.edu/papers/EtzioniMTSummit07.pdf
An idea to use Wiktionary or interwiki links to improve image search for Commons has long been kicked around. Maybe we could collaborate with them to improve the Mayflower search engine for Commons? (Or else ask them to index upload.wikimedia.org and pay attention to license metadata :)) After all, we supplied them with all this useful data for free.
cheers, Brianna
-- They've just been waiting in a mountain for the right moment: http://modernthings.org/
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wiki-research-l
On 9/13/07, Desilets, Alain <Alain.Desilets@nrc-cnrc.gc.ca> wrote:
I think this is a really good use of cross-lingual retrieval technology, because images are language neutral. If the system returns an image of a tree found on an Arabic site, I can still understand the image and tell that it's a tree. In contrast, if I search for *text* about trees and the system returns an Arabic page on trees, I can't make use of it unless I can at least read Arabic.
One comment about the user interface of your prototype. It's too focused on the query words, and not enough on the images. Right now, to find images, I have to:
A) Type a query (e.g. "tree")
B) Tell the system my query is in English
C) Select the meaning of the word I mean (e.g. "a large woody plant")
D) Select the language I want to search in (e.g. Arabic)
Only after I have done these things do I get to finally see images.
I would suggest the following changes:
Have the query language set to English by default (I'm French, so I'm allowed to say that ;-), and have a pick list where people can change that (and store that preference in a cookie).
Skip C) and D) altogether, and search for images that match all meanings and all translations in all languages. Maybe that doesn't work well and produces too many false matches?
Keep the "meaning" step, but make it optional. Searching all languages at once makes a lot of sense, though. At least Google can handle it :-)
Magnus
On 13/09/2007, Desilets, Alain <Alain.Desilets@nrc-cnrc.gc.ca> wrote:
One comment about the user interface of your prototype. It's too focused on the query words, and not enough on the images.
It's not my prototype, but I agree, the interface is very clunky, in English at least. They could streamline it a lot more. Make reasonable assumptions (like: person using English interface inputs English words), but let people "undo" those assumptions if they want.
cheers Brianna
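The "reasonable assumptions, but undoable" idea could look like the following sketch, where an explicit choice wins over a stored preference (for instance from a cookie), which in turn wins over the interface language. The function and parameter names are invented for illustration.

```python
def resolve_query_language(explicit=None, stored_preference=None, interface_language="eng"):
    """Pick the query language: explicit choice, then saved preference, then UI language."""
    for candidate in (explicit, stored_preference, interface_language):
        if candidate:
            return candidate
    return "eng"  # last-resort default if everything above is empty

print(resolve_query_language())                         # no input: falls back to the UI language
print(resolve_query_language(stored_preference="fra"))  # saved preference overrides the default
```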
On 9/14/07, Brianna Laugher <brianna.laugher@gmail.com> wrote:
On 13/09/2007, Desilets, Alain <Alain.Desilets@nrc-cnrc.gc.ca> wrote:
One comment about the user interface of your prototype. It's too focused on the query words, and not enough on the images.
It's not my prototype, but I agree, the interface is very clunky, in English at least. They could streamline it a lot more. Make reasonable assumptions (like: person using English interface inputs English words), but let people "undo" those assumptions if they want.
OK, my usual cheap hack: http://tools.wikimedia.de/~magnus/cigs.php It splits your search query into words, checks the Wikipedia language links for each word, and constructs a Google image search from the results.
No autodetect for language (too lazy). Only uses the four (politically incorrectly labeled) languages, which was necessary for Google search not to blow up on 50 different words, times two or three...
Also, you have to write "title style". Try German, "Hund schwarz" (not "schwarzer Hund", as you would write natively), and you will actually get images of black dogs on the wikis.
Google word grouping sometimes doesn't work as expected, though. Does Yahoo work any better?
Magnus
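The approach Magnus describes can be sketched roughly as follows. This is not the source of cigs.php: the `LANGLINKS` table stands in for a live lookup of Wikipedia language links, the titles and language codes in it are illustrative, and both the OR-grouped query string and the search URL are assumptions about how such a search might be phrased.

```python
from urllib.parse import urlencode

# Stand-in for Wikipedia language links: article title -> {language code: linked title}.
LANGLINKS = {
    "Hund": {"en": "Dog", "fr": "Chien", "es": "Perro"},
    "Schwarz": {"en": "Black", "fr": "Noir", "es": "Negro"},
}

def build_query(query, langlinks=LANGLINKS, languages=("en", "fr", "es")):
    """Turn each query word into an OR group of itself and its linked titles."""
    groups = []
    for word in query.split():
        title = word[:1].upper() + word[1:]  # "title style": article titles are capitalised
        variants = [title]
        for lang in languages:
            linked = langlinks.get(title, {}).get(lang)
            if linked and linked not in variants:
                variants.append(linked)
        groups.append("(" + " OR ".join(variants) + ")")
    return " ".join(groups)

def image_search_url(query):
    # Assumed URL shape; only the q parameter matters for this sketch.
    return "https://images.google.com/images?" + urlencode({"q": build_query(query)})

print(build_query("Hund schwarz"))
print(build_query("schwarzer Hund"))
```

The second call shows why the query has to be written "title style": the inflected form "schwarzer" matches no article title, so it picks up no translations.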
OK, for some reason I thought this was your work.
Alain
-----Original Message-----
From: wiki-research-l-bounces@lists.wikimedia.org [mailto:wiki-research-l-bounces@lists.wikimedia.org] On Behalf Of Brianna Laugher
Sent: September 13, 2007 8:53 PM
To: Research into Wikimedia content and communities
Subject: Re: [Wiki-research-l] PanImages: cross-lingual image search created using 'Wiktionaries'
On 13/09/2007, Desilets, Alain <Alain.Desilets@nrc-cnrc.gc.ca> wrote:
One comment about the user interface of your prototype.
It's too focused on the query words, and not enough on the images.
It's not my prototype, but I agree, the interface is very clunky, in English at least. They could streamline it a lot more. Make reasonable assumptions (like: person using English interface inputs English words), but let people "undo" those assumptions if they want.
cheers Brianna