Hi all,
My friend Prof. Wu and his students developed Wikigazer ( http://wil.csie.cyut.edu.tw/Wikigazer.php?hl=en ), a cross-lingual Wikipedia search engine based on Lucene.
I would like to know whether you think it's useful, and whether it's fast enough from your location.
Thank you!
Best Regards, /Mike/
On 28/05/07, Tian-Jian Barabbas Jiang@Gmail barabbas@gmail.com wrote:
Hi all,
My friend Prof. Wu and his students developed Wikigazer ( http://wil.csie.cyut.edu.tw/Wikigazer.php?hl=en ), a cross-lingual Wikipedia search engine based on Lucene.
I would like to know whether you think it's useful, and whether it's fast enough from your location.
It's hard to tell if it's useful because it's hard to tell what the purpose of it is. Put search terms in one language, get search results in another? Or something else? What is supposed to be the difference between 'single language' and 'cross language'?
Also, it is not currently aware of redirects and this produces multiple results for one page, search for 'HSK' to see an example.
Also does it use interwiki links to determine the 'cross language' links? It seems like it doesn't... it should! It would be able to produce better links in many more languages than five or so.
cheers Brianna
Hi Brianna,
Brianna Laugher wrote:
It's hard to tell if it's useful because it's hard to tell what the purpose of it is. Put search terms in one language, get search results in another? Or something else? What is supposed to be the difference between 'single language' and 'cross language'?
Single-language search works just like Wikipedia's existing search, only with a different interface.
For the cross-lingual use case, basically yes: search terms in one language, results in another. Obviously it's not as good as Google's new experiment, http://translate.google.com/translate_s , but let me describe a possible scenario:
Translated books from different regions may follow different conventions for terms. In Taiwan, for example, a translated term in Traditional Chinese is usually followed by the original term in brackets. In China, however, translated terms in Simplified Chinese usually lack the original terms and indexes. For someone reading Simplified Chinese articles, it may be useful to search in Simplified Chinese and get Wikipedia results in English, to disambiguate the term.
It's just a college student project and I want to extend it. That's why I'm asking for more possible use cases that could be helpful.
Also, it is not currently aware of redirects and this produces multiple results for one page, search for 'HSK' to see an example.
I see. Marking redirects in the results is something they should consider.
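A minimal Python sketch of how hits that resolve to the same page could be collapsed. It assumes each hit carries the page's raw wikitext and that a redirect page begins with `#REDIRECT [[Target]]`, as in MediaWiki. The function names, the hit format, and the sample target title are hypothetical and not taken from Wikigazer; only one level of redirect is followed.

```python
import re

# MediaWiki redirect pages start with "#REDIRECT [[Target]]" (case-insensitive).
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*:?\s*\[\[([^\]|#]+)", re.IGNORECASE)

def redirect_target(wikitext):
    """Return the redirect target title, or None if the page is not a redirect."""
    match = REDIRECT_RE.match(wikitext)
    return match.group(1).strip() if match else None

def collapse_redirects(hits):
    """Merge hits that resolve to the same page, keeping first-seen order.

    `hits` is a hypothetical list of (title, wikitext) pairs in rank order.
    """
    seen = set()
    merged = []
    for title, wikitext in hits:
        canonical = redirect_target(wikitext) or title
        if canonical not in seen:
            seen.add(canonical)
            merged.append(canonical)
    return merged

# Made-up sample data: a redirect and its target both appear as hits.
hits = [
    ("HSK", "#REDIRECT [[Example target page]]"),
    ("Example target page", "Article text goes here."),
]
print(collapse_redirects(hits))
```

With this approach a search for a redirect title would show the target page once, instead of listing both the redirect and the target.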
Also does it use interwiki links to determine the 'cross language' links? It seems like it doesn't... it should! It would be able to produce better links in many more languages than five or so.
I can only guess, since I don't have the source code yet (I expect to get it within a week). According to my friend, the adviser of this system, it did use interwiki links, but they did not seem to be enough, so they decided to parse all content in five languages for demo purposes.
Thank you for your valuable advice!
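For reference, a rough Python sketch of reading interwiki (language) links straight from wikitext, where they appear as `[[de:Titel]]`. This is only an illustration under that assumption, not how Wikigazer does it; the regular expression is a simplification (it skips codes such as `simple`), and the sample text and function name are made up.

```python
import re

# A language link is "[[", a lowercase language code, a colon, a title, "]]".
LANGLINK_RE = re.compile(r"\[\[([a-z]{2,3}(?:-[a-z]+)*):([^\]|]+)\]\]")

def language_links(wikitext, wanted):
    """Map language code -> title for each language link whose code is in `wanted`."""
    links = {}
    for code, title in LANGLINK_RE.findall(wikitext):
        if code in wanted:
            # Keep the first link seen for each language.
            links.setdefault(code, title.strip())
    return links

# Made-up sample page text for illustration.
sample = (
    "Beijing is the capital of China.\n"
    "[[de:Peking]]\n"
    "[[fr:Pékin]]\n"
    "[[zh:北京市]]\n"
    "[[Category:Capitals]]"
)
print(language_links(sample, {"de", "zh", "ja"}))
```

Filtering on `wanted` keeps the index limited to the languages actually being served, while the same pass could cover more than five languages by enlarging that set.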
Sincerely, /Mike/
Hi all,
Brianna Laugher wrote:
It's hard to tell if it's useful because it's hard to tell what the purpose of it is.
BTW, I recalled one interesting thing. I once discussed cross-lingual information retrieval with the chief of the Google Taiwan Institute, Prof. Lee-Feng Chien. He also thinks it's useless to end users unless we treat it as effectively "monolingual" information retrieval, just like the Simplified Chinese -> English case I described in my previous post. It is monolingual because we treat the original terms as a kind of "named entity": they have already been translated in our minds. It's interesting, though, that Google has announced this kind of service. I wonder whether it will be useful, too.
Cheers, /Mike/
Hi all,
My friend Prof. Wu and his students developed Wikigazer ( http://wil.csie.cyut.edu.tw/Wikigazer.php?hl=en ), a cross-lingual Wikipedia search engine based on Lucene.
I would like to know if you think it's useful or not
I searched on "bus routes of Beijing", cross-language from English to Chinese : http://wil.csie.cyut.edu.tw/cross_result.php?pagestart=0&q=bus+routes+of... ... there are 14 results, and if I mouseover the Cross Language / interwiki links, they show me roughly what the page is about (e.g. first result is about "Beijing Municipal Administration and Communications Card" and seventh is about "Transportation in Beijing" ). Would it be possible to have those interwiki names prominently displayed if they match the search language? I.e. If the first line said "Beijing Municipal Administration and Communications Card" next to the title, then as a non-Chinese speaker who searched in English, it would be much more obvious from the search results what the page was about.
Also, there's an XSS on the "hl" field: http://wil.csie.cyut.edu.tw/cross_result.php?pagestart=0&q=bus+routes+of...<script>alert(666);</script><div
is it fast enough from your location.
Up to 4 seconds for some searches, most searches took around 2 seconds.
Slowest was 168 seconds for the word "test", returns no results: http://wil.csie.cyut.edu.tw/cross_result.php?pagestart=0&q=test&lang...
-- All the best, Nick.
Hi Nick,
2007/5/29, Nick Jenkins nickpj@gmail.com:
I searched on "bus routes of Beijing", cross-language from English to Chinese :
http://wil.csie.cyut.edu.tw/cross_result.php?pagestart=0&q=bus+routes+of... ... there are 14 results, and if I mouseover the Cross Language / interwiki links, they show me roughly what the page is about (e.g. first result is about "Beijing Municipal Administration and Communications Card" and seventh is about "Transportation in Beijing" ). Would it be possible to have those interwiki names prominently displayed if they match the search language? I.e. If the first line said "Beijing Municipal Administration and Communications Card" next to the title, then as a non-Chinese speaker who searched in English, it would be much more obvious from the search results what the page was about.
Yes, it should be. The adviser of this system is aware of this issue, but his students need time to polish it. :)
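A small Python sketch of the suggested display, assuming each result already carries a mapping from language code to interwiki title. The function name, the data shape, and the placeholder Chinese title are hypothetical.

```python
def result_label(title, interwiki, search_lang):
    """Return the result title, followed by its interwiki title in the search language if known."""
    translated = interwiki.get(search_lang)
    if translated and translated != title:
        return f"{title} ({translated})"
    return title

# "中文标题" is a placeholder meaning "Chinese title", not a real page name.
interwiki = {"en": "Beijing Municipal Administration and Communications Card"}
print(result_label("中文标题", interwiki, "en"))
print(result_label("中文标题", interwiki, "fr"))
```

When no interwiki title exists in the search language, the label falls back to the original title unchanged.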
Also, there's an XSS on the "hl" field:
http://wil.csie.cyut.edu.tw/cross_result.php?pagestart=0&q=bus+routes+of...
<script>alert(666);</script><div
Wow. Thank you for pointing this out. I will ask the system maintainer to fix it.
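A hedged Python sketch of the usual fix for this kind of reflected XSS: accept only known values for the parameter and escape anything echoed back into the page. The allowed language codes and function names below are assumptions for illustration. The site itself appears to be PHP, where `htmlspecialchars` plays a role similar to Python's `html.escape`.

```python
import html

# Hypothetical set of interface languages; the real list is not given in the thread.
ALLOWED_HL = {"en", "zh-tw", "zh-cn"}

def safe_hl(raw, default="en"):
    """Accept only known interface-language codes; fall back to a default otherwise."""
    return raw if raw in ALLOWED_HL else default

def render_hidden_field(name, value):
    """Escape &, <, > and quotes before echoing a request parameter into HTML."""
    return f'<input type="hidden" name="{html.escape(name)}" value="{html.escape(value)}">'

payload = "en<script>alert(666);</script><div"
print(safe_hl(payload))
print(render_hidden_field("hl", payload))
```

Either measure alone would stop the reported payload; using both guards against a parameter being echoed somewhere the whitelist was forgotten.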
is it fast enough from your location.
Up to 4 seconds for some searches, most searches took around 2 seconds.
Slowest was 168 seconds for the word "test", returns no results:
http://wil.csie.cyut.edu.tw/cross_result.php?pagestart=0&q=test&lang...
Thank you again for your kind report.
Sincerely, /Mike/
wikitech-l@lists.wikimedia.org