Hi Julien,
One more patch for the suggestion searching, this time to add experimental "autocomplete" searching.
It works like this:
* If there is only one matching result, then the rest of the search term is automatically filled in.
* If the top two searches agree on the next X letters, then the next X letters are automatically filled in.
In both cases, any automatically filled in text is kept highlighted / selected. This way, the user can either:
* Press the right arrow key if the autocomplete is correct (thus accepting the autocomplete).
* Or keep typing normally, thus overwriting the highlighted autocomplete.
In other words, you shouldn't in theory lose anything from the autocomplete, but you may well gain (in that we can be lazier because of it). For example, you can specify the word "australia", and instead of the usual 9 key presses, you can do it with 4 ("au", right arrow, "a"); "Los Angeles Aqueduct" instead of the usual 20 key presses is just 9 ("los a" + right arrow + " aq"); and so forth.
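To make the mechanics concrete, here is a rough sketch of the idea (not the actual patch, which is linked below; the element and function names here are made up for illustration):
------------------------------
// Sketch only: "searchBox" is the text input, "suggestions" the matching titles.
function autocompleteFrom(searchBox, suggestions) {
    var typed = searchBox.value;
    if (suggestions.length == 0) return;

    // Either the single match, or the longest prefix that the top two matches agree on.
    var candidate = suggestions[0];
    if (suggestions.length > 1) {
        var other = suggestions[1], common = 0;
        while (common < candidate.length && common < other.length &&
               candidate.charAt(common).toLowerCase() == other.charAt(common).toLowerCase()) {
            common++;
        }
        candidate = candidate.substring(0, common);
    }

    // Only fill in text beyond what the user has already typed, and keep the
    // filled-in part selected so that typing normally simply overwrites it.
    if (candidate.length > typed.length &&
        candidate.substring(0, typed.length).toLowerCase() == typed.toLowerCase()) {
        searchBox.value = typed + candidate.substring(typed.length);
        if (searchBox.setSelectionRange) {   // Gecko etc.; IE would need a TextRange instead
            searchBox.setSelectionRange(typed.length, searchBox.value.length);
        }
    }
}
------------------------------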
Patch (against 0.2) is here: http://files.nickj.org/MediaWiki/WSuggest-js-autocomplete-diff.txt
Currently there are two known problems with the autocomplete functionality (hence the 'experimental' status):
* The main one is that if the user types fast, the highlighting/selection will stuff up. For example, if you type "new" quickly, then it autocompletes to "new York" (which is fine), but the " York" bit is not highlighted (which is definitely not fine). I'm currently not sure why this happens (maybe a race condition between the sequential query requests?) - but if anyone knows how to prevent this, please sing out and/or send patches! (A possible guard against out-of-order responses is sketched after this list.)
* It gets redirects wrong (fills in too much detail). For example, if the user types "formula we" it will autocomplete to "Formula Weight → Atomic mass", which simply won't work. It's possible to have the JavaScript search for " → " and stop any string at that point, but it feels a bit ugly, so I have omitted that. Might it be slightly neater to have sendRes return 5 fields instead of 4? E.g. "sendRes(query, res, freqs, urls, redirectTargets)", where the new "redirectTargets" field would be an array that contained a NULL/empty entry when there was no redirect, or the redirect target when it was a redirect? That way 'res' could contain just the article's name, and the redirect target functionality could be kept in a logically separate field.
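If it really is out-of-order responses, one possible guard would be to tag each request and ignore stale replies - just an untested sketch, where fetchSuggestions and showSuggestions are made-up stand-ins for however the patch actually issues the request and updates the list:
------------------------------
var latestRequestId = 0;

function sendQuery(query) {
    var requestId = ++latestRequestId;
    fetchSuggestions(query, function (results) {
        if (requestId != latestRequestId) return;   // a newer keystroke has superseded this reply
        showSuggestions(query, results);            // safe to update the list and the autocomplete
    });
}
------------------------------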
All the best, Nick.
Hi Julien,
Might it be slightly neater to have sendRes return 5 fields instead of 4? E.g. "sendRes(query, res, freqs, urls, redirectTargets)", where the new "redirectTargets" field would be an array that contained a NULL/empty entry when there was no redirect, or the redirect target when it was a redirect? That way 'res' could contain just the article's name, and the redirect target functionality could be kept in a separate field.
Actually, even better would be to:
* change to JSON format ( http://www.json.org/ )
* omit the "http://en.wikipedia.org/wiki/" bit (and let the web UI add this instead)
* use a different array for each element, rather than 3 different arrays for the different fields (thus grouping all the fields for one element together).
* use the [] notation instead of array notation (which is part of the JSON spec anyway).
In other words, using a real example, if the TcpQuery results for a search on "cat" were changed from this current result:
sendRes("cat", new Array("Catholics → Roman Catholic Church", "Catholic Archibishop → Bishop", "Catholic", "Catholic Pope → Pope", "CATV → Cable television", "Catalogue astrographique → Star catalogue", "Catholic Encyclopedia", "Catalonia", "Cattle", "Catholicism"), new Array("7505", "4484", "4200", "3269", "2347", "2095", "1956", "1740", "1604", "1527"), new Array("http://en.wikipedia.org/wiki/Roman_Catholic_Church", "http://en.wikipedia.org/wiki/Bishop", "http://en.wikipedia.org/wiki/Catholic", "http://en.wikipedia.org/wiki/Pope", "http://en.wikipedia.org/wiki/Cable_television", "http://en.wikipedia.org/wiki/Star_catalogue", "http://en.wikipedia.org/wiki/Catholic_Encyclopedia", "http://en.wikipedia.org/wiki/Catalonia", "http://en.wikipedia.org/wiki/Cattle", "http://en.wikipedia.org/wiki/Catholicism"));
To this:
["cat", ["Catholics", 7505, "Roman Catholic Church"], ["Catholic Archibishop", 4484, "Bishop"], ["Catholic", 4200, ], ["Catholic", 3269, ], ["CATV", 2347, "Cable television"], ["Catalogue astrographique", 2095, "Star catalogue"], ["Catholic Encyclopedia", 1956, ], ["Catalonia", 1740, ], ["Cattle", 1604, ], ["Catholicism", 1527, ] ]
Then it would have the following benefits:
* Less data is transmitted (by my count approximately 840 bytes versus 322 bytes, or a saving of 62%).
* The data can still be passed to a JS eval() to evaluate it.
* The redirects are in a different field, instead of merged into the first field.
* It's in a standard data format, so if other services wanted to talk to TcpQuery, this would make that easier.
* To me, the second of those data formats seems simpler (but this may just be my personal preference).
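To make it concrete, the web UI's side of this could look something like the following rough sketch (untested; addSuggestionRow is a made-up display helper, and proper URL escaping is skipped here):
------------------------------
function renderResults(data) {
    var query = data[0];                   // entry 0 is the query itself
    for (var i = 1; i < data.length; i++) {
        var title    = data[i][0];
        var freq     = data[i][1];
        var redirect = data[i][2];         // empty/missing for plain articles
        var label    = redirect ? title + " \u2192 " + redirect : title;
        var target   = redirect ? redirect : title;
        // The UI adds the base URL itself; spaces become underscores
        // (escaping of other characters is skipped in this sketch).
        addSuggestionRow(label, freq,
            "http://en.wikipedia.org/wiki/" + target.replace(/ /g, "_"));
    }
}
------------------------------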
All the best, Nick.
* Nick Jenkins nickpj@gmail.com [2006-08-11 16:19:18 +1000]:
Actually, even better would be to:
- change to JSON format ( http://www.json.org/ )
- omit the "http://en.wikipedia.org/wiki/" bit (and let the web UI add this instead)
- use a different array for each element, rather than 3 different arrays for the different fields (thus grouping all the fields for
one element together).
- use the [] notation instead of array notation (which is part of the JSON spec anyway).
In other words, using a real example, if the TcpQuery results for a search on "cat" were changed from this current result:
sendRes("cat", new Array("Catholics → Roman Catholic Church", "Catholic Archibishop → Bishop", "Catholic", "Catholic Pope → Pope", "CATV → Cable television", "Catalogue astrographique → Star catalogue", "Catholic Encyclopedia", "Catalonia", "Cattle", "Catholicism"), new Array("7505", "4484", "4200", "3269", "2347", "2095", "1956", "1740", "1604", "1527"), new Array("http://en.wikipedia.org/wiki/Roman_Catholic_Church", "http://en.wikipedia.org/wiki/Bishop", "http://en.wikipedia.org/wiki/Catholic", "http://en.wikipedia.org/wiki/Pope", "http://en.wikipedia.org/wiki/Cable_television", "http://en.wikipedia.org/wiki/Star_catalogue", "http://en.wikipedia.org/wiki/Catholic_Encyclopedia", "http://en.wikipedia.org/wiki/Catalonia", "http://en.wikipedia.org/wiki/Cattle", "http://en.wikipedia.org/wiki/Catholicism"));
To this:
["cat", ["Catholics", 7505, "Roman Catholic Church"], ["Catholic Archibishop", 4484, "Bishop"], ["Catholic", 4200, ], ["Catholic", 3269, ], ["CATV", 2347, "Cable television"], ["Catalogue astrographique", 2095, "Star catalogue"], ["Catholic Encyclopedia", 1956, ], ["Catalonia", 1740, ], ["Cattle", 1604, ], ["Catholicism", 1527, ] ]
Then it would have the following benefits:
- Less data is transmitted (by my count approximately 840 bytes versus 322 bytes, or a saving on 62%).
- The data can still be passed to an JS eval() to evaluate it.
- The redirects are in a different field, instead of merged into the first field.
- It's in a standard data format, so if other services wanted to talk to TcpQuery, this would make that easier.
- To me, the second of those data formats seems simpler (but this may just be my personal preference).
I agree to use this format instead of the previous one; it is better. But the URL needs to be added, since it is different from the title (but without the http://en.wikipedia.org/wiki/ part).
I will modify the TcpQuery to give results in JSON with an array of 4 entries: [url, name, frequency, redirect]. For example: ["cat", ["Roman_Catholic_Church", "Catholics", 7505, "Roman Catholic Church"], ...
Have you used JSON in ECMAScript/JavaScript?
Best Regards. Julien Lemoine
But the URL needs to be added since it is different from the title
Yep, but you can work out the url from the title:
------------------------------
function titleToUrl(title) {
    var chr, url = "";
    for (var i = 0; i < title.length; i++) {
        chr = title.charCodeAt(i);
        url += (chr == 32 ? "_" : escape(String.fromCharCode(chr)));
    }
    return url;
}
// quick test:
var test_data = ["Roman Catholic Church", "cat (disambig)", "\"!@$^&*))(_--{}"];
for (var i = 0; i < test_data.length; i++) {
    document.write(test_data[i] + " equals: " + titleToUrl(test_data[i]) + " <br>\n");
}
------------------------------
Output is:
------------------------------
Roman Catholic Church equals: Roman_Catholic_Church <br>
cat (disambig) equals: cat_%28disambig%29 <br>
"!@$^&*))(_--{} equals: %22%21@%24%5E%26*%29%29%28_--%7B%7D <br>
------------------------------
(which seems identical to what the Wikipedia gives too).
Don't have to do it this way though, and if you'd prefer to do it on the server side, then do that.
I just thought that transmitting less data and potentially storing less data might help.
Have you used JSON in ECMAScript/JavaScript?
Nah, I just make this stuff up as I go along. ;-)
Should work fine though:
------------------------------
var json_data = eval('["cat",["Catholics", 7505, "Roman Catholic Church"],["Catholic Archibishop", 4484, "Bishop"],["Catholic", 4200, ],["Catholic Pope", 3269, "Pope"],["CATV", 2347, "Cable television"],["Catalogue astrographique", 2095, "Star catalogue"],["Catholic Encyclopedia", 1956, ],["Catalonia", 1740, ],["Cattle", 1604, ],["Catholicism", 1527, ]]');
alert("length: " + json_data.length + " data: " + json_data);
------------------------------
I.e. you may have to get rid of the newlines in the data stream.
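Something along these lines should do it (untested sketch; "httpRequest" stands for whatever XMLHttpRequest object is actually in use):
------------------------------
// Strip newlines from the raw response before handing it to eval().
var json_data = eval(httpRequest.responseText.replace(/[\r\n]+/g, " "));
------------------------------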
All the best, Nick.
Hello,
I modified TcpQuery to send its output in JSON (without the url) and I added Nick Jenkins' great autocomplete feature. Results for English and French are available at http://suggest.speedblue.org/ (this includes the heuristic to choose the correct redirection and the handling of articles with different capitalization).
The format of fsa.bin and articles.bin is now a little bit different, so you need to redownload them from:
English : http://www2.speedblue.org/download/WikipediaSuggestCompiledEN.tar.bz2
French : http://www2.speedblue.org/download/WikipediaSuggestCompiledFR.tar.bz2
The latest version of the sources with all these modifications is available at : http://suggest.speedblue.org/tgz/wikipedia-suggest-0.31.tar.gz
I hope you will enjoy all these modifications and I am open to any kind of suggestion/modification.
I have done some benchmarks of TcpQuery with the MemoryQuery backend and 5 threads on my computer (Pentium D930). I used 10 threads to simulate queries. It handled 154000 random queries in 24.7 seconds with a CPU usage of 100% (about 6234 queries per second).
I plan to write an analyzer more dedicated to Wikipedia, but I do not know how to get titles/redirections/links for the moment. Do you know how to get the target of redirections in the sql database? Do you think taking pages-articles.xml.bz2 and updating the index every month is acceptable?
Best Regards. Julien Lemoine
Nick Jenkins wrote:
But the url need to be added since it is different of the title
Yep, but you can work out the url from the title:
function titleToUrl(title) { var chr, url = ""; for (var i=0; i<title.length; i++) { chr = title.charCodeAt(i); url += (chr == 32 ? "_" : escape(String.fromCharCode(chr))); } return url; }
// quick test: var test_data = ["Roman Catholic Church", "cat (disambig)", ""!@$^&*))(_--{}"]; for (var i=0; i<test_data.length; i++) { document.write(test_data[i] + " equals: " + titleToUrl(test_data[i]) + " <br>\n"); }
Output is:
Roman Catholic Church equals: Roman_Catholic_Church <br> cat (disambig) equals: cat_%28disambig%29 <br> "!@$^&*))(_--{} equals: %22%21@%24%5E%26*%29%29%28_--%7B%7D <br>
(which seems identical to what the Wikipedia gives too).
Don't have to do it this way though, and if you'd prefer to do it on the server side, then do that.
I just thought that transmitting less data and potentially storing less data might help.
Did you used json in EMCAsript/javascript ?
Nah, I just make this stuff up as I go along. ;-)
Should work fine though:
var json_data = eval("["cat",["Catholics", 7505, "Roman Catholic Church"],["Catholic Archibishop", 4484, "Bishop"],["Catholic", 4200, ],["Catholic", 3269, ]["CATV", 2347, "Cable television"],["Catalogue astrographique", 2095, "Star catalogue"],["Catholic Encyclopedia", 1956, ],["Catalonia", 1740, ],["Cattle", 1604, ],["Catholicism", 1527, ]]"); alert("length: " + json_data.length + " data: " + json_data);
I.e. you may have to get rid of the newlines in the data stream.
All the best, Nick.
-----Original Message----- From: wikitech-l-bounces@wikimedia.org [mailto:wikitech-l- bounces@wikimedia.org] On Behalf Of Julien Lemoine Sent: Friday, August 11, 2006 3:18 PM To: Wikimedia developers Subject: Re: [Wikitech-l] Wikipedia Suggest
...
Do you know how to get the target of redirections in the sql database ?
...
Julien Lemoine
Julien, I've been working on a project that's required me to both resolve links and get the target of redirections for the whole summer. Let me see if I can combine the SQL commands I've found, and come up with one to give you the target of all redirects in the database...
Here we go:
SELECT from_page.page_title AS from_title, to_page.page_title AS to_title
FROM page from_page
INNER JOIN pagelinks pl ON pl.pl_from = from_page.page_id
INNER JOIN page to_page ON to_page.page_namespace = pl.pl_namespace AND to_page.page_title = pl.pl_title
WHERE from_page.page_is_redirect=1;
I hope that helps!
Sincerely, Eric Astor
P.S. My original SQL commands follow.
To resolve links:
CREATE TABLE resolvedlinks (
  rl_from INT(8) UNSIGNED NOT NULL,
  rl_to INT(8) UNSIGNED NOT NULL,
  PRIMARY KEY (rl_from, rl_to)
)
SELECT from_page.page_id AS rl_from, to_page.page_id AS rl_to
FROM pagelinks
INNER JOIN page from_page ON from_page.page_id=pl_from
INNER JOIN page to_page ON to_page.page_namespace=pl_namespace
  AND to_page.page_title=CONCAT(UPPER(SUBSTRING(pl_title,1,1)),SUBSTRING(pl_title,2));
To re-resolve the links to their actual targets (eliminating one level of redirects):
DROP TEMPORARY TABLE IF EXISTS dl;
CREATE TEMPORARY TABLE dl (
  dl_from INT(8) UNSIGNED NOT NULL,
  dl_to INT(8) UNSIGNED NOT NULL
)
SELECT rl_from AS dl_from, rl_to AS dl_to FROM resolvedlinks;

UPDATE dl
INNER JOIN page ON page.page_id=dl.dl_to
INNER JOIN resolvedlinks links ON links.rl_from=dl.dl_to
SET dl.dl_to=links.rl_to
WHERE page.page_is_redirect=1;

DROP TABLE IF EXISTS directlinks;
CREATE TABLE directlinks (
  dl_from INT(8) UNSIGNED NOT NULL,
  dl_to INT(8) UNSIGNED NOT NULL,
  PRIMARY KEY (dl_from, dl_to)
)
SELECT DISTINCT dl_from, dl_to FROM dl;
Eric Astor wrote:
Here we go:
SELECT from_page.page_title AS from_title, to_page.page_title AS to_title FROM page from_page INNER JOIN pagelinks pl ON pl.pl_from = from_page.page_id INNER JOIN page to_page ON to_page.page_namespace = pl.pl_namespace AND to_page.page_title = pl.pl_title WHERE from_page.page_is_redirect=1;
I hope that helps!
Ah, that's a _much_ better query. Nice one!
Definitely disregard what I wrote about using a regex, and use this instead.
All the best, Nick.
Thank you Eric for your SQL queries. I will try them.
Best Regards. Julien Lemoine.
Eric Astor wrote:
Julien, I've been working on a project that's required me to both resolve links and get the target of redirections for the whole summer. Let me see if I can combine the SQL commands I've found, and come up with one to give you the target of all redirects in the database...
Here we go:
SELECT from_page.page_title AS from_title, to_page.page_title AS to_title FROM page from_page INNER JOIN pagelinks pl ON pl.pl_from = from_page.page_id INNER JOIN page to_page ON to_page.page_namespace = pl.pl_namespace AND to_page.page_title = pl.pl_title WHERE from_page.page_is_redirect=1;
I hope that helps!
Sincerely, Eric Astor
P.S. My original SQL commands follow.
To resolve links:
CREATE TABLE resolvedlinks ( rl_from INT(8) UNSIGNED NOT NULL, rl_to INT(8) UNSIGNED NOT NULL, PRIMARY KEY (rl_from, rl_to) ) SELECT from_page.page_id AS rl_from, to_page.page_id AS rl_to FROM pagelinks INNER JOIN page from_page ON from_page.page_id=pl_from INNER JOIN page to_page ON to_page.page_namespace=pl_namespace AND to_page.page_title=CONCAT(UPPER(SUBSTRING(pl_title,1,1)),SUBSTRING(pl_title, 2));
To re-resolve the links to their actual targets (eliminating one level of redirects):
DROP TEMPORARY TABLE IF EXISTS dl; CREATE TEMPORARY TABLE dl ( dl_from INT(8) UNSIGNED NOT NULL, dl_to INT(8) UNSIGNED NOT NULL ) SELECT rl_from AS dl_from, rl_to AS dl_to FROM resolvedlinks; UPDATE dl INNER JOIN page ON page.page_id=dl.dl_to INNER JOIN resolvedlinks links ON links.rl_from=dl.dl_to SET dl.dl_to=links.rl_to WHERE page.page_is_redirect=1; DROP TABLE IF EXISTS directlinks; CREATE TABLE directlinks ( dl_from INT(8) UNSIGNED NOT NULL, dl_to INT(8) UNSIGNED NOT NULL, PRIMARY KEY (dl_from, dl_to) ) SELECT DISTINCT dl_from, dl_to FROM dl;
Hi Julien,
The latest version of the sources with all these modifications is available at : http://suggest.speedblue.org/tgz/wikipedia-suggest-0.31.tar.gz
Some search suggestion updates:
* Got rid of the race condition in autocomplete when the user types fast (e.g. can type "new" fast without having it lose the highlighting on " York").
* Removed the first field from the JSON response from TcpQuery (basically the client was asking a question, and getting a response from the server that said "the question you asked is this, and the answer is this" - however we only need the answer back, not the question). Also removed a handful of spaces that aren't needed in the JSON data.
* Pressing "Enter" on a response which has no matches will now invoke MediaWiki's "Special:Search" on that query (a rough sketch of this fallback follows this list).
* Moved the URL to the wiki to index.php (as "var baseUrl"), which hopefully means that a single WSuggest.js file can be used for both English and French.
* Added a CHANGELOG file to keep track of what's changed + release dates.
* With the "var autocomplete" + delete key stuff (intKey == 8), I think the intent was to update the suggestion list (but not the autocomplete) if the user presses the backspace/delete keys. Added a 'showAutocomplete' boolean parameter for this.
* If the user searched for "XYZ Affair", then deleted the "Y" to give "XZ Affair", it would still show "XYZ Affair" as the only matching result (when really there are no matching results). To prevent this, removed the 'cache' boolean var.
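The Special:Search fallback is basically just this (a sketch only - "onEnterPressed" is a made-up name, the exact Special:Search parameters may differ, and baseUrl / titleToUrl are as described above and earlier in the thread):
------------------------------
function onEnterPressed(query, matches) {
    if (matches.length == 0) {
        // No suggestions: hand the query over to MediaWiki's own search.
        window.location = baseUrl + "?title=Special:Search&search=" + encodeURIComponent(query);
    } else {
        // Otherwise go straight to the top suggestion.
        window.location = baseUrl + "?title=" + titleToUrl(matches[0]);
    }
}
------------------------------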
Patch against 0.31 is at http://files.nickj.org/MediaWiki/wikipedia-suggest-0.31-diff.txt
All the best, Nick.
Hi Nick,
Nick Jenkins wrote:
Hi Julien,
The latest version of the sources with all these modifications is available at : http://suggest.speedblue.org/tgz/wikipedia-suggest-0.31.tar.gz
Some search suggestion updates:
- Got rid of the race condition in autocomplete when user types fast (e.g. can type "new" fast without having it lose the
highlighting on " York").
- Removed the first field from the JSON response from TcpQuery (basically client was asking a question, and getting a response from
server that said "the question you asked is this, and the answer is this" - however we only need the answer back, not the question). Also removed a handful of spaces that aren't needed in the JSON data.
- Pressing "Enter" on a response which has no matches will now invoke MediaWiki's "Special:Search" on that query.
- Moved the URL to the wiki to index.php (as "var baseUrl"), which hopefully means that a single WSuggest.js file can be used for
both English and French.
- Added a CHANGELOG file to keep track of what's changed + release dates.
- With the "var autocomplete" + delete key stuff (intKey == 8), I think the intent was to update the suggestion list (but not the
autocomplete) if the user presses the backspace/delete keys. Added a 'showAutocomplete' boolean parameter for this.
- If the user searched for "XYZ Affair", then deleted the "Y" to give "XZ Affair", it would still show "XYZ Affair" as the only
matching result (when really there are no matching results). To prevent this removed the 'cache' boolean var.
Patch against 0.31 is at http://files.nickj.org/MediaWiki/wikipedia-suggest-0.31-diff.txt
Very good job, I will integrate your patch in the 0.4 version that I will release tomorrow. Thank you very much.
Best Regards. Julien
Hello,
I totally rewrote the compiler to work with the XML format of download.wikipedia.org. You will need to download two files from download.wikipedia.org to use it: xxx-all-titles-in-ns0 and xxx-pages-articles.xml. After giving these two files as arguments to the Analyzer, you will have the compiled output for the query tools. With this modification, you will be able to use wikipedia-suggest on any project using MediaWiki. I downloaded the last dump and updated my website for the ten most popular languages of Wikipedia.
I released these modifications as wikipedia-suggest-0.4; here is the change log:
== Version 0.4, released 15-Aug-2006 ==
* Rewrote the Analyzer (trie compiler) in order to read files from download.wikipedia.org instead of a home-made xml format. You can now easily use wikipedia-suggest for any project that uses MediaWiki.
* Replaced libicu by glib to improve performance (glib is really faster for utf-8 case conversions).
* Included the last contribution of Nick Jenkins : http://files.nickj.org/MediaWiki/wikipedia-suggest-0.31-diff.txt
* Replaced escape by encodeURI, because escape uses the %uXXXX syntax which is not understood by MediaWiki (for example %u044 where MediaWiki expects %C5%85). A short example of the difference follows this list.
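To illustrate the difference on a title with characters above U+00FF ("Łódź" is only an example title):
------------------------------
var title = "Łódź";
escape(title);      // "%u0141%F3d%u017A"    - the %uXXXX form, which MediaWiki rejects
encodeURI(title);   // "%C5%81%C3%B3d%C5%BA" - the UTF-8 %XX form that MediaWiki expects
------------------------------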
I hope you will find these modifications useful. Best Regards. Julien Lemoine
On 8/15/06, Julien Lemoine speedblue@happycoders.org wrote:
With this modification, you will be able to use wikipedia-suggest on any project using MediaWiki. I downloaded the last dump and updated my website for the ten most popular languages of Wikipedia.
This is nice, but I want it integrated into Wikipedia. How do we do that?
Steve
Hi Steve,
Steve Bennett wrote:
On 8/15/06, Julien Lemoine speedblue@happycoders.org wrote:
With this modification, you will be able to use wikipedia-suggest on any project using MediaWiki. I downloaded the last dump and updated my website for the ten most popular languages of Wikipedia.
This is nice, but I want it integrated into Wikipedia. How do we do that?
For me, you can integrate it now and update the index (trie) after each xml dump. When the compiler using a sql database is ready, you will be able to replace the update process with the new analyzer.
If you need information about how to install it, don't hesitate to ask me.
Best Regards. Julien Lemoine
Hi Julien,
I was wondering if it had an extension to keep its index up to date - an "OnArticleSave" hook. With that a really useful place to hook it in would not be just the search, but also the edit box. Little bit of js that checks for the user inputting [[, and then the suggest code kicks in with possible article matches. It would make article linking a lot easier.
One problem with it is I don't know how to work out where the text cursor is in a textarea from JS, but I guess you could just have it pop up somewhere, ready for people to click. Obviously to make it all work seamlessly the index would need to be up to date...
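Something roughly like this might be enough to detect the trigger, at least in browsers that expose selectionStart on textareas (an untested sketch; suggestTitlesFor is a made-up hook into the suggest code, and IE would need a TextRange workaround for the cursor position):
------------------------------
function watchEditBox(textarea) {
    textarea.onkeyup = function () {
        if (typeof textarea.selectionStart != "number") return;   // IE: needs a TextRange workaround
        var beforeCursor = textarea.value.substring(0, textarea.selectionStart);
        var open = beforeCursor.lastIndexOf("[[");
        if (open == -1) return;
        var partial = beforeCursor.substring(open + 2);
        if (partial.indexOf("]]") != -1 || partial.indexOf("\n") != -1) return;  // link already closed
        suggestTitlesFor(partial);    // hand the partial title to the suggest code
    };
}
------------------------------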
Best regards,
Alex Powell
On 8/16/06, Julien Lemoine speedblue@happycoders.org wrote:
Hi Steve,
Steve Bennett wrote:
On 8/15/06, Julien Lemoine speedblue@happycoders.org wrote:
With this modification, you will be able to use wikipedia-suggest on any project using MediaWiki. I downloaded the last dump and updated my website for the ten most popular languages of Wikipedia.
This is nice, but I want it integrated into Wikipedia. How do we do that?
For me, you can integrate it now and update the index (trie) after each xml dumps. When the compiler using a sql database will be ready you will be able to replace the update process with the new analyzer.
If you need information about how to install it, don't hesitate to ask me.
Best Regards. Julien Lemoine
Hi Alex,
There is no extension to keep the index up to date at any moment in wikipedia-suggest. This is possible, but it would use a lot of memory, because the compilation step that reduces the size is not possible while you are modifying the index. The best way to have an index that is up to date at all times would be to have a distributed trie, in order to reduce the memory used per computer.
Your idea to use the suggest during article editing is very good. Do you think a daily update of the index (with the sql database) would be enough?
Best Regards. Julien
Alex Powell wrote:
Hi Julien,
I was wondering if it had an extension to keep its index up to date - an "OnArticleSave" hook. With that a really useful place to hook it in would not be just the search, but also the edit box. Little bit of js that checks for the user inputting [[, and then the suggest code kicks in with possible article matches. It would make article linking a lot easier.
One problem with it is I don't know how to work out where the text cursor is in a textarea from JS, but I guess you could just have it pop up somewhere, ready for people to click. Obviously to make it all work seemlessly the index would need to be up to date...
Best regards,
Alex Powell
On 8/16/06, Julien Lemoine speedblue@happycoders.org wrote:
Hi Steve,
Steve Bennett wrote:
On 8/15/06, Julien Lemoine speedblue@happycoders.org wrote:
With this modification, you will be able to use wikipedia-suggest on any project using MediaWiki. I downloaded the last dump and updated my website for the ten most popular languages of Wikipedia.
This is nice, but I want it integrated into Wikipedia. How do we do that?
For me, you can integrate it now and update the index (trie) after each xml dumps. When the compiler using a sql database will be ready you will be able to replace the update process with the new analyzer.
If you need information about how to install it, don't hesitate to ask me.
Best Regards. Julien Lemoine
On 8/16/06, Alex Powell alexp700@gmail.com wrote:
I was wondering if it had an extension to keep its index up to date - an "OnArticleSave" hook. With that a really useful place to hook it in
That would be cool. Only three cases would need to be treated:
- brand new article: update the index taking into account that the new article may be a redirect
- article that was a redirect is now a real article
- article that was a real article is now a redirect
This would affect a very small number of article saves, maybe like 1 in 100 or less? Obviously this level of integration is going to require serious support from the developers.
would not be just the search, but also the edit box. Little bit of js that checks for the user inputting [[, and then the suggest code kicks in with possible article matches. It would make article linking a lot easier.
This sounds like something better left for the likes of Wikiwyg?
Steve
Hi,
You're right, there are quite a few optimizations that can be taken into account on article save. It is possible that on the full Wikipedia the code would need to be extremely complex to manage the partial updates, but I haven't looked through the code - I just thought of it as a suggestion. It would be handy on any MediaWiki db - and superb on Wikipedia itself.
I've read a few of the Wikiwyg emails (there have been rather a lot, and rather a lot of spurious debate!). My 2c on Wikiwyg is that most editing tasks would be best achieved not from an FCKeditor-style editor, but from an augmented text editor. The first steps are already there with the (mostly useless) toolbar, but it could do a lot more before complex wysiwyg would be needed.
Ideas:
* A quick button to create a correctly formatted table, asking for the number of cols and rows. I find this a total pain, and it has rendered tables worthless to me, as a small change breaks the table. It could work as a DIV dialog or popup like fckedit.
* Namespace/interwiki suggestion when entering names (though obviously what we have just been talking about is a step beyond!)
* Should also work with template insertion. Ultimately it should scan the template via XMLHttpRequest and give a list of template parameters back to the user.
* Hot keys to make bold, italic, links etc (these may be there, but I haven't noticed them).
I notice that the wikiwyg stuff doesn't actually work that well - the indent in particular is lost, and if it is reduced to bold, italics and bullets it seems a little pointless - certainly a lot of effort. Most of the articles people complain about editing in Wikitext are filled to the gunnels with templates and extensions (like ref). An editor that handles those well is a tricky beast to design, let alone write. HTML editors - which are much more mature - tend to quickly revert to the source, since it often is the best way to handle such things.
However HTML editors work best when they suggest, and prod you along. I'd say any Wiki editor needs to look more down those lines - things like an auto preview of a template in a popup div would be far more useful than the traditional word ribbon.
One issue I have with developing all of these is the speed of MW startup. It cannot really be used to make AJAX-type requests in its current form, as pages often take seconds to render. Not sure of the best way to speed this up. The parser seems to be the slowest point in most of my profiles, but there's simply too much code to run through for AJAX requests at the moment.
Best regards,
Alex Powell
On 8/16/06, Steve Bennett stevage@gmail.com wrote:
On 8/16/06, Alex Powell alexp700@gmail.com wrote:
I was wondering if it had an extension to keep its index up to date - an "OnArticleSave" hook. With that a really useful place to hook it in
That would be cool. Only three cases would need to be treated:
- brand new article: update the index taking into account that the new
article may be a redirect
- article that was a redirect is now a real article
- article that was a real article is now a redirect
This would affect a very small number of article saves, maybe like 1 in 100 or less? Obviously this level of integration is going to require serious support from the developers.
would not be just the search, but also the edit box. Little bit of js that checks for the user inputting [[, and then the suggest code kicks in with possible article matches. It would make article linking a lot easier.
This sounds like something better left for the likes of Wikiwyg?
Steve
On 8/16/06, Alex Powell alexp700@gmail.com wrote:
Ideas:
- A quick button to create a correctly formatted table. Asking for
number of cols and rows. I find this a total pain, and it has rendered tables worthless to me, as a small change breaks the table. It could work as a DIV dialog or popup like fckedit.
Just FYI, there are Word macro scripts that successfully translate to/from Word/MediaWiki tables.
- Namespace/interwiki suggestion when entering names (though obviously
what we have just been talking about is a step beyond!)
- Should also work with template insertion. Ultimately should scan the
template via XMLHttpRequest and give a list of template parameters back to the user.
Nice, would be fantastic if we had a formalised way of documenting parameters. Then it could, for example, scan the <NOINCLUDE> part of the template page for some special code and provide that too. Otherwise, most templates could only tell you how many parameters to provide, by scanning for {{{1}}}, {{{2}}} etc.
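As a rough illustration of that last idea (sketch only: fetchTemplateParams is a made-up name, the "/w/index.php...&action=raw" URL is assumed to serve the raw wikitext, and the IE6 ActiveXObject fallback is omitted):
------------------------------
function fetchTemplateParams(templateName, callback) {
    var req = new XMLHttpRequest();
    req.open("GET", "/w/index.php?title=Template:" +
                    encodeURIComponent(templateName) + "&action=raw", true);
    req.onreadystatechange = function () {
        if (req.readyState != 4) return;
        var seen = {}, names = [];
        var matches = req.responseText.match(/\{\{\{([^{}|]+)/g) || [];
        for (var i = 0; i < matches.length; i++) {
            var name = matches[i].substring(3);          // strip the leading "{{{"
            if (!seen[name]) { seen[name] = true; names.push(name); }
        }
        callback(names);   // e.g. ["1", "2", "name", "date"]
    };
    req.send(null);
}
------------------------------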
- Hot keys to make bold, italic, links etc (these may be there, but I
haven't noticed them).
May be impossible in JavaScript? Not sure what hotkeys are permitted.
I notice that the wikiwyg stuff doesn't actually work that well - the indent in particular is lost, and if it is reduced to bold, italics and bullets it seems a little pointless - certainly a lot of effort.
It's a work in progress. I see that straight indents aren't currently implemented. Obviously they'll convert to stacked :::'s. Indent works on lists.
Most of the articles people complain about editing in Wikitext are filled to the gunnels with templates and extensions (like ref). An editor that handles those well is a tricky beast to design, let alone write. HTML editors - which are much more mature - tend to quickly revert to the source, since it often is the best way to handle such things.
That may continue to be the case. Tools for beginners and experts alike should be encouraged.
However HTML editors work best when they suggest, and prod you along.
Like good code IDE's. These days you can code in a language you don't know at all just by relying on the online "prodding".
I'd say any Wiki editor needs to look more down those lines - things like an auto preview of a template in a popup div would be far more useful that the traditional word ribbon.
Nice. Suggest it. :)
Steve
May be impossible in JavaScript? Not sure what hotkeys are permitted.
Pretty much everything can be picked up these days from IE - though Firefox may report less than IE (I've had a few issues with FF's event handling over the last few months).
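For example, a Ctrl+B handler that wraps the selection in bold markup could look roughly like this (untested sketch, assuming a Gecko-style textarea selection API; IE would need document.selection instead):
------------------------------
function addBoldHotkey(textarea) {
    textarea.onkeydown = function (e) {
        e = e || window.event;
        if (!e.ctrlKey || e.keyCode != 66) return true;   // 66 == 'B'
        var start = textarea.selectionStart, end = textarea.selectionEnd;
        var text  = textarea.value;
        textarea.value = text.substring(0, start) + "'''" +
                         text.substring(start, end) + "'''" +
                         text.substring(end);
        textarea.setSelectionRange(start + 3, end + 3);   // keep the original selection selected
        return false;   // try to stop the browser's own Ctrl+B (may not work everywhere)
    };
}
------------------------------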
Like good code IDE's. These days you can code in a language you don't know at all just by relying on the online "prodding".
My model for the "suggestion" paradigm. It's friendly for novices and experts - but keeps you close to what you are trying to achieve. More powerful Wysiwyg features can always be added, but I believe all my suggestions could be implemented as "quick wins", as they just involve a bit of additional clientside script, not rewriting parser.php in JS - a nightmare of maintenance waiting to happen.
With some careful planning extensions could even spec their own JS suggest features and supply clientside code - allowing both sides of the process to be merged into a framework.
I'd say any Wiki editor needs to look more down those lines - things like an auto preview of a template in a popup div would be far more useful that the traditional word ribbon.
Nice. Suggest it. :)
Where?
Best wishes,
Alex
On 8/16/06, Alex Powell alexp700@gmail.com wrote:
One issue I have with developing all of these is the speed of MW startup. It cannot really be used to make AJAX type requests in its current form, as pages often take seconds to render. Not sure the best way to speed this up. The parser seems to be the slowest point in most of my profiles, but simple too much code to run thru for AJAX requests at the moment.
Doesn't Julien's program run independently of MediaWiki itself? That is, I got the impression that the client wasn't sending a request to index.php, it was sending it to Julien's program, so startup time for MediaWiki doesn't matter.
I was talking more generally about doing previews in an editor, not the suggest - ideally you'd want to call the server to render the preview through MW, then you'd be guaranteed you had interpreted it (and any extensions installed on the server) correctly.
Alex
On 8/16/06, Simetrical Simetrical+wikitech@gmail.com wrote:
On 8/16/06, Alex Powell alexp700@gmail.com wrote:
One issue I have with developing all of these is the speed of MW startup. It cannot really be used to make AJAX type requests in its current form, as pages often take seconds to render. Not sure the best way to speed this up. The parser seems to be the slowest point in most of my profiles, but simple too much code to run thru for AJAX requests at the moment.
Doesn't Julien's program run independently of MediaWiki itself? That is, I got the impression that the client wasn't sending a request to index.php, it was sending it to Julien's program, so startup time for MediaWiki doesn't matter.
On 8/16/06, Julien Lemoine speedblue@happycoders.org wrote:
Hi Steve, For me, you can integrate it now and update the index (trie) after each xml dumps. When the compiler using a sql database will be ready you will be able to replace the update process with the new analyzer.
If you need information about how to install it, don't hesitate to ask me.
Hi, I mean, integrated with *Wikipedia*, not integrated with MediaWiki - this needs cooperation with various people, not least of whom is Brion.
In the meantime, can anyone think of how to make this work as a javascript hack? Would it be possible to make the interface compatible (same size etc) with the monobook search box, so it could be included somehow in a custom monobook.js script?
Steve
Hi Steve,
Steve Bennett wrote:
On 8/16/06, Julien Lemoine speedblue@happycoders.org wrote:
Hi Steve, For me, you can integrate it now and update the index (trie) after each xml dumps. When the compiler using a sql database will be ready you will be able to replace the update process with the new analyzer.
If you need information about how to install it, don't hesitate to ask me.
Hi, I mean, integrated with *Wikipedia*, not integrated with MediaWiki
- this needs cooperation with various people, not least of whom is
Brion.
In the meantime, can anyone think of how to make this work as a javascript hack? Would it be possible to make the interface compatible (same size etc) with the monobook search box, so it could be included somehow in a custom monobook.js script?
I was talking about integration in *Wikipedia*, but if I understand correctly, you are talking about doing queries on my web site from the Wikipedia search box? If so, it will probably be a little bit difficult because of browser protections: you cannot make an HTTP request in JavaScript to another host.
Best Regards. Julien
On 8/16/06, Julien Lemoine speedblue@happycoders.org wrote:
I was talking about integration in *wikipedia*, but if I understand correctly, you are talking about doing query on my web site from wikipedia search box ? If yes, it will probably be a little bit difficult because of browser protection, you can not use http request in javascript to another host.
Ok, so that rules out integrating via a user script until the search database is hosted at wikipedia?
In the meantime, I think you can achieve that with GreaseMonkey.
Steve
Steve Bennett wrote:
On 8/16/06, Julien Lemoine speedblue@happycoders.org wrote:
I was talking about integration in *wikipedia*, but if I understand correctly, you are talking about doing query on my web site from wikipedia search box ? If yes, it will probably be a little bit difficult because of browser protection, you can not use http request in javascript to another host.
Ok, so that rules out integrating via a user script until the search database is hosted at wikipedia?
Yes, until the XMLHttpRequest query is done on the same domain. And if you want to be efficient, the TCP/IP server I wrote to handle queries needs to be on the same host, to avoid redirection costs (you could write a PHP wrapper on wikipedia.org that redirects queries to suggest.speedblue.org, but that would not be efficient).
In the meantime, I think you can achieve that with GreaseMonkey.
Maybe; I do not know this Firefox extension very well.
Best Regards. Julien
On 8/16/06, Julien Lemoine speedblue@happycoders.org wrote:
Maybe, I do not know very well this firefox extension.
Looks like it: http://diveintogreasemonkey.org/api/gm_xmlhttprequest.html In particular: Unlike the XMLHttpRequest object, GM_xmlhttpRequest is not restricted to the current domain; it can GET or POST data from any URL.
It's very easy to convert any arbitrary javascript to be a greasemonkey script that can be run on any page.
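The skeleton of such a user script might look roughly like this (a sketch only, using the GM_xmlhttpRequest API documented at the link above and the query.php URL from Julien's site; it assumes the Monobook search box keeps its "searchInput" id, and showSuggestions is a made-up stand-in for whatever feeds the JSON to the suggest UI):
------------------------------
// ==UserScript==
// @name        Wikipedia Suggest (sketch)
// @include     http://en.wikipedia.org/*
// ==/UserScript==

var box = document.getElementById("searchInput");
if (box) {
    box.addEventListener("keyup", function () {
        GM_xmlhttpRequest({
            method: "GET",
            url: "http://suggest.speedblue.org/query.php?query=" + encodeURIComponent(box.value),
            onload: function (response) {
                showSuggestions(response.responseText);   // feed the JSON to the suggest UI
            }
        });
    }, false);
}
------------------------------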
You say it would be inefficient to run the script from a server other than the one where the compiled database is stored? Is that because the "TCP/IP server" has a lot of communication with the database? Does that mean that there is only a single, low volume request back and forth between the client and the "TCP/IP server" each time the user presses a key? If so, a grease monkey script sounds quite feasible...
Steve
Steve Bennett wrote:
On 8/16/06, Julien Lemoine speedblue@happycoders.org wrote:
Maybe, I do not know very well this firefox extension.
Looks like it: http://diveintogreasemonkey.org/api/gm_xmlhttprequest.html In particular: Unlike the XMLHttpRequest object, GM_xmlhttpRequest is not restricted to the current domain; it can GET or POST data from any URL.
Ok, it is perfect :)
It's very easy to convert any arbitrary javascript to be a greasemonkey script that can be run on any page.
You say it would be inefficient to run the script from a server other than the one where the compiled database is stored? Is that because the "TCP/IP server" has a lot of communication with the database? Does that mean that there is only a single, low volume request back and forth between the client and the "TCP/IP server" each time the user presses a key? If so, a grease monkey script sounds quite feasible...
Yes, there is a low-volume query between the client and the server each time the user presses a key (you can have a look at http://suggest.speedblue.org/query.php?query=welcome to have an idea of the volume). My previous idea was a PHP wrapper on wikipedia.org (opening a socket to speedblue.org, getting the result and sending it back); that is why I said it would be inefficient, but you found a better idea with a GreaseMonkey script.
Best Regards. Julien Lemoine
In the meantime, can anyone think of how to make this work as a javascript hack?
Sure, and I can even show you screenshots of me using it. Like the saying goes, "the future is already here, it's just not widely distributed yet" :-)
You can check out the experimental screenshots, and an explanation of how it was done, here: http://nickj.org/Experiment_with_Suggestion_Searching_on_Wikipedia
As you can see, it definitely has rough edges, but the basic concept works well. The main obstacles to normal people using it are the JavaScript cross-domain restrictions (details below), and updating the current MediaWiki search area to have the option of using suggestion searching.
I was talking about integration in *wikipedia*, but if I understand correctly, you are talking about doing query on my web site from wikipedia search box ? If yes, it will probably be a little bit difficult because of browser protection, you can not use http request in javascript to another host.
Ok, so that rules out integrating via a user script until the search database is hosted at wikipedia?
Yes, that is a problem, and to get around it I configured my proxy to silently rewrite certain requests so that they look like they come from en.wikipedia.org, but in reality they come from Julien's site. That allows me to get around this restriction (and probably GreaseMonkey would let you do something similar), but it won't be a viable option for 99.9% of people. For them, the suggestions need to come from the wikipedia (even if it's just a temporary two line script which bounces the request straight off to Julien's site).
All the best, Nick.
On 8/17/06, Nick Jenkins nickpj@gmail.com wrote:
Yes, that is a problem, and to get around it I configured my proxy to silently rewrite certain requests so that they look like they come from en.wikipedia.org, but in reality they come from Julien's site. That allows me to get around this restriction (and probably GreaseMonkey would let you do something similar), but it won't be a viable option for 99.9% of people. For them, the suggestions need to come from the wikipedia (even if it's just a temporary two line script which bounces the request straight off to Julien's site).
Just to clarify, a greasemonkey script would allow anyone with:
a) firefox
b) greasemonkey installed (it's just a firefox extension)
c) the appropriate greasemonkey script installed (*very* quick and painless)
...to do this without hacking their proxy :)
Steve
Hi Julien,
I totally rewrote the compiler to works with the XML format of download.wikipedia.org.
Sounds good, and I really like it having the top 10 languages.
I think _maybe_ though it could be useful to have two variations of the Analyzer (like you have two variations of TcpQuery - one that uses DiskQuery, and one that uses MemoryQuery). With Analyzer though, it could be good to have one that connects to MySQL and gets the data directly from the database, and one that uses the downloaded XML dumps. This way, people can use whichever one is most appropriate for them. For example, for someone running a big MediaWiki site who wanted to look at the possibility of using suggestion searching, they probably wouldn't want to create an XML dump, then run Analyzer on the XML dump (this would be too slow, and too many steps, and take a lot of disk space). Rather, if possible, in that situation it would be nice to create the compiled files directly from the database.
To try and help with this, I've modified a copy of Analyzer.cpp to add basic importing (but just of the article names, not redirects or article counts) from MySQL (i.e. it does not use any downloaded files). The rough file (which still needs work for redirects + article counts) is here: http://files.nickj.org/MediaWiki/MysqlAnalyzerCmd.cpp Please note that I have not used C or C++ in a _very_ long time, so if it looks like I have done something silly then that is almost certainly correct. :-)
To compile and run this on a Debian/Ubuntu system, I did this:
# Install required MySQL libraries
apt-get install libmysqlclient15-dev
cd cmd
# Compile:
g++ -DHAVE_CONFIG_H -I. -I. -I.. -I../expat/lib -g -O2 -O3 -MT MysqlAnalyzerCmd.o -MD -MP -MF ".deps/MysqlAnalyzerCmd.Tpo" -c -o MysqlAnalyzerCmd.o MysqlAnalyzerCmd.cpp
# Link: (Note: needs " -lmysqlclient" parameter)
g++ -g -O2 -O3 -o MysqlAnalyzer -L../tools -L../serialization -L../analyzer MysqlAnalyzerCmd.o -lanalyzer -lserialization -ltools -lexpat -lglib-2.0 -lmysqlclient
# Run (change hostname / username / password / database-name params as required) :
./MysqlAnalyzer localhost wikiuser FakePasswd wikidb
If it is working, it should print out something like this:
-----------------------------
Connection success
Found 12345 articles
-----------------------------
Then use the .bin files as per usual on TcpQuery.
Also there is a small diff for WSuggest.js to fix a small problem in my autocomplete stuff. For example, suppose the user typed "Aer", then moved the text cursor back to be between the 'A' and the 'e', typed 'm' (to make "Amer") then typed 'p' (to try and spell 'Amper'). However in-between typing the 'm' and the 'p', the cursor position will jump to the end of the text box to try and autocomplete "American", so the result of pressing 'p' will be 'Amerp', not 'Amper'. To prevent this, will now only try to autocomplete if the cursor position is at the end of the text field. Diff is here: http://files.nickj.org/MediaWiki/WSuggest.js-0.4-autocomplete-update.txt
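The check itself is basically just this (sketch only; Gecko-style selection properties, with IE needing its own TextRange equivalent):
------------------------------
// Only autocomplete when the caret sits at the very end of the box.
function caretAtEnd(input) {
    if (typeof input.selectionStart != "number") return true;  // IE: fall back (or use a TextRange)
    return input.selectionStart == input.value.length &&
           input.selectionEnd   == input.value.length;
}
------------------------------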
All the best, Nick.
Hi Nick,
Nick Jenkins wrote:
Sounds good, and I really like it having the top 10 languages.
I think _maybe_ though it could be useful to have two variations of the Analyzer (like you have two variations of TcpQuery - one that uses DiskQuery, and one that uses MemoryQuery). With Analyzer though, it could be good to have one that connects to MySQL and gets the data directly from the database, and one that uses the downloaded XML dumps. This way, people can use whichever one is most appropriate for them. For example, for someone running a big MediaWiki site who wanted to look at the possibility of using suggestion searching, they probably wouldn't want to create an XML dump, then run Analyzer on the XML dump (this would be too slow, and too many steps, and take a lot of disk space). Rather, if possible, in that situation it would be nice to create the compiled files directly from the database.
To try and help with this, I've modified a copy of Analyzer.cpp to add basic importing (but just of the article names, not redirects or article counts) from MySQL (i.e. does not use any downloaded files). The rough file (which still needs work for redirects + article counts) is here: http://files.nickj.org/MediaWiki/MysqlAnalyzerCmd.cpp Please note that I have not used C or C++ in a _very_ long time, so if looks like I have done something silly then that is almost certainly correct. :-)
To use compile and run this, on a Debian/Ubuntu system, I did this:
# Install required MySQL libraries apt-get install libmysqlclient15-dev cd cmd # Compile: g++ -DHAVE_CONFIG_H -I. -I. -I.. -I../expat/lib -g -O2 -O3 -MT MysqlAnalyzerCmd.o -MD -MP -MF ".deps/MysqlAnalyzerCmd.Tpo" -c -o MysqlAnalyzerCmd.o MysqlAnalyzerCmd.cpp # Link: (Note: needs " -lmysqlclient" parameter) g++ -g -O2 -O3 -o MysqlAnalyzer -L../tools -L../serialization -L../analyzer MysqlAnalyzerCmd.o -lanalyzer -lserialization -ltools -lexpat -lglib-2.0 -lmysqlclient # Run (change hostname / username / password / database-name params as required) : ./MysqlAnalyzer localhost wikiuser FakePasswd wikidb
If it is working, it should print out something like this:
Connection success Found 12345 articles
Then use the .bin files as per usual on TcpQuery.
Thank you very much for your contribution Nick. It is indeed better to have the two versions. I preferred working on the xml dumps at the beginning since it was faster and easier for me to update wikipedia-suggest, and it needs less time/memory/cpu power (I have the analyzer and the sql server on the same computer). But the next step of wikipedia-suggest for me is to write a sql analyzer (probably using OTL : otl.sf.net and unixodbc); unfortunately I will not be able to write it before September.
Also there is a small diff for WSuggest.js to fix a small problem in my autocomplete stuff. For example, suppose the user typed "Aer", then moved the text cursor back to be between the 'A' and the 'e', typed 'm' (to make "Amer") then typed 'p' (to try and spell 'Amper'). However in-between typing the 'm' and the 'p', the cursor position will jump to the end of the text box to try and autocomplete "American", so the result of pressing 'p' will be 'Amerp', not 'Amper'. To prevent this, will now only try to autocomplete if the cursor position is at the end of the text field. Diff is here: http://files.nickj.org/MediaWiki/WSuggest.js-0.4-autocomplete-update.txt
Thank you, I released wikipedia-suggest 0.41 with your contribution : http://suggest.speedblue.org/tgz/wikipedia-suggest-0.41.tar.gz
Best Regards. Julien Lemoine
Nick Jenkins wrote:
Hi Julien,
The latest version of the sources with all these modifications is available at : http://suggest.speedblue.org/tgz/wikipedia-suggest-0.31.tar.gz
Some search suggestion updates:
- Got rid of the race condition in autocomplete when user types fast (e.g. can type "new" fast without having it lose the
highlighting on " York").
- Removed the first field from the JSON response from TcpQuery (basically client was asking a question, and getting a response from
server that said "the question you asked is this, and the answer is this" - however we only need the answer back, not the question). Also removed a handful of spaces that aren't needed in the JSON data.
- Pressing "Enter" on a response which has no matches will now invoke MediaWiki's "Special:Search" on that query.
- Moved the URL to the wiki to index.php (as "var baseUrl"), which hopefully means that a single WSuggest.js file can be used for
both English and French.
- Added a CHANGELOG file to keep track of what's changed + release dates.
- With the "var autocomplete" + delete key stuff (intKey == 8), I think the intent was to update the suggestion list (but not the
autocomplete) if the user presses the backspace/delete keys. Added a 'showAutocomplete' boolean parameter for this.
- If the user searched for "XYZ Affair", then deleted the "Y" to give "XZ Affair", it would still show "XYZ Affair" as the only
matching result (when really there are no matching results). To prevent this removed the 'cache' boolean var.
Patch against 0.31 is at http://files.nickj.org/MediaWiki/wikipedia-suggest-0.31-diff.txt
Dear Nick! Thank you for the web address, very useful. But how can we install the Suggest on Wikipedia, and if we do, why not also install it on the other sister projects?
Greetings, de.wikipedia.org/wiki/Benutzer:Abzt
Frederic Bayer wrote:
Dear Nick! Thank you for the Web-Adress, very useful. But how can we install the Suggest to the Wikipedia, and if we will, why not also install it to other sister Projects.
Since the list threads are rather long, please summarize and put details on a request at bugzilla.wikimedia.org. :)
I'm taking a few days off, but will be back to set things like this up next week (unless you can get Tim et al to do it faster).
-- brion vibber (brion @ pobox.com)
Frederic Bayer wrote:
Dear Nick! Thank you for the Web-Adress, very useful. But how can we install the Suggest to the Wikipedia, and if we will, why not also install it to other sister Projects.
Since the list threads are rather long, please summarize and put details on a request at bugzilla.wikimedia.org. :)
I'm taking a few days off, but will be back to set things like this up next week (unless you can get Tim et al to do it faster).
-- brion vibber (brion @ pobox.com)
I finally got around to logging a request for this - it's at: http://bugzilla.wikimedia.org/show_bug.cgi?id=7288
Includes a summary description of what's being requested, some potential screenshots to illustrate, info about and link to Julien Lemoine's site and source code, and my "devil's advocate" list of things that ideally would behave differently so as to help integration with MediaWiki.
Comments and technical feedback welcome.
All the best, Nick.
On 8/11/06, Nick Jenkins nickpj@gmail.com wrote:
It's possible to have the JavaScript search for " → " and stop any string at that point, but it feels a bit ugly, so I have omitted that.
What would happen if someone was genuinely looking for http://en.wikipedia.org/w/index.php?title=%E2%86%90%E2%86%93%E2%86%91%E2%86%...
:P
It's possible to have the JavaScript search for " → " and stop any string at that point, but it feels a bit ugly, so I have omitted that.
What would happen if someone was genuinely looking for http://en.wikipedia.org/w/index.php?title=%E2%86%90%E2%86%93%E2%86%91%E2%86%...
:P
I knew there was a reason it felt a bit wrong, and that example is probably it! ;-)
And by the way, if I copy and paste in the "←↓↑→" (i.e. the 4 arrow symbols, in case an email system munges it), then the search suggest does give the "Dance Dance Revolution" redirect as its suggestion ;-)
All the best, Nick.