On Mon, 2003-02-17 at 01:43, Ray Saintonge wrote:
Brion Vibber wrote:
Okay, very preliminary version (code is in CVS): http://test.wikipedia.org/wiki/Special:Allpages
It looks beautiful!
Woo-hoo!
There's probably some wiggle room in the ideal number of links per page and whatnot. It needs to be made prettier, with backlinks to the top level index and forward/back browsing, but the basic functionality is there.
And I thought I could expect perfection on the first draft. :-)
What, and leave you nothing to look forward to? ;)
Would a table of equivalent values be workable, so that for sorting and searching purposes "à" and "ã" would be considered equivalent to "a", etc.? A second level of mini-sort would only be required when accents are the only thing distinguishing two entries.
In order to be at all non-molasses-like, the sorting has to be ingrained into the indexes in the database.
To summarize my proposal on wikitech-l: this means either creating a suitable charset plugin for MySQL to work with when building indexes (can only be set on a server-wide basis -- not good enough for us, as each language must be treated distinctly) or adding a special sort-key field for each article. With a separate sort key, we can munge the titles on a per-language basis so that characters are equivalized or separated and rearranged such that a simple ASCII-style sort on the hidden field will turn up the right order.
For English and French, this means simply replacing accented characters with their base letters. For other languages this may involve adding dummy high or low ASCII chars to force a letter to sort above or below an equivalent. But in all cases, it's the same basic mechanism -- make some replacements on a string, then store the result and let the database do a dumb sort with it.
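As a rough illustration of the sort-key idea for the English/French case, here is a minimal Python sketch. It is not the code in CVS; the function name sort_key and the sample titles are made up for the example, and it assumes that stripping Unicode combining marks is an acceptable stand-in for a per-language replacement table. The original title is used as a tie-breaker, which gives the "second level of mini-sort" when accents are the only difference.

```python
import unicodedata

def sort_key(title: str) -> str:
    """Hypothetical helper: fold accented letters to their base letters.

    Decomposes each character (NFD), drops the combining accent marks,
    and lowercases the result so a plain string sort gives the order.
    """
    decomposed = unicodedata.normalize("NFD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

# Sample titles are invented for illustration.
titles = ["Zebra", "école", "Eclair", "Ébène", "apple", "Ãrbol"]

# Sort on the folded key first; fall back to the raw title on ties.
ordered = sorted(titles, key=lambda t: (sort_key(t), t))
print(ordered)
```

In the scheme described above, the value returned by sort_key would be stored in the hidden field and indexed, so the database never has to fold anything at query time.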
Not really related, but what might be useful too is a (per-language) list of index points; we might want the top level index to force distinct sections for each letter.
i.e., instead of: Aardvark-Audio Aural-Catapult ...
we force index breaks at the start of each letter: Aardvark-Audio Aural-Azimuth Baal-Buzz Cab-Cop ....
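The forced letter breaks could be sketched like this in Python. This is only an assumption about how it might work: index_ranges and per_page are hypothetical names, the input is assumed to be already sorted and free of empty titles, and the first character stands in for a per-language list of index points.

```python
from itertools import groupby

def index_ranges(sorted_titles, per_page):
    """Hypothetical helper: split sorted titles into (first, last) ranges.

    A range never crosses from one initial letter to the next, so each
    letter starts its own section in the top-level index.
    """
    ranges = []
    # groupby relies on the input already being in sorted order.
    for _, group in groupby(sorted_titles, key=lambda t: t[0].upper()):
        letter_titles = list(group)
        for i in range(0, len(letter_titles), per_page):
            chunk = letter_titles[i:i + per_page]
            ranges.append((chunk[0], chunk[-1]))
    return ranges

titles = ["Aardvark", "Audio", "Aural", "Azimuth", "Baal", "Buzz", "Cab", "Cop"]
for first, last in index_ranges(titles, per_page=2):
    print(f"{first}-{last}")
```

The cost is that the last range for a letter may be shorter than the others, which ties back to the wiggle room in the number of links per page.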
Another interesting challenge will come from how we deal with, and reconcile this with, the established policy of putting names as [[John Smith]] instead of [[Smith, John]].
The simplest way is to abolish all alphabetical listings. ;)
As much as I find beauty in the proposal, there may be other ways to do this that are less demanding on the system, but just as easy for the user. A tree that requires successively choosing the first, second and third letters of the first word might do this. So would a simple browse function that asks the user to supply the first few letters to begin his browse. This would still contain the opportunity to step back and forth to adjacent blocks.
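For concreteness, the "supply the first few letters" browse could look something like the following Python sketch. It is purely illustrative: browse_from is a hypothetical name, the titles are invented, and an in-memory sorted list stands in for an indexed database column.

```python
import bisect

def browse_from(sorted_titles, prefix, count):
    """Hypothetical helper: return `count` titles starting at `prefix`.

    Binary-searches the sorted list for the first title that is not
    less than the prefix, then takes the next block from there.
    """
    start = bisect.bisect_left(sorted_titles, prefix)
    return sorted_titles[start:start + count]

# Invented sample data, already in sorted order.
titles = ["Espada", "Espana", "Espejo", "Esperanto", "Estrella", "Europa"]
print(browse_from(titles, "Espe", 2))
```

Stepping to the adjacent block would just mean calling it again with the last title shown as the new starting point.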
Well, if you consider this to be a browse function:
CHDIR C:\WIKI\ESP
DIR ESP*.*
;)
Part of the usefulness of the all pages index is, at least in theory, providing a fairly direct path to all pages for search engine spiders; orphans and islands would still get linked to and indexed. So it's got to be linkable. That, and the funnest part of browsing is coming across things you wouldn't have thought of ahead of time -- this is helped by making some real words visible. Plus, it just establishes context more clearly to see whole words than a couple of letters jammed together.
-- brion vibber (brion @ pobox.com)