On Mon, 2003-02-17 at 01:43, Ray Saintonge wrote:
Brion Vibber wrote:
Okay, very preliminary version (code is in CVS):
http://test.wikipedia.org/wiki/Special:Allpages
It looks beautiful!
Woo-hoo!
There's probably some wiggle room in the ideal number of links per page
and whatnot. It needs to be made prettier, with backlinks to the top
level index and forward/back browsing, but the basic functionality is
there.
And I thought I could expect perfection on the first draft. :-)
What, and leave you nothing to look forward to? ;)
Would a table of equivalent values be workable, so that for sorting and
searching purposes "à" and "ã" would be considered equivalent to "a",
etc.? A second level of mini-sort would only be required when accents are
the only thing distinguishing two entries.
In order to be at all non-molasses-like, the sorting has to be ingrained
into the indexes in the database.
To summarize my proposal on wikitech-l: this means either creating a
suitable charset plugin for MySQL to work with when building indexes
(can only be set on a server-wide basis -- not good enough for us, as
each language must be treated distinctly) or adding a special sort-key
field for each article. With a separate sort key, we can munge the
titles on a per-language basis so that characters are equivalized or
separated and rearranged such that a simple ASCII-style sort on the
hidden field will turn up the right order.
For English and French, this means simply replacing accented characters
with their base letters. For other languages this may involve adding
dummy high or low ASCII chars to force a letter to sort above or below
an equivalent. But in all cases, it's the same basic mechanism -- make
some replacements on a string, then store the result and let the
database do a dumb sort with it.
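
A minimal sketch of that mechanism for the English/French case, written
in Python purely for illustration (it is not the code in CVS). The
function name sort_key is hypothetical, and the lowercasing step is an
added assumption; the proposal itself only calls for replacing accented
characters with their base letters. The original title is used as a
tie-breaker, which corresponds to the "second level of mini-sort" for
entries that differ only by accents.

```python
import unicodedata

def sort_key(title: str) -> str:
    """Hypothetical per-language munging step (English/French style)."""
    # Decompose each accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", title)
    # Drop the combining marks so only the base letters remain.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption: fold case so "apple" and "Àpropos" sort together.
    return stripped.lower()

titles = ["Zebra", "école", "Eclair", "Àpropos", "apple"]

# A plain sort on the stored key, with the original title as tie-breaker.
ordered = sorted(titles, key=lambda t: (sort_key(t), t))
print(ordered)
```

In a database the key would be computed once when the article is saved
and stored in its own indexed column, so the sort itself stays dumb.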
Not really related, but what might be useful too is a (per-language)
list of index points; we might want the top level index to force
distinct sections for each letter.
i.e., instead of:
Aardvark-Audio
Aural-Catapult
...
we force index breaks at the start of each letter:
Aardvark-Audio
Aural-Azimuth
Baal-Buzz
Cab-Cop
....
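
One way the forced breaks could be computed, as a rough Python sketch.
The function name index_sections and the per_section parameter are
made up for this example, and it assumes the titles are already sorted
and non-empty. Titles are grouped by first letter, and each group is
then cut into chunks of at most per_section entries, so no section
ever spans two letters.

```python
from itertools import groupby

def index_sections(sorted_titles, per_section):
    """Return "First-Last" labels, never letting a section span two letters."""
    sections = []
    # Group the already-sorted titles by their first letter.
    for _, group in groupby(sorted_titles, key=lambda t: t[0].upper()):
        group = list(group)
        # Cut each letter's group into chunks of at most per_section titles.
        for i in range(0, len(group), per_section):
            chunk = group[i:i + per_section]
            sections.append(f"{chunk[0]}-{chunk[-1]}")
    return sections

titles = ["Aardvark", "Apple", "Audio", "Aural", "Azimuth",
          "Baal", "Buzz", "Cab", "Cop"]
print(index_sections(titles, 3))
```

A per-language list of index points would replace the first-letter
grouping key used here.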
Another interesting challenge will come from how we reconcile this with
the established policy of putting names as [[John Smith]] instead of
[[Smith, John]].
The simplest way is to abolish all alphabetical listings. ;)
As much as I find beauty in the proposal, there may be other ways to do
this that are less demanding on the system, but just as easy for the
user. A tree that requires successively choosing the first, second and
third letters of the first word might do this. So would a simple browse
function that asks the user to supply the first few letters to begin the
browse. This would still contain the opportunity to step back and forth
to adjacent blocks.
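
For illustration only, the "supply the first few letters" variant could
look something like this in Python. The names browse_from and
block_size are invented for the sketch, and it assumes an in-memory
sorted list standing in for an indexed database column.

```python
from bisect import bisect_left

def browse_from(sorted_titles, prefix, block_size):
    """Return the start position and the block of titles beginning at prefix."""
    # Find the first title that sorts at or after the supplied letters.
    start = bisect_left(sorted_titles, prefix)
    return start, sorted_titles[start:start + block_size]

titles = ["Aardvark", "Audio", "Baal", "Buzz", "Cab", "Cop"]

start, block = browse_from(titles, "B", 2)
print(block)

# Stepping to the adjacent blocks is just arithmetic on the start position.
next_block = titles[start + 2:start + 4]
prev_block = titles[max(0, start - 2):start]
```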
Well, if you consider this to be a browse function:
CHDIR C:\WIKI\ESP
DIR ESP*.*
;)
Part of the usefulness of the all pages index is, at least in theory,
providing a fairly direct path to all pages for search engine spiders;
orphans and islands would still get linked to and indexed. So it's got
to be linkable. That, and the funnest part of browsing is coming across
things you wouldn't have thought of ahead of time -- this is helped by
making some real words visible. Plus, it just establishes context more
clearly to see whole words than a couple of letters jammed together.
-- brion vibber (brion @ pobox.com)