[Wikipedia-l] Fwd: Re: Wikipedia/OneLook

Brion VIBBER brion at pobox.com
Thu Sep 26 04:09:26 UTC 2002


(re providing an RDF feed)
Stephen Gilbert wrote:
> We'd probably want to do a little filtering. Right
> now, OneLook is indexing over 60,000 "words" (meaning
> article titles) from Wikipedia. Maybe there could be a
> raw feed and a filtered feed?

What kind of filtering would you suggest?

Assuming our intention is to provide an index of encyclopedia articles, 
it might be reasonable to, for instance, filter out certain kinds of 
redirects:
* to an article that's identical except for case; since searches are 
usually case-insensitive, they're redundant (though harmless)
* to a page in a non-article namespace (talk, user, wikipedia, image)
* to a non-existent page
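
A rough sketch of how those three rules might be applied when building the
filtered feed. It assumes a hypothetical list of article-namespace pages, each
with a title and an optional redirect target. This is not any actual Wikipedia
export format. The namespace prefixes below are also assumptions standing in
for "talk, user, wikipedia, image".

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical prefixes marking non-article namespaces (assumption).
NON_ARTICLE_PREFIXES = ("Talk:", "User:", "User talk:", "Wikipedia:", "Image:")


@dataclass
class Page:
    title: str
    redirect_to: Optional[str] = None  # None means a regular article


def filtered_index(pages: List[Page]) -> List[str]:
    """Return titles for the filtered feed, dropping unwanted redirects."""
    existing = {p.title for p in pages}
    index = []
    for page in pages:
        target = page.redirect_to
        if target is None:
            index.append(page.title)  # ordinary article: always keep
            continue
        if target.casefold() == page.title.casefold():
            continue  # differs only in case: redundant for case-insensitive search
        if target.startswith(NON_ARTICLE_PREFIXES):
            continue  # points into a non-article namespace
        if target not in existing:
            continue  # broken redirect to a non-existent page
        index.append(page.title)  # a useful alternate title
    return index


pages = [
    Page("Paris"),
    Page("paris", "Paris"),
    Page("City of Light", "Paris"),
    Page("Help me", "Wikipedia:Help"),
    Page("Ghost", "Nonexistent article"),
]
print(filtered_index(pages))  # ['Paris', 'City of Light']
```

The raw feed would simply be every title, and the filtered feed is the output
of a pass like this one.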

-- brion vibber (brion @ pobox.com)