On Tue, 2002-12-10 at 14:51, Jonathan Walther wrote:
> On Tue, Dec 10, 2002 at 02:54:49PM -0800, Brion Vibber wrote:
> > On Tue, 2002-12-10 at 05:43, Jonathan Walther wrote:
> > > For the second one, don't we just need to check the existence and type of the linked pages? Why bother about the size?
> > For the stub detector. (Optional.)
> Ah. I was assuming that empty articles wouldn't exist unless they had been purposely cleared in an edit war. *cough*RK*cough*
The stub detector marks links to articles below an (arbitrary, user-selectable) size with a distinct link class.
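A minimal sketch of that logic (the function and class names here are illustrative, not the actual wiki code):

```python
# Sketch of the stub-detector idea: pick a CSS class for a wiki link
# based on whether the target exists and its size versus a
# user-selected threshold. All names are hypothetical.

def link_class(target_exists: bool, target_size: int, stub_threshold: int) -> str:
    """Return a CSS class for a link to a wiki article."""
    if not target_exists:
        return "new"       # broken link: the article doesn't exist yet
    if stub_threshold > 0 and target_size < stub_threshold:
        return "stub"      # exists, but smaller than the stub threshold
    return "internal"      # ordinary existing article

# e.g. with a 500-byte threshold:
print(link_class(True, 120, 500))    # stub
print(link_class(True, 9000, 500))   # internal
print(link_class(False, 0, 500))     # new
```

A threshold of 0 would disable stub marking entirely, which is one way to make the feature optional per user.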
> > > We do also need to get the size of the primary page.
> > What for? We parse it until we hit the null byte at the end.
> Even though we are storing the text in UTF-8? Is that safe?
The only time a null byte ever appears in UTF-8 is to represent the null character. Since the null is generally considered déclassé in human-readable text, I think we're safe. :)
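This is easy to verify: every lead and continuation byte of a multibyte UTF-8 sequence has its high bit set, so a 0x00 byte can only come from U+0000 itself. A quick check:

```python
# Verify that no character other than U+0000 produces a 0x00 byte
# when encoded as UTF-8. Multibyte sequences use bytes >= 0x80
# throughout, so they can never contain a zero byte.

def has_null_byte(s: str) -> bool:
    return b"\x00" in s.encode("utf-8")

assert not has_null_byte("plain ASCII")
assert not has_null_byte("déclassé")        # 2-byte sequences
assert not has_null_byte("日本語のテキスト")  # 3-byte sequences
assert has_null_byte("\x00")                # only the null char itself

# Exhaustive over code points below the surrogate range:
assert all(not has_null_byte(chr(cp)) for cp in range(1, 0xD800))
```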
> A note about searching: with Postgres we can just set up the UTF-8 collation on the article text fields, and then we can use LIKE clauses to search for text in the articles and titles, and the characters will be collated correctly, without any extra coding on our part. Another reason I am eager to use Postgres. The search engine will be trivial.
Will that be reasonably fast? How would we rank pages in the search?
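For reference, a LIKE-based search would look something like the sketch below (table and column names are made up, and the query is shown unexecuted). Note that a leading-wildcard `LIKE '%term%'` can't use an ordinary index, so it scans every row; that's the speed worry, and LIKE by itself gives us no relevance score to rank by.

```python
# Hypothetical sketch of a LIKE-based article search. The table and
# column names (cur, cur_title, cur_text) are assumptions; with a real
# driver the SQL would be passed to cursor.execute() with the pattern
# as a parameter.

def like_pattern(term: str) -> str:
    """Escape LIKE metacharacters in a user search term, wrap in wildcards."""
    escaped = (term.replace("\\", "\\\\")
                   .replace("%", "\\%")
                   .replace("_", "\\_"))
    return f"%{escaped}%"

SEARCH_SQL = """
SELECT cur_title
FROM cur
WHERE cur_text LIKE %s ESCAPE '\\'
ORDER BY cur_title
"""

print(like_pattern("100% pure"))   # %100\% pure%
```

Escaping `%` and `_` matters so a user searching for a literal percent sign doesn't accidentally match everything.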
-- brion vibber (brion @ pobox.com)