On Fri, 18 Mar 2005 16:12:32 +0000, David Gerard dgerard@gmail.com wrote:
> It was that you spent five and a half paragraphs (618 words) decrying fancruft and positing that the problem was too many articles, presumably caused primarily by fancruft (I'm presuming from the 618 words leading up to saying it was overloading the system). This seemed a somewhat novel argument against fancruft, and I recall asking a dev on IRC (think it was JamesDay, not sure) if too many articles was in fact a source of our problems, and being told no.
I've been accused many times of being long-winded. This is perhaps one of those times. The fact, however, that you and others are measuring my responses to these posts in words and kilobytes is shocking to me. What a complete and utter waste of time.
> So yes: since it's an anti-fancruft diatribe with blame on the end, I'm
Here's some vituperation for you. Kindly fuck off. This is not an anti-fancruft diatribe.
- Alex, are you *seriously* claiming excess small articles will hasten the
downfall of Wikipedia?
[ ] Yes [X] No
- If "yes", do you have any numbers?
[ ] Yes, here they are [ ] No [X] N/A
First, I am asking questions and making suggestions. If I made any statements that seemed to indicate that I knew that X thing was exploding the wikipedia, either I overstated my point, or somebody misunderstood what I was saying. I think you'll understand if you read on.
Let me explain the reason I bring fancruft up. In fact, since fancruft is obviously such a loaded term, let's just use the word "cruft" to refer to large groups of small related articles.
As a database goes searching through its indices and tables looking for a tuple, it must iteratively go over other tuples before it finds the one it wants. It doesn't just have some magic pointer that knows where [[designated marksman]] is. It has to figure it out. When the number of articles exceeds several hundred thousand, you really need to "give it hints" about where to find that article. I don't know if this is already being done, but it might be possible to use things like categories (or some form of tagging) to "associate" groups of articles so that, while they might not have to live in their own separate table, they would be more easily "findable" by the database.
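The "hints" in question are indexes. A minimal sketch of the difference they make, using SQLite via Python purely because it ships with the standard library (not MySQL, and all table and index names here are invented for illustration):

```python
import sqlite3

# Toy stand-in for the article table; names are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO article VALUES (?, ?)",
    ((f"Article {i}", "...") for i in range(1000)),
)

query = "SELECT body FROM article WHERE title = 'Article 500'"

# Without an index the planner has no choice but to scan the whole table.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()[0][3]

# An index is exactly the "hint" that lets it skip straight to the row.
conn.execute("CREATE INDEX idx_title ON article (title)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()[0][3]

print(plan_before)  # a scan over the whole table
print(plan_after)   # a B-tree search using idx_title
```

With the index in place, lookup cost grows roughly logarithmically with table size rather than linearly, which is why "more articles" need not mean a proportionally slower lookup.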
Do you understand the difference between a sequential scan and an index scan? I mean, I don't know what technical level to start at. Postgres has this feature I like a lot called "explain", so I can say:
wikipedia=# explain select colname from tablename where length(colname) > 5;
and it will explain how the query planner intends to execute that query. This is how one obtains numbers and improves the performance of one's SQL or one's schema (although there's hardly any difference between the two, now, is there?).
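Running the same shape of query through SQLite's planner (again as a stand-in, with the mail's made-up table and column names) shows the sort of thing a plan reveals: because the predicate wraps the column in a function, an ordinary index on the column can't be used to narrow the search, and the planner must visit every row:

```python
import sqlite3

# Made-up table, mirroring the Postgres example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (colname TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?)",
    (("x" * (i % 10),) for i in range(100)),
)
conn.execute("CREATE INDEX idx_colname ON tablename (colname)")

# length(colname) > 5 applies a function to the column, so the
# plain index on colname can't narrow the lookup: every row is read.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT colname FROM tablename WHERE length(colname) > 5"
).fetchall()[0][3]
print(plan)  # still a scan, despite the index
```

Spotting exactly this kind of thing in a plan is what makes EXPLAIN useful for tuning both the SQL and the schema.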
Right now, I can tell you that "having more articles" means "the wikipedia will get slower", because there is no way to avoid the fact that the database has to go through the articles it's got to find the one you just asked it for.
Now might be a good time for somebody familiar with the schema to step in and tell me either that I'm FOS or that MySQL has some devil-instilled magic that allows it to defy the laws of databases, or something.
Since there seems to be so much hostility coming from you, let me point you in the direction of two small pages which will help:
http://www.petdance.com/perl/geek-culture/
and
http://c2.com/cgi/wiki?SetTheBozoBit
Cheers, aa