On Fri, 18 Mar 2005 16:12:32 +0000, David Gerard dgerard@gmail.com wrote:
> It was that you spent five and a half paragraphs (618 words) decrying fancruft and positing that the problem was too many articles, presumably caused primarily by fancruft (I'm presuming from the 618 words leading up to saying it was overloading the system). This seemed a somewhat novel argument against fancruft, and I recall asking a dev on IRC (think it was JamesDay, not sure) if too many articles was in fact a source of our problems, and being told no.
I've been accused many times of being long-winded. This is perhaps one of those times. The fact, however, that you and others are measuring my responses to these posts in words and kilobytes is shocking to me. What a complete and utter waste of time.
> So yes: since it's an anti-fancruft diatribe with blame on the end, I'm
Here's some vituperation for you. Kindly fuck off. This is not an anti-fancruft diatribe.
- Alex, are you *seriously* claiming excess small articles will hasten the
downfall of Wikipedia?
[ ] Yes [X] No
- If "yes", do you have any numbers?
[ ] Yes, here they are [ ] No [X] N/A
First, I am asking questions and making suggestions. If I made any statements that seemed to indicate that I knew that X thing was exploding the wikipedia, either I overstated my point, or somebody misunderstood what I was saying. I think you'll understand if you read on.
Let me explain the reason I bring fancruft up. In fact, since fancruft is obviously such a loaded term, let's just use the word "cruft" to refer to large groups of small related articles.
As a database goes searching through its indices and tables looking for a tuple, it must iteratively go over other tuples before it finds the one it wants. It doesn't just have some magic pointer that knows where [[designated marksman]] is. It has to figure it out. When the number of articles exceeds several hundred thousand, you really need to "give it hints" about where to find that article. I don't know if this is already being done, but it might be possible to use things like categories (or some form of tagging) to "associate" groups of articles so that, while they might not have to live in their own separate table, they would be more easily "findable" by the database.
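The "hints" in question are indexes. A minimal sketch of the difference they make, using SQLite via Python purely because it ships with the standard library (not MySQL, and all table and index names here are invented for illustration):

```python
import sqlite3

# Toy stand-in for the article table; names are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO article VALUES (?, ?)",
    ((f"Article {i}", "...") for i in range(1000)),
)

query = "SELECT body FROM article WHERE title = 'Article 500'"

# Without an index the planner has no choice but to scan the whole table.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()[0][3]

# An index is exactly the "hint" that lets it skip straight to the row.
conn.execute("CREATE INDEX idx_title ON article (title)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()[0][3]

print(plan_before)  # a scan over the whole table
print(plan_after)   # a B-tree search using idx_title
```

With the index in place, lookup cost grows roughly logarithmically with table size rather than linearly, which is why "more articles" need not mean a proportionally slower lookup.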
Do you understand the difference between a sequential scan and an index scan? I mean, I don't know what technical level to start at. Postgres has this feature I like a lot called "explain", so I can say:
wikipedia=# explain select colname from tablename where length(colname) > 5;
and it will explain how the query planner intends to execute that query. This is how one obtains numbers and improves the performance of one's SQL or one's schema (although there's hardly any difference between the two, now, is there?).
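Running the same shape of query through SQLite's planner (again as a stand-in, with the mail's made-up table and column names) shows the sort of thing a plan reveals: because the predicate wraps the column in a function, an ordinary index on the column can't be used to narrow the search, and the planner must visit every row:

```python
import sqlite3

# Made-up table, mirroring the Postgres example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (colname TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?)",
    (("x" * (i % 10),) for i in range(100)),
)
conn.execute("CREATE INDEX idx_colname ON tablename (colname)")

# length(colname) > 5 applies a function to the column, so the
# plain index on colname can't narrow the lookup: every row is read.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT colname FROM tablename WHERE length(colname) > 5"
).fetchall()[0][3]
print(plan)  # still a scan, despite the index
```

Spotting exactly this kind of thing in a plan is what makes EXPLAIN useful for tuning both the SQL and the schema.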
Right now, I can tell you that "having more articles" means "the wikipedia will get slower", because there is no way to avoid the fact that the database has to go through the articles it's got to find the one you just asked it for.
Now might be a good time for somebody familiar with the schema to step in and tell me either that I'm FOS or that MySQL has some devil-instilled magic that allows it to defy the laws of databases, or something.
Since there seems to be so much hostility coming from you, let me point you in the direction of two small pages which will help:
http://www.petdance.com/perl/geek-culture/
and
http://c2.com/cgi/wiki?SetTheBozoBit
Cheers, aa