Hello, everyone
Rendering a Wikipedia article requires, for example, looking
up all the links it contains and determining whether those
pages exist.
What else do we need to render wiki-marked-up text? Aside
from images, I don't see any other parts that require access
to the main index.
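To make the cost concrete, here is a minimal sketch of that
link-existence lookup, using an in-memory SQLite table as a
stand-in (the real MediaWiki schema and queries are different;
the table and column names here are made up). The point is that
one batched query per article beats one query per link:

```python
import sqlite3

# Hypothetical, simplified stand-in for the real page index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (title TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO page VALUES (?)",
                 [("Physics",), ("Chemistry",)])

def existing_pages(links):
    """One batched query for the whole article,
    instead of one query per link."""
    placeholders = ",".join("?" * len(links))
    rows = conn.execute(
        "SELECT title FROM page WHERE title IN (%s)" % placeholders,
        links)
    return {title for (title,) in rows}

# The renderer colors each link depending on existence:
links = ["Physics", "Alchemy", "Chemistry"]
exists = existing_pages(links)
rendered = {t: ("blue" if t in exists else "red") for t in links}
```

So a page with hundreds of internal links still costs only one
round trip to the index, which is presumably part of what query
optimization buys us.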
Actually, we had a *decrease* in traffic in the last
month
due to the
Google hiccups.
I know this.
We should be able to cope with much higher traffic if we
optimize our queries. Note that pure bandwidth is not a
problem; the
database tarball downloads are very fast. Ask Brion for
the
server specs
and be impressed.
OK, so the problem is CPU usage or database optimization.
I think you are right. If the database and the queries to it
are too complex, decentralizing will hardly improve
performance.
By the way, note that I am talking not only about
performance but also about scalability and extensibility.
What about such a case?
2) You cite Google as an example of a huge centralized
database
No, I cited it as an example of a 'decentralized' database.
Trust me, wiki is *very* hard to decentralize. It's a nice
idea, but it will take years until it happens. You need an
architecture like Freenet ( http://freenetproject.org ),
only scalable (which Freenet is not), plus SQL-like query
support.
While it may be only a nice idea, it seems to me the only
solution eventually.
I think I finally see the gap in understanding. I was
talking about a years-long project: the time when Wikipedia
reaches the next milestone of 1,000,000 articles
(million-pedia? haha). It seems that eventually we will have
to go down the same path that big sites like Google or
Amazon did. As you and I know, Google is heavily
decentralized, and that is one of its strengths. I bet you
know about load balancers (I know almost nothing about them,
though). As far as I know, most huge sites decentralize onto
many mini servers, as Google does. We, of course, don't have
the finances to sustain such a huge decentralized data
center. But we do have a decent, democratic community, and
that community is our strength.
As a wikipediaholic (ach!), I am quite worried about the
future of Wikipedia in terms of the servers. We definitely
need a better solution (not necessarily my
decentralized-server idea). Possible solutions include
Pieter's proposal of published scripts, or a better website
for Wikipedia developers. (It seems only a few people are
actually coding for Wikipedia, compared with the scale of
Wikipedians writing articles.)
wiki is *very* hard to decentralize
I knew this. But can we figure out how, here? I am not
saying you must do what I propose. I can cooperate, of
course, and I guess tons of skilled programmers can too.
If you think this debate is a total waste of time (I mean,
if I, who know little, am annoying this mailing list), let
me know and I will quit; if not, please give me a comment.
Okay, how about this? It seems to me that one of the core
problems is that rendering requires queries about whether
each linked page exists. I remember a post saying that
rendering a page containing many internal links is one of
the bottlenecks (I suppose it is still true).
First, each mini-wikipedia has a database of which pages
exist, and subscribes to the list of newly created pages.
Also, when it is launched, it downloads the complete list of
existing pages. Since each mini server then knows which
pages exist from its own mini-database, it can render a page
without querying the main database.
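If I understand my own proposal right, it could be sketched
roughly like this (everything here is hypothetical: the class,
the bootstrap list, and the new-page feed are illustrations,
not a real API):

```python
class PageExistenceCache:
    """Hypothetical local cache kept by each mini-wikipedia,
    so link-existence checks never hit the main database."""

    def __init__(self, initial_titles):
        # Downloaded once at launch: the complete list of pages.
        self.titles = set(initial_titles)

    def apply_new_page_event(self, title):
        # Received from the subscription feed whenever
        # a new page is created on the main site.
        self.titles.add(title)

    def exists(self, title):
        # Answered entirely locally.
        return title in self.titles

# Usage sketch:
cache = PageExistenceCache(["Main Page", "Physics"])
cache.apply_new_page_event("Chemistry")
```

One gap I can see in my own sketch: page deletions and renames
would need their own events in the feed, or the local answers
slowly go stale.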
As a disclaimer, I am probably wrong. But if you can, could
you tell me why and how?
Anyway, I now understand that Wikipedia can still be
optimized a lot (which I didn't know when I first posted my
proposal). Sure, if we can optimize the database and
hopefully gain a huge increase in performance, we should of
course head there first. The priority should be optimizing
the database. I agree.
Anyhow, I appreciate your detailed explanation, to someone
who knows really little, of the problems we face now.