Hello, everyone
Rendering a Wikipedia article requires, for example, looking up all the links contained in it and determining whether the pages exist or not.
What else do we need to render wiki-markup text? Aside from images, I don't see any other part that requires access to the main index.
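Just so we are talking about the same step, here is a rough Python sketch of that link lookup. The table, column names, and regex are toy stand-ins I made up, not the real MediaWiki schema or parser; the point is only to show where the main-index access happens and that the links of one page can be resolved with a single batched query instead of one query per link.

import re
import sqlite3

# Toy in-memory stand-in for the article index (not the real schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE page (title TEXT PRIMARY KEY)")
db.executemany("INSERT INTO page VALUES (?)", [("Perl",), ("PHP",)])

# Very rough pattern for [[Title]] and [[Title|label]] internal links.
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def existing_links(wikitext):
    """Return which internal links in the wikitext point at pages
    that exist, using one batched query for the whole page."""
    titles = set(LINK_RE.findall(wikitext))
    if not titles:
        return set()
    placeholders = ",".join("?" * len(titles))
    rows = db.execute(
        "SELECT title FROM page WHERE title IN (%s)" % placeholders,
        tuple(titles))
    return {t for (t,) in rows}

text = "[[Perl]] and [[PHP]] run the site; [[Parrot]] does not exist yet."
print(existing_links(text))  # {'Perl', 'PHP'} render as existing, 'Parrot' as missing

Whether the parser already does something like the single IN (...) query, I don't know; this is just to illustrate the existence-check step being discussed.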
Actually, we had a *decrease* in traffic in the last month due to the Google hiccups.
I know this.
We should be able to cope with much higher traffic if we optimize our queries. Note that pure bandwidth is not a problem; the database tarball downloads are very fast. Ask Brion for the server specs and be impressed.
Ok, so the problem is CPU usage or database optimization. I think you are right: if the database and the queries against it are too complex, decentralizing will hardly improve performance.
By the way, note that I am not talking only about performance but also about scalability and extensibility. What about those?
- You cite Google as an example of a huge centralized database
No, I cited it as an example of a 'decentralized' database.
Trust me, wiki is *very* hard to decentralize. It's a nice idea, but it will take years until it happens. You need an architecture like Freenet ( http://freenetproject.org ), only scalable (which Freenet is not), plus SQL-like query support.
While it is nice, it seemed to me the only solution eventually. I think I finally see the gap in understanding: I was talking about a years-long project, about the time Wikipedia reaches the next milestone of 1,000,000 articles (million-pedia? haha). It seems that eventually we will have to go down the same path that big sites like Google or Amazon went. As you and I know, Google is heavily decentralized, and that is one of its strengths. I bet you know about load balancers (I know almost nothing about them, though). As far as I know, most huge sites decentralize across many mini servers, the way Google does. We, of course, don't have the finances to sustain such a huge decentralized data center. But we have a decent democratic community, and that community is our strength.
As a wikipediaholic (ach!), I am very worried about the future of Wikipedia in terms of the servers. We definitely need a better solution (if not necessarily my decentralized-server idea). Possible solutions include Pieter's published-scripts proposal or a better website for Wikipedia developers. (It seems only a few people are actually coding for Wikipedia, compared with the number of Wikipedians writing articles.)
wiki is *very* hard to decentralize
I knew this. But can we figure out how, here? I am not saying you should do what I propose; I can cooperate, of course, and I guess tons of skilled programmers can too.
If you think this debate is a total waste of time (I mean, if I, who know little, am annoying this mailing list), let me know and I will quit; but if not, please give me a comment.
Okay, how about this? It seems to me that one of the core problems is that rendering requires queries about whether each linked page exists or not. I remember a post saying that rendering a page containing many internal links is one of the bottlenecks (I suppose that is still true).
First, each mini-Wikipedia has a database of which pages exist and subscribes to the list of newly created pages. Also, when it is launched, it downloads the complete list of existing pages. Because each mini server then knows which pages exist, regardless of its own mini-database, it can render a page without querying the main database.
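Here is a rough Python sketch of that idea, assuming a hypothetical "new page" feed published by the main server. None of these names are real interfaces; they are just placeholders for the mechanism I am describing.

# Minimal sketch of a mini-Wikipedia's local page-existence view.
class MiniWikipedia:
    def __init__(self, full_title_list):
        # Bootstrapped once at launch from a complete title dump.
        self.known_titles = set(full_title_list)

    def apply_new_page_event(self, title):
        # Called for each "page created" event received from the
        # (hypothetical) subscription feed, keeping the view current.
        self.known_titles.add(title)

    def page_exists(self, title):
        # Rendering can answer "does [[title]] exist?" locally,
        # without a round trip to the main database.
        return title in self.known_titles

mirror = MiniWikipedia(["Perl", "PHP"])
mirror.apply_new_page_event("Parrot")
print(mirror.page_exists("Parrot"))  # True, answered without the main server
print(mirror.page_exists("Python"))  # False (not yet created)

Page deletions and renames would need the same kind of event, and the feed would have to be reliable, but the point is only that the existence check becomes local.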
As a disclaimer: probably I am wrong. But if you can, can you tell me why and how?
Anyway, I now understand that Wikipedia can still be optimized a lot (which I didn't know when I first posted my proposal). Sure, if we can optimize the database and that yields a big increase in performance, we should head in that direction, of course. The priority should be optimizing the database. I agree.
Anyhow, I appreciate your detailed explanation of the problems we face, given to someone who knows really little.