I rarely write e-mails, let alone post to the mailing list, so if I have done something wrong, just let me know. Also, I am well known for terrible spelling and non-English grammar, so please excuse that too.
I have a proposal for the Wikipedia system.
I don't have any worries about the encyclopedic side of Wikipedia. Both quality and coverage have been improving constantly; that side is just fine. But recently I have started to worry about the technological side, that is, the hosting.
While the contents of Wikipedia are open content, which is nice, the hosting server and database are not. As we all know, the current server is inadequate for the huge traffic, and it looks like things will only get worse.
Now then, here is my proposal: can we distribute the database and the related infrastructure? Each mini-wikipedia would hold a certain set of articles and provide both display and editing services. Search queries would be sent to all of the mini-wikipedias, and the main server would gather the results and return them.
The size of a mini-wikipedia could vary, say from 100 articles to possibly 10,000. I think there are many people who would be willing to provide disk space, including me.
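To make the scatter-gather idea a bit more concrete, here is a rough C sketch of what the main server's search forwarding could look like. The host names and the /search?q= interface are invented for illustration only; nothing like this exists today. It needs libcurl.

/*
 * Hypothetical sketch of the scatter-gather search described above:
 * the main server forwards the query to every mini-wikipedia and
 * concatenates whatever each one returns.
 *
 * Build (assuming libcurl is installed):  cc scatter.c -lcurl
 */
#include <stdio.h>
#include <string.h>
#include <curl/curl.h>

/* libcurl callback: dump the received body to stdout. */
static size_t on_body(char *data, size_t size, size_t nmemb, void *userdata)
{
    (void)userdata;
    fwrite(data, size, nmemb, stdout);
    return size * nmemb;
}

int main(int argc, char **argv)
{
    /* Invented mini-wikipedia endpoints, purely for illustration. */
    const char *minis[] = {
        "http://mini1.example.org/search",
        "http://mini2.example.org/search",
        "http://mini3.example.org/search",
    };
    const char *query = (argc > 1) ? argv[1] : "linux";
    char url[512];
    size_t i;

    curl_global_init(CURL_GLOBAL_DEFAULT);

    for (i = 0; i < sizeof(minis) / sizeof(minis[0]); i++) {
        CURL *h = curl_easy_init();
        if (!h)
            continue;
        /* Note: the query is not URL-encoded here; a real server would do that. */
        snprintf(url, sizeof(url), "%s?q=%s", minis[i], query);
        curl_easy_setopt(h, CURLOPT_URL, url);
        curl_easy_setopt(h, CURLOPT_WRITEFUNCTION, on_body);
        /* A mini-wikipedia that does not answer simply contributes no results. */
        if (curl_easy_perform(h) != CURLE_OK)
            fprintf(stderr, "no answer from %s\n", minis[i]);
        curl_easy_cleanup(h);
    }

    curl_global_cleanup();
    return 0;
}

Of course, a real implementation would fetch in parallel and merge/rank the results rather than just concatenating them.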
The problem is setup. Admittedly, the current dependency hell makes it hard to host a mini-wikipedia, particularly on Windows-based servers.
Another problem: what if some mini-wikipedia goes down, whether for financial reasons, loss of personal interest, or whatever else? My solution is duplication.
This duplication scheme would also make Wikipedia more reliable and its hosting capacity easier to extend.
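To illustrate the duplication idea, here is a tiny C sketch of the failover a reader would do: each article is assigned to several mini-wikipedias, and we just try them in order until one answers. The fetch_from() function is only a stand-in for a real network request, and the host names are made up.

#include <stdio.h>
#include <string.h>

#define REPLICAS 3

/* Stand-in for fetching an article over the network.
 * Returns 0 on success, -1 if the replica is down. */
static int fetch_from(const char *host, const char *title)
{
    /* Pretend the first replica is down, to exercise the fallback. */
    if (strcmp(host, "mini1.example.org") == 0)
        return -1;
    printf("fetched \"%s\" from %s\n", title, host);
    return 0;
}

int main(void)
{
    /* Each article is stored on REPLICAS different mini-wikipedias. */
    const char *replicas[REPLICAS] = {
        "mini1.example.org",
        "mini2.example.org",
        "mini3.example.org",
    };
    const char *title = "Linux";
    int i;

    for (i = 0; i < REPLICAS; i++) {
        if (fetch_from(replicas[i], title) == 0)
            return 0;               /* got the article */
        fprintf(stderr, "%s is down, trying next replica\n", replicas[i]);
    }
    fprintf(stderr, "all replicas of \"%s\" are down\n", title);
    return 1;
}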
I don't know whether this kind of system is feasible; I have not seen such a site. But if we agree the idea is good, why don't we try it?
I agree with the person who suggested pure Wikipedia software written in C (I am sorry, I can't remember his name).
Also, if there was a similar proposal before, excuse me.
Hi Taku,
we *know* that the site is slow. And we *know* that it's not because our server is too small. Our problems are database/lock related, and putting stuff on an even bigger server will not help much when dealing with O(n^x) problems. What we need to figure out is:
- When are our tables/rows locked, and why? (This behavior has changed drastically with the recent update to InnoDB.)
- When are our queries using indices, when are they not, and why? (MySQL index behavior can be very hard to predict.)
Solving these two problems should make Wikipedia very fast. If we cannot optimize some queries, we need to think about making them simpler, or caching them. (Also note that MySQL now supports subqueries, which we don't use yet.) We are dealing with *particular* queries in *particular* situations that make Wikipedia slow.
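For example, one way to start on the index question is to ask MySQL itself, via EXPLAIN, whether a given query uses an index. The sketch below uses the MySQL C API; the connection details and the cur table/column names are my assumptions about the current schema and setup, so adjust them as needed.

/*
 * Run EXPLAIN on a typical title lookup and print the plan; an empty
 * "key" column means no index is being used for that table.
 *
 * Build (assuming the MySQL client library is installed):
 *     cc explain.c -lmysqlclient
 */
#include <stdio.h>
#include <mysql/mysql.h>

int main(void)
{
    MYSQL *db = mysql_init(NULL);
    MYSQL_RES *res;
    MYSQL_ROW row;
    unsigned int i, nfields;

    /* Placeholder credentials: adjust for the actual server. */
    if (!mysql_real_connect(db, "localhost", "wikiuser", "wikipass",
                            "wikidb", 0, NULL, 0)) {
        fprintf(stderr, "connect failed: %s\n", mysql_error(db));
        return 1;
    }

    /* Ask MySQL how it would execute the query (table/column names assumed). */
    if (mysql_query(db,
            "EXPLAIN SELECT cur_text FROM cur "
            "WHERE cur_namespace = 0 AND cur_title = 'Linux'") != 0) {
        fprintf(stderr, "query failed: %s\n", mysql_error(db));
        mysql_close(db);
        return 1;
    }

    res = mysql_store_result(db);
    if (!res) {
        mysql_close(db);
        return 1;
    }
    nfields = mysql_num_fields(res);

    /* Each row of EXPLAIN output describes one table in the plan. */
    while ((row = mysql_fetch_row(res)) != NULL) {
        for (i = 0; i < nfields; i++)
            printf("%s\t", row[i] ? row[i] : "NULL");
        printf("\n");
    }

    mysql_free_result(res);
    mysql_close(db);
    return 0;
}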
In other words, if I had the option to invest $10k in a new server or in an experienced database engineer, I'd pick the latter.
Now then, here is my proposal: can we distribute the database and the related infrastructure?
Not practical. Too many queries require access to a single centralized article database; even an index alone won't suffice. Think of things like Most wanted, Orphaned pages, etc. Besides, it won't make anything faster, because our problem is not a too-small server.
Regards,
Erik
Takuya Murata wrote:
I agree with the person who suggested pure Wikipedia software written in C (I am sorry, I can't remember his name).
That's me!!! :-) I have the same worries. It is a shame that "WikicWeb" lost its C source code; read: http://www.c2.com/cgi/wiki?WikicWeb
The possibility to download a (weekly or so) snapshot, or a tarball containing ALL WikipediaNL content (-Idols), would also be nice. :-)
Agreed, we should think (brainstorm) about distributed servers, but this is a very difficult topic (for me at least).
I think I _can_ write a proper and reliable WikiServer in pure C (needing only a proper POSIX OS). I think it can only be done if the wiki syntax is defined very strictly; expanding the software later will be difficult/dangerous in C.
I agree C sure has some major drawbacks. (As wikis evolve, the software might need to adapt with them.) On the other hand, people now know how wikis evolve: we should come to an open standard for Wikipedia syntax (ban HTML and so on, or choose full HTML 4.01 and ban wiki syntax). If all these problems can be solved (which needs consensus), a very FAST server in pure C can be set up. The only dependencies will be:
1: An operating system (preferably some sort of Linux or other Unix).
2: A C compiler (preferably GCC).
3: A physical machine attached to the internet.
4?: Maybe some sort of progressive compression algorithm (as an alternative to diff, perhaps?)
I've got more weird ideas: I'll make the source code of my pure-C wiki software available via its WikiWebInterface ITSELF! That way, I'll never lose my sources (the recursive trick). Right now Wikipedia depends on SourceForge. This is not necessary: Wikipedia could keep its own sources on Wikipedia.org itself.
I am searching for proper C alternatives for Wikipedia, preferably with Wikipedia functionality. If I cannot find any, I will develop my own wiki in pure C. I only need a very strict definition of the wiki syntax (and semantics).
One can do without MySQL and Apache and do it ALL in ONE programming language. If we do it all in C, we only need a proper compiler (GNU's GCC) and an OS, nothing else!
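As a tiny proof of concept of how little is needed, here is a minimal POSIX-only sketch: a single-threaded HTTP server that answers every request with a fixed page. A real wiki server would of course have to parse the request path, store and load articles, render the syntax, and handle edits; none of that is attempted here.

/*
 * Build on any POSIX system with just a C compiler:  cc wikid.c -o wikid
 * Then point a browser at http://localhost:8080/
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    const char *reply =
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/html\r\n\r\n"
        "<html><body><h1>Hello from a pure-C wiki server</h1></body></html>";
    struct sockaddr_in addr;
    int srv, cli;
    char buf[4096];

    srv = socket(AF_INET, SOCK_STREAM, 0);
    if (srv < 0) { perror("socket"); return 1; }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);            /* arbitrary port for the demo */

    if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return 1;
    }
    listen(srv, 16);

    for (;;) {
        cli = accept(srv, NULL, NULL);
        if (cli < 0)
            continue;
        /* Read (and ignore) the request, then send the fixed page. */
        read(cli, buf, sizeof(buf));
        write(cli, reply, strlen(reply));
        close(cli);
    }
    return 0;   /* not reached */
}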
Thanks for reading :-) (C-madman) Pieter Suurmond
On Wed, Jan 29, 2003 at 01:51:48AM +0100, Pieter Suurmond wrote:
Takuya Murata wrote:
The problem is setup. Admittedly, the current dependency hell makes it hard to host a mini-wikipedia, particularly on Windows-based servers.
No, you've got it all wrong. It's not "dependency hell", it's "server-side programming". We're supposed to use whatever we find convenient on the server side; that's the paradigm on this side of the network connection.
These dependencies aren't really that weird: Apache, PHP and MySQL are available on all Unices (and on Windows), and are offered by most web hosting providers.
If you want math, then maybe LaTeX, Ghostscript and OCaml are not that popular, but you'll need the first two in any case if you want to render math, and OCaml files can be compiled to native code, so you just have to put the binaries on the server.
As for Windows servers, they don't really matter. They are a rather small minority, and have so many problems that supporting them is not really worth the effort. Just get some Unix.