Andre Engels wrote:
What I would particularly like is the ability to check for new messages on my User talk: pages in all languages at once, and to have a merged watchlist.
I think that is beyond what is being discussed now, but it is also a feature I would like - it would probably also increase the number of people contributing in multiple languages. But before you think about adding more and more tables to a system that is getting more and more complex:
*Please* implement a *completely independent watchlist server*. Since Wikipedia is growing exponentially, modularizing it is the only way to survive. It could also help attract programmers who do not like digging into a big monolithic program like MediaWiki, which can hardly be used for anything but Wikipedia.
A watchlist server would only need to collect RSS feeds of recent changes and receive watch/unwatch-this-page messages from users - no work on the live database. Maybe it could also collect "you have a message" notices, posts to the mailing lists, and even other weblogs and the like. Each user gets his own aggregated feed of changes, new articles, and whatever else he wants - you could also deliver change notices by mail and of course by RSS. Not to speak of features like:
* a list of most-watched articles
* a diagram of current activity, in edits/minute, on each Wikipedia
* subscribing to public pre-defined watchlists
* even notices about articles that are watched by people who watch articles similar to the ones I do!
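To make the idea concrete, here is a rough Python sketch of the aggregation core - the class and method names are invented for illustration, and it assumes the feedparser library plus per-wiki recent-changes feed URLs:

    import feedparser  # Universal Feed Parser - assumed available

    class WatchlistServer:
        """Toy stand-alone watchlist aggregator (all names invented)."""

        def __init__(self):
            self.watchlists = {}  # user -> set of watched page titles

        def watch(self, user, title):
            self.watchlists.setdefault(user, set()).add(title)

        def unwatch(self, user, title):
            self.watchlists.get(user, set()).discard(title)

        def aggregated_feed(self, user, recentchanges_urls):
            """Merge several wikis' recent-changes RSS feeds, keeping
            only entries for pages this user watches, newest first."""
            watched = self.watchlists.get(user, set())
            entries = []
            for url in recentchanges_urls:
                feed = feedparser.parse(url)  # fetch + parse one feed
                entries.extend(e for e in feed.entries if e.title in watched)
            entries.sort(key=lambda e: e.published_parsed, reverse=True)
            return entries

Point something like this at each wiki's recent-changes feed and re-serve (or mail out) the merged result - no live-database access needed.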
Well, we do not need all of these features right now, but they are much more likely to get implemented and running if they are independent of the MediaWiki codebase and the Wikipedia server, which could then switch them on and off as needed.
By the way, the search functionality could also be put on an independent server with an independent database, like any other read-only task.
just my 2 cents
Jakob
At 10:51 AM 1/16/2004, you wrote:
*Please* implement a *completely independent watchlist server*. Since Wikipedia is growing exponentially, modularizing it is the only way to survive. It could also help attract programmers who do not like digging into a big monolithic program like MediaWiki, which can hardly be used for anything but Wikipedia.
This may be an old idea, or not. But the thought of modularizing brought to mind something related to how eBay does things...
What if the articles in a certain alphabetic range were stored on one server, and those in another alphabetic range on another? A third server stands in front, doing nothing more than redirecting traffic to the other servers. It might make statistics and the like a bit more of a headache, but it would certainly help reduce the load and bandwidth requirements on any one server.
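To illustrate, the front server's routing logic could be a few lines of Python - the hostnames and ranges here are made up for the example:

    # Hypothetical front-end router: map an article title to the
    # backend server holding that alphabetic range (illustration only).
    SHARDS = [
        ("a", "h", "db1.example.org"),   # articles starting A-H
        ("i", "p", "db2.example.org"),   # articles starting I-P
        ("q", "z", "db3.example.org"),   # articles starting Q-Z
    ]

    def server_for(title):
        first = title[:1].lower()
        for lo, hi, host in SHARDS:
            if lo <= first <= hi:
                return host
        return SHARDS[0][2]  # digits, punctuation etc. go to the first shard

    print(server_for("MediaWiki"))  # -> db2.example.org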
Whether such an implementation would be simple or terribly difficult requires a greater understanding of the underlying architecture than I have.
Just a thought.
-Kelly
On Fri, 16 Jan 2004 17:50:33 -0700, Kelly Anderson wrote:
This may be an old idea, or not. But the thought of modularizing brought to mind something related to how eBay does things...
What if the articles in a certain alphabetic range were stored on one server, and those in another alphabetic range on another? A third server stands in front, doing nothing more than redirecting traffic to the other servers. It might make statistics and the like a bit more of a headache, but it would certainly help reduce the load and bandwidth requirements on any one server.
Google does the same; I guess any really big DB application has to.
Whether such an implementation would be simple or terribly difficult requires a greater understanding of the underlying architecture than I have.
There would be a need to write logic to merge the separate result sets.
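For example, if each shard returns its slice of an all-pages listing already sorted by title, the front end needs a k-way merge - a quick Python sketch:

    import heapq

    # Each shard returns its result list already sorted by title;
    # the front end does a k-way merge (illustration only).
    def merged_results(*per_shard_results):
        return list(heapq.merge(*per_shard_results))

    a_h = ["Aardvark", "Berlin", "Hamburg"]
    i_p = ["Iceland", "Paris"]
    q_z = ["Quark", "Zebra"]
    print(merged_results(a_h, i_p, q_z))
    # ['Aardvark', 'Berlin', 'Hamburg', 'Iceland', 'Paris', 'Quark', 'Zebra']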
The temporary DB server is a very small machine with something like 1 GB of RAM. Geoffrey (with 4 GB) was fast before it went down, and a second server that can handle reads in the new setup has been ordered. I'd also expect the DB to be the weakest spot in the new setup, but at a completely different level: the number of DB connections should drop by 50% or so (even more with memcached), and there will be enough RAM to hold most of the DB in memory, with the option to top it up to 16 GB on each machine.
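To picture the connection-count claim: with a read-through cache in front of the database, only misses ever open a DB connection. A toy Python sketch, with a plain dict standing in for memcached and the database read stubbed out:

    cache = {}  # plain dict standing in for memcached

    def fetch_from_db(title):
        # placeholder for the expensive database read
        return "wikitext of [[%s]]" % title

    def get_article(title):
        """Read-through cache: only misses touch the database,
        so DB connections drop in proportion to the hit rate."""
        try:
            return cache[title]
        except KeyError:
            text = fetch_from_db(title)  # one DB hit per miss
            cache[title] = text          # subsequent reads come from RAM
            return text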
At 04:34 AM 1/17/2004, you wrote:
On Fri, 16 Jan 2004 17:50:33 -0700, Kelly Anderson wrote:
This may be an old idea, or not. But the thought of modularizing brought to mind something related to how eBay does things...
What if the articles in a certain alphabetic range were stored on one server, and those in another alphabetic range on another? A third server stands in front, doing nothing more than redirecting traffic to the other servers. It might make statistics and the like a bit more of a headache, but it would certainly help reduce the load and bandwidth requirements on any one server.
Google does the same; I guess any really big DB application has to.
Well, someday - probably sooner rather than later - Wikipedia will be that type of application, I'm guessing, based upon its growth rate.
Whether such an implementation would be simple or terribly difficult requires a greater understanding of the underlying architecture than I have.
There would be a need to write logic to merge the separate result sets.
For example? I'm thinking simple - just the articles themselves... what needs merging other than reports and change lists and things like that?
By the way, has any benchmarking/profiling been done to determine which functions are called most often, eat up the most server resources, etc.?
The temporary DB server is a very small machine with something like 1 GB of RAM. Geoffrey (with 4 GB) was fast before it went down, and a second server that can handle reads in the new setup has been ordered. I'd also expect the DB to be the weakest spot in the new setup, but at a completely different level: the number of DB connections should drop by 50% or so (even more with memcached), and there will be enough RAM to hold most of the DB in memory, with the option to top it up to 16 GB on each machine.
-- Gabriel Wicke
Gabriel, it would seem that splitting up the database across the alphabet, based upon the letter(s) the articles start with - horizontal partitioning, I think that's the right term - is one way of scaling. It is clearly not the only way...
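For the record, one obvious alternative is hashing the title instead of slicing the alphabet, which spreads rows more evenly - a quick sketch, hostnames again invented:

    import zlib

    SERVERS = ["db1.example.org", "db2.example.org", "db3.example.org"]

    def server_for(title):
        # A stable hash of the title picks the shard - load is spread far
        # more evenly than with alphabetic ranges, at the cost of making
        # range scans (e.g. alphabetic listings) span every server.
        return SERVERS[zlib.crc32(title.encode("utf-8")) % len(SERVERS)]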
-Kelly