On Tue, Sep 13, 2005 at 02:07:50PM +0200, Lars Aronsson wrote:
Tomasz Wegrzanowski wrote:
dictd servers [...] The computational cost and traffic overhead are low (though unlike HTTP there's no free gzip-on-the-fly) - it's really just a very simple protocol, with clients only reading data from servers.
Judging from the Linux dictd manpage (*), the current implementation seems to spawn one server process for each connecting client, non-threaded, and that client can then stay connected as long as it sees fit. Without having tried it, this kind of solution appears to scale to a few hundred simultaneous users (workgroup or intranet level), but not to the global web level. Old webserver programmers will smile and remember how webservers used to work in 1995, based on the late W. Richard Stevens' book.
I think it could be interesting (sometime in the future) to deploy a global central spelling dictionary server, where users can update the dictionary in real time (results showing up in Wiktionary), but the dictd server software would probably have to be completely rewritten for that level of scalability. It might make sense to replace the dict protocol and its need for special servers with some kind of XML over HTTP, if only to benefit from the scalability already designed into Apache.
We're talking Unix here. Per-process overhead is just a few kB; the TCP/IP buffers are bigger than that.
The real point is of course the ease of replication. Just put up 20 machines, each running a dictd, with a dumb load balancer in front of them, and you can serve practically 20x more connections.
I don't think any dictd server actually serves thousands of simultaneous connections, so nobody has bothered to write a more efficient server. But the protocol is so simple that it wouldn't be at all difficult if it were actually needed. http://www.dict.org/links.html lists servers written in C, Perl, Java and Python.
The Perl server (jiten) is about 500 LOC (not counting the addons/ directory) and is fork-less (it uses IO::Select).