At 08:25 PM 3/17/2004, you wrote:
> On Mar 17, 2004, at 15:58, Kelly Anderson wrote:
>
> Gnutella is decentralized; it was designed in response to the classic
> Napster, which could be, and was, shut down by suing the server operators
> into oblivion. Search requests are relayed from node to node to node in a
> very inefficient fashion, but once a file is out there, the original
> seeder need not remain on the network. Nodes generally provide many files
> for peering (often everything they have ever downloaded). The system was
> designed for smallish files (up to several megabytes).
Thank you for that clear explanation, Brian. I think I understand better
now. Too bad the search couldn't be somewhat more efficient... but I get
the model now.
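
Just to check that I have the model right, here is a toy sketch in Python
of the flooding search as I picture it (purely illustrative; the real
protocol has descriptor IDs, hop counts, and a binary wire format, none of
which appear here):

class Node:
    def __init__(self, name, files):
        self.name = name
        self.files = set(files)    # filenames this node shares
        self.neighbors = []        # directly connected nodes
        self.seen = set()          # query IDs already handled

    def search(self, query_id, filename, ttl, results):
        if query_id in self.seen:   # drop queries we've already seen
            return
        self.seen.add(query_id)
        if filename in self.files:  # a hit: report ourselves
            results.append(self.name)
        if ttl > 0:                 # flood onward until the TTL runs out
            for peer in self.neighbors:
                peer.search(query_id, filename, ttl - 1, results)

# A chain a - b - c where only c has the file.
a, b, c = Node("a", []), Node("b", []), Node("c", ["song.mp3"])
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
hits = []
a.search(query_id=1, filename="song.mp3", ttl=7, results=hits)
print(hits)  # ['c'], but every node along the way handled the query

Every query touches every reachable node within the TTL, which I take it is
exactly the inefficiency you described.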
> BitTorrent is very centralized; it was designed for legitimate
> distribution of large files to many simultaneous downloaders, using
> peer-to-peer transfers simply as a way to save bandwidth for the central
> server. There is no search mechanism; you must connect to the particular
> tracker server managing the torrent for the file you want and ask for it
> specifically. If the tracker goes offline, everything fails. Nodes only
> make available for peered access the files they are in the process of
> downloading or have very recently downloaded (and not yet closed the
> window on). The system was designed for large files (tens, hundreds, or
> thousands of megabytes), and it fetches pieces of a file from multiple
> peers simultaneously when possible.
While it is true that BitTorrent has no search facility, I believe the
torrent file itself is easy to duplicate, so distribution of the metadata,
at least, is decentralized. In fact, I believe that once you have an entire
copy of the file in question you become a new seeder, although I'm unclear
whether that automatically gives you a new, independently distributable
version of the torrent file. I think it doesn't, as you say.
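
For what it's worth, my understanding from the published BitTorrent spec
(so, a sketch of the idea rather than gospel) is that the .torrent file is
just a small bencoded dictionary, shaped roughly like this; the tracker URL
and file name below are invented for illustration:

# Rough shape of a .torrent metainfo file (stored bencoded on disk).
metainfo = {
    "announce": "http://tracker.example.org:6969/announce",  # the one tracker
    "info": {
        "name": "enwiki-dump.sql",  # suggested name for the saved file
        "length": 1234567890,       # total size in bytes
        "piece length": 262144,     # bytes per piece (256 KiB here)
        "pieces": b"...",           # concatenated 20-byte SHA1 digests, one per piece
    },
}

If that's right, then copying the torrent file around gives everyone an
identical pointer to the same tracker, not a new independent torrent, so
the server named in "announce" remains the single point of failure, just as
you said.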
> Don't know anything about Ares.
It's a small commercial program, more similar to Gnutella than to
BitTorrent as far as this discussion goes.
>> Would someone who is familiar with both Gnutella and BitTorrent tell me
>> why using BitTorrent for such a project would be stupid? It would
>> certainly use less of Wikipedia's already strained bandwidth.
> The overhead of torrenting individual articles or PDF booklets on
> particular subjects from Wikipedia would likely far outweigh that of
> simple HTTP, particularly since it's unlikely that many people would be
> downloading the same small file (out of many thousands available)
> simultaneously.
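
(To put rough, back-of-envelope numbers on that: fetching a 50 KB article
over HTTP is one request and one response, while the same fetch over
BitTorrent would first need the .torrent file downloaded, then a tracker
announce, then TCP and protocol handshakes with each peer; that's several
round trips before the first useful byte, and for an obscure article there
would probably be no peers at all, leaving the seed to serve the whole file
anyway.)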
Agreed, though we were talking about distributing the entire database, or
large portions of it (like all the images). For that, BitTorrent would be
perfect, and the centralized nature isn't a problem, since there is really
only one source of the information anyway.
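
To make that concrete, here is a rough sketch of the piece-hashing step in
building a torrent for such a dump (Python; the file name and piece size
are made up, and a real client would also bencode this together with the
announce URL):

import hashlib

PIECE_LENGTH = 262144  # 256 KiB per piece; a common choice, not mandated

def piece_hashes(path):
    """Concatenated 20-byte SHA1 digests, one per piece, as in a .torrent."""
    digests = []
    with open(path, "rb") as f:
        while True:
            piece = f.read(PIECE_LENGTH)
            if not piece:
                break
            digests.append(hashlib.sha1(piece).digest())
    return b"".join(digests)

pieces = piece_hashes("enwiki-cur.sql")  # hypothetical dump file name
print(len(pieces) // 20, "pieces")

Since every downloader verifies each piece against these hashes, a peer can
safely fetch different pieces from many strangers at once, which is what
makes the parallel-download trick you described trustworthy.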
> Hypothetically it might be useful for distributing large bulk dumps (such
> as the current database dumps), if and only if more than one person at a
> time is likely to be downloading them.
Precisely. And typically with BitTorrent, people leave the connection open
until they have uploaded at least as much as they downloaded, even if that
takes much longer; at least some BitTorrent clients support this directly.
All in all, BitTorrent is IMHO a very cool file distribution mechanism.
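
My guess is that the ratio behaviour is nothing fancier than the client
comparing its own byte counters; a toy version of the policy (my
speculation, not any particular client's actual code):

def should_keep_seeding(uploaded, downloaded, target_ratio=1.0):
    """Keep the torrent open until we've uploaded at least as much as we
    downloaded (ratio >= 1.0), the polite default mentioned above."""
    if downloaded == 0:  # original seeder: nothing was taken from others
        return True
    return uploaded / downloaded < target_ratio

print(should_keep_seeding(300000000, 700000000))  # ratio 0.43 -> True, keep going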
> Bandwidth really isn't a problem for Wikipedia; we use a fair amount of
> it (compared to Joe Bob's homepage, not compared to Yahoo) but it's not
> "strained".
I guess I was using "bandwidth" in a more generic sense, including CPU time
and so forth. From the response times I normally get from Wikipedia,
something is generally straining... (I have a broadband connection, and it
happens fairly consistently, so I'm reasonably sure the issue is mostly on
the server side.) This is not to disrespect Wikipedia or Wikimedia; it's
just that in my personal experience the site is not as responsive as, say,
eBay or Amazon. I must say I'm somewhat surprised at this, given all the
hardware that has obviously been thrown at the problem of late. Perhaps the
issue is just PHP... I don't know. Wikis that I've set up (which admittedly
typically have only a dozen users) seem to have good response times, so I
don't think it's the code per se, but rather the load.

In any case, my thought was that BitTorrent would be a good way to
distribute large Wikipedia files without substantially impacting any of the
existing servers.
-Kelly