Two separate sites indicate potential sources of torrents for *.tar.gz downloads of the English Wikipedia database material:
http://en.wikipedia.org/wiki/Wikipedia_database and http://meta.wikimedia.org/wiki/Data_dumps#What_about_bittorrent.3F (so far).
Can anyone point to more comprehensive lists of torrents/trackers than these? Are there any plans for all the database download files to be made available this way? (I imagine there would also be some PDF manual to go along with these, explaining offline viewing and potentially more.) J
On 4/15/09, Petr Kadlec petr.kadlec@gmail.com wrote:
2009/4/14 Platonides Platonides@gmail.com:
IMHO the benefits of separate files are similar to the disadvantages. A side benefit would be that the hashes would be split, too. If you were unlucky, knowing that 'something' (perhaps just a bit) in the 150GB you downloaded is wrong is not that helpful. So having hashes for file sections of the big ones, even if not 'standard', would be an improvement.
For that, something like Parchive would probably be better…
-- [[cs:User:Mormegil | Petr Kadlec]]
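(For illustration, a minimal sketch of the Parchive approach using the par2cmdline tool; the dump filename here is hypothetical:)

# Create recovery volumes covering ~5% of the file. par2 stores
# per-block checksums, so a damaged region can be located and
# repaired without re-downloading the whole dump.
par2 create -r5 enwiki-dump.par2 enwiki-pages-meta-history.xml.7z

# After downloading the file plus the .par2 volumes, check and fix:
par2 verify enwiki-dump.par2
par2 repair enwiki-dump.par2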
On Fri, Apr 17, 2009 at 5:55 PM, Jameson Scanlon jameson.scanlon@googlemail.com wrote:
[snip]
I seem to remember there being a discussion about the torrenting issue before. In short: there have never been any official torrents, and the unofficial ones never got really popular.
-Chad
On Fri, Apr 17, 2009 at 6:10 PM, Chad innocentkiller@gmail.com wrote:
[snip]
Torrent isn't a very good transfer method for things which are not fairly popular, as it has a fair amount of overhead.
The Wikimedia download site should be able to saturate your internet connection in any case…
2009/4/17 Gregory Maxwell gmaxwell@gmail.com:
[snip]
Indeed :-) The problem with the dumps as I understand it is not serving them - if that was a problem, you can be sure the Internet Archive would be happy to store Wikimedia dumps forever - but generating them in the first place.
- d.
On Fri, Apr 17, 2009 at 7:39 PM, Gregory Maxwell gmaxwell@gmail.com wrote:
[snip]
But some ISPs throttle TCP connections (either by design or through simple oversubscription and random packet drops), so many small connections *can* yield a better result for the end user. And if you are so unlucky as to have a crappy connection from your country to the download site, maybe, just maybe, someone in your own country has already downloaded it and is willing to share the torrent... :)
I can saturate my little 1M ADSL link with torrent downloads, but forget about getting that throughput with HTTP requests... if the server is in the country, in close proximity, and willing, then *maybe*.. but otherwise.. no way.
Not everyone is very well connected, unfortunately...
/Stigmj
On Fri, Apr 17, 2009 at 9:21 PM, Stig Meireles Johansen stigmj@gmail.com wrote:
[snip]
There are plenty of downloading tools that will use range requests to download a single file with parallel connections…
But if you are running parallel connections to avoid slowdowns you're just attempting to cheat TCP congestion control and get an unfair share of the available bandwidth. That kind of selfish behaviour fuels non-neutral behaviour and ought not to be encouraged.
We offered torrents in the past for the Commons Picture of the Year results -- a more popular thing to download, a much smaller file (~500 MB vs. many gigabytes), and not something which becomes outdated every month… and pretty much no one stayed connected long enough for anyone else to manage to pull anything from them. It was an interesting experiment, but it indicated that further use for these sorts of files would be a waste of time.
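(As an aside: a minimal sketch of the range-request parallel download mentioned above, assuming a tool such as aria2c is available; the URL is the simplewiki dump discussed just below:)

# Split the file into 8 byte ranges and fetch them over parallel
# HTTP connections to the same server:
aria2c -s8 -x8 http://download.wikimedia.org/simplewiki/20090330/simplewiki-20090330-pages-meta-history.xml.7z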
I have no problem helping someone get a faster download speed and I'm also not willing to fling around fallacies about how selfish behavior is bad for society. Here is wget vs. aget for the full-history dump of the Simple English Wikipedia - a substantial 3.5x improvement that someone who is having slow connection issues should definitely consider trying.
time wget -O/dev/null 'http://download.wikimedia.org/simplewiki/20090330/simplewiki-20090330-pages-meta-history.xml.7z'
--2009-04-18 00:59:48--  http://download.wikimedia.org/simplewiki/20090330/simplewiki-20090330-pages-meta-history.xml.7z
Resolving download.wikimedia.org... 208.80.152.183
Connecting to download.wikimedia.org|208.80.152.183|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 125918415 (120M) [application/x-7z-compressed]
Saving to: `/dev/null'
100%[======================================>] 125,918,415  1.41M/s   in 73s
2009-04-18 01:01:01 (1.66 MB/s) - `/dev/null' saved [125918415/125918415]
real 1m13.156s user 0m0.216s sys 0m0.964s
mingus@dream:/home/mingus/ccnlab_bib -> time aget -n20 -f 'http://download.wikimedia.org/simplewiki/20090330/simplewiki-20090330-pages-meta-history.xml.7z'
<LOG> Attempting to read log file aget-simplewiki-20090330-pages-meta-history.xml.7z.log for resuming download job...
<LOG> Couldn't find log file for this download, starting a clean job...
<LOG> Head-Request Connection established
<LOG> Downloading /simplewiki/20090330/simplewiki-20090330-pages-meta-history.xml.7z (125918415 bytes) from site download.wikimedia.org(208.80.152.183:80). Number of Threads: 20
[... progress output elided ...]
<LOG> Download completed, job completed in 21 seconds. (5855 Kb/sec)
<LOG> Shutting down...
real 0m20.985s user 0m0.116s sys 0m1.412s
On Sat, Apr 18, 2009 at 1:06 AM, Brian Brian.Mingus@colorado.edu wrote:
[snip]
On Fri, Apr 17, 2009 at 9:42 PM, Gregory Maxwell gmaxwell@gmail.com wrote: [snip]
But if you are running parallel connections to avoid slowdowns you're just attempting to cheat TCP congestion control and get an unfair share of the available bandwidth. That kind of selfish behaviour fuels non-neutral behaviour and ought not be encouraged.
[snip] On Sat, Apr 18, 2009 at 3:06 AM, Brian Brian.Mingus@colorado.edu wrote:
I have no problem helping someone get a faster download speed and I'm also not willing to fling around fallacies about how selfish behavior is bad for society. Here is wget vs. aget for the full history dump of the simple
[snip]
And? I did point out this is possible, and that no torrent was required to achieve this end. Thank you for validating my point.
Since you've called my position fallacious I figure I ought to give it a reasonable defence, although we've gone off-topic.
The use of parallel TCP has allowed you an inequitable share of the available network capacity[1]. The parallel transport is fundamentally less efficient as it increases the total number of congestion drops[2]. The categorical imperative would have us not perform activities that would be harmful if everyone undertook them. At the limit: If everyone attempted to achieve an unequal share of capacity by running parallel connections the internet would suffer congestion collapse[3].
Less philosophically and more practically: the unfair usage of capacity by parallel-fetching P2P tools is a primary reason for internet providers to engage in 'non-neutral' activities such as blocking or throttling this P2P traffic[4][5][6]. Ironically, a provider which treats parallel transport technologies unfairly will be providing a fairer network service, and non-neutral handling of traffic is the only way to prevent an (arguably unfair) redistribution of transport towards end-user-heavy service providers.
(I highly recommend reading the material in [5] for a simple overview of P2P fairness and network efficiency, as well as the Briscoe IETF draft in [4] for a detailed operational perspective.)
Much of the public discussion on neutrality has focused on portraying service providers considering or engaging in non-neutral activities as greedy and evil. The real story is far more complicated and far less clear cut.
Where this is on-topic is that non-neutral behaviour by service providers may well make the Wikimedia Foundation's mission more costly to pursue in the future. In my professional opinion, the best defence against this sort of outcome available to organizations like Wikimedia (and other large content houses) is the promotion of equitable transfer mechanisms which avoid unduly burdening end-user providers, and which therefore avoid providing an objective justification for non-neutral behaviour. To this end Wikimedia should not gratuitously promote or utilize cost-shifting technology (such as P2P distribution) or inherently unfair, inefficient transmission (parallel TCP, or a fudged server-side initial window).
I spent a fair amount of time producing what I believe to be a well-cited reply which I believe stands well enough on its own that I should not need to post any more in support of it. I hope that you will at least put some thought into the issues I've raised here before dismissing this position. If my position is fallacious then numerous academics and professionals in the industry are guilty of falling for the same fallacies.
[1] Cho, S. 2006. Congestion Control Schemes for Single and Parallel TCP Flows in High Bandwidth-Delay Product Networks. Doctoral thesis, UMI Order Number AAI3219144, Texas A&M University.
[2] Padhye, J., Firoiu, V., Towsley, D., and Kurose, J. Modeling TCP Throughput: A Simple Model and its Empirical Validation. ACM SIGCOMM, Sept. 1998.
[3] Floyd, S., and Fall, K. Promoting the Use of End-to-End Congestion Control in the Internet. IEEE/ACM Transactions on Networking, Aug. 1999.
[4] Briscoe, B., Moncaster, T., and Burness, L. (BT). http://tools.ietf.org/html/draft-briscoe-tsvwg-relax-fairness-01
[5] Weaver, N. Presentation "Bulk Data P2P: Cost Shifting, not Cost Savings" (http://www.icsi.berkeley.edu/~nweaver/p2pi_shifting.ppt); Weaver, N. Position paper, P2PI Workshop, http://www.funchords.com/p2pi/1 p2pi-weaver.txt
[6] Tuffin, B., and Maillé, P. How Many Parallel TCP Sessions to Open: A Pricing Perspective. ICQT 2006: 2-12.
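(Editorial note on the quantitative intuition behind [1] and [2]: the standard steady-state TCP throughput approximation, the Mathis et al. formula, written here in LaTeX notation, shows why a downloader opening N connections claims roughly N times the single-flow share at a given loss rate:)

T \approx \frac{\mathrm{MSS}}{\mathrm{RTT}} \cdot \frac{C}{\sqrt{p}}, \qquad C \approx 1.22

% With N parallel flows sharing the same path and loss rate p, each
% flow independently obeys the relation above, so the aggregate is
T_{N} \approx N \cdot \frac{\mathrm{MSS}}{\mathrm{RTT}} \cdot \frac{C}{\sqrt{p}}
% i.e. about N times the bandwidth of a single-connection competitor.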
Small comment:
If someone finally decides to mirror the files using torrents, I suggest using one of those "channels" (RSS?) that Azureus supports. The client even supports "autodownload" from these channels.
I hope you wrote it for your own benefit and not mine! Traffic congestion issues being obvious enough, your reductio is irrelevant to the case of a single user who has issues saturating their relatively slow DSL link. Torrent is not an option, aget is, end of story.
On Sat, Apr 18, 2009 at 1:52 PM, Gregory Maxwell gmaxwell@gmail.com wrote:
[snip]
Many thanks to everyone who has replied to my initial request about Wikipedia database dumps. I am glad that I kicked up such a fuss. Anyhow, I should also state that I have access to various LAN connections (each with a speed of 3.477Mbps, methinks).
There was some remark about using wiki servers as torrent seeds. If someone has access to several such 3.477Mbps connections, would it be possible to use them in parallel to download different parts of the same database by keeping track of byte offsets, etc.? Many thanks.
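(What Jameson describes is essentially HTTP range requests: fetching distinct byte ranges of one file over separate connections and concatenating the parts. A minimal sketch with curl against the 125918415-byte simplewiki dump from earlier in the thread; the two ranges split the file in half:)

# Fetch the two halves in parallel, then reassemble:
curl -r 0-62959207         -o part1 http://download.wikimedia.org/simplewiki/20090330/simplewiki-20090330-pages-meta-history.xml.7z &
curl -r 62959208-125918414 -o part2 http://download.wikimedia.org/simplewiki/20090330/simplewiki-20090330-pages-meta-history.xml.7z &
wait
cat part1 part2 > simplewiki-20090330-pages-meta-history.xml.7z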
On 4/19/09, Brian Brian.Mingus@colorado.edu wrote:
[snip]
Such a detailed reply, which must have taken a lot of a MediaWiki developer's time, with little use.
BTW, this does show that net neutrality is not at all clear-cut. BitTorrent gives a lot of options for abuse of the network -- many leechers don't seed, too many connections max out the hardware, etc.
On 2009-Apr-19, at 01:22, Gregory Maxwell wrote:
[snip]
OOPS. I am awfully sorry for such a gross mistake.
I am very very sorry.
I especially apologise for saying the email was of little use; I said it jokingly. It was of much use to me, as I got parts of the other side of the story which one doesn't get from the usual news sites.
I sincerely apologise for making such statements and hope that they will be taken in the way they were intended -- a very, very casual way.
On 2009-May-15, at 04:58, Vedant Lath wrote:
[snip]
Interesting case. It happens here too. The best way to determine the max speed of a connection is to run BitTorrent on one of the most popular torrents for some time and look at the speed.
On 2009-Apr-18, at 06:51, Stig Meireles Johansen wrote:
[snip]
On Fri, Apr 17, 2009 at 11:55 PM, Jameson Scanlon < jameson.scanlon@googlemail.com> wrote:
Can anyone point to more comprehensive lists of torrents/trackers than these? Are there any plans for all the database download files to be made available this way? (I imagine there would also be some PDF manual to go along with these, explaining offline viewing and potentially more.)
In theory, one can easily create a torrent with the Wikipedia servers as webseeds. The question is: how many torrent clients besides Azureus support these?
Marco
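(A minimal sketch of what Marco describes, assuming the mktorrent tool; the tracker URL is illustrative, and the webseed points at the simplewiki dump used earlier in the thread:)

# Build a .torrent whose metadata lists the HTTP mirror as a web seed,
# so clients can fall back to plain HTTP when few peers are online:
mktorrent -a http://tracker.example.org/announce \
          -w http://download.wikimedia.org/simplewiki/20090330/simplewiki-20090330-pages-meta-history.xml.7z \
          simplewiki-20090330-pages-meta-history.xml.7z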