Many thanks to everyone who replied to my initial request for Wikipedia database dumps. I am glad that I kicked up such a fuss. Anyhow, I should also state that I have access to various LAN connections (each with a speed of about 3.477 Mbps, I think).
There was some remark about using wiki servers as torrent seeds. If someone has access to several such 3.477 Mbps connections, would it be possible to use them in parallel to download different parts of the same database, keeping track of the byte ranges each connection is responsible for? Many thanks.
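In case a concrete illustration helps, below is a minimal sketch (Python, using the third-party requests library; the dump URL, part count, and output filename are placeholders, not real locations) of splitting one download into byte ranges fetched over parallel HTTP connections via Range requests. Spreading those connections across physically separate LAN links would additionally require binding each socket to a specific local interface, which this sketch does not attempt.

# Minimal sketch: fetch one file in PARTS byte-range pieces over parallel
# HTTP connections, then stitch the pieces together in order.
# URL, PARTS and the output filename are illustrative placeholders.
import concurrent.futures
import requests

URL = "https://dumps.example.org/simplewiki-pages-meta-history.xml.bz2"  # placeholder
PARTS = 4

def fetch_range(start, end):
    # Ask the server for just bytes start..end (inclusive) of the file.
    resp = requests.get(URL, headers={"Range": f"bytes={start}-{end}"}, timeout=60)
    resp.raise_for_status()
    return start, resp.content

def main():
    # Total size from a HEAD request; the server must honour Range requests
    # (HTTP 206) for the split to work at all.
    size = int(requests.head(URL, timeout=60).headers["Content-Length"])
    chunk = size // PARTS
    ranges = [(i * chunk, size - 1 if i == PARTS - 1 else (i + 1) * chunk - 1)
              for i in range(PARTS)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=PARTS) as pool:
        pieces = list(pool.map(lambda r: fetch_range(*r), ranges))
    with open("dump.xml.bz2", "wb") as out:
        for _, data in sorted(pieces):
            out.write(data)

if __name__ == "__main__":
    main()

This is essentially what segmented downloaders such as aget do internally, just spelled out.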
On 4/19/09, Brian Brian.Mingus@colorado.edu wrote:
I hope you wrote it for your own benefit and not mine! Traffic congestion issues being obvious enough, your reductio is irrelevant to the case of a single user who has trouble saturating their relatively slow DSL link. Torrent is not an option; aget is, end of story.
On Sat, Apr 18, 2009 at 1:52 PM, Gregory Maxwell gmaxwell@gmail.com wrote:
On Fri, Apr 17, 2009 at 9:42 PM, Gregory Maxwell gmaxwell@gmail.com wrote: [snip]
But if you are running parallel connections to avoid slowdowns, you're just attempting to cheat TCP congestion control and get an unfair share of the available bandwidth. That kind of selfish behaviour fuels non-neutral behaviour and ought not to be encouraged.
[snip] On Sat, Apr 18, 2009 at 3:06 AM, Brian Brian.Mingus@colorado.edu wrote:
I have no problem helping someone get a faster download speed and I'm also not willing to fling around fallacies about how selfish behavior is bad for society. Here is wget vs. aget for the full history dump of the simple
[snip]
And? I did point out this is possible, and that no torrent was required to achieve this end. Thank you for validating my point.
Since you've called my position fallacious I figure I ought to give it a reasonable defence, although we've gone off-topic.
The use of parallel TCP has allowed you an inequitable share of the available network capacity[1]. The parallel transport is fundamentally less efficient as it increases the total number of congestion drops[2]. The categorical imperative would have us not perform activities that would be harmful if everyone undertook them. At the limit: If everyone attempted to achieve an unequal share of capacity by running parallel connections the internet would suffer congestion collapse[3].
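To put a rough number on the fair-share point (a back-of-the-envelope illustration, using the common simplification that a congested bottleneck divides capacity roughly equally per TCP flow): if one user opens n parallel connections while k - 1 other users each run a single connection through the same bottleneck, then

  \text{share of the parallel user} \approx \frac{n}{n + (k - 1)}, \qquad \text{fair per-user share} = \frac{1}{k}

For example, n = 4 parallel connections against nine single-connection users takes 4/13, roughly 31% of the link, instead of the fair 10%.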
Less philosophically and more practically: the unfair usage of capacity by parallel-fetching P2P tools is a primary reason for internet providers to engage in 'non-neutral' activities such as blocking or throttling P2P traffic[4][5][6]. Ironically, a provider which treats parallel transport technologies unfairly will be providing a fairer network service, and non-neutral handling of traffic is the only way to prevent an (arguably unfair) redistribution of transport load towards end-user-heavy service providers.
(I highly recommend reading the material in [5] for a simple overview of P2P fairness and network efficiency, as well as the Briscoe IETF draft in [4] for a detailed operational perspective.)
Much of the public discussion on neutrality has focused on portraying service providers considering or engaging in non-neutral activities as greedy and evil. The real story is far more complicated and far less clear cut.
Where this is on-topic is that non-neutral behaviour by service providers may well make the Wikimedia Foundation's mission more costly to pursue in the future. In my professional opinion, the best defence against this sort of outcome available to organizations like Wikimedia (and other large content houses) is the promotion of equitable transfer mechanisms which avoid unduly burdening end-user providers, and which therefore avoid handing those providers an objective justification for non-neutral behaviour. To this end Wikimedia should not gratuitously promote or utilize cost-shifting technology (such as P2P distribution) or inherently unfair, inefficient transmission (parallel TCP, or a fudged server-side initial window).
I spent a fair amount of time producing what I believe to be a well-cited reply that stands well enough on its own that I should not need to post any more in support of it. I hope that you will at least put some thought into the issues I've raised here before dismissing this position. If my position is fallacious then numerous academics and professionals in the industry are guilty of falling for the same fallacies.
[1] Cho, S. Congestion Control Schemes for Single and Parallel TCP Flows in High Bandwidth-Delay Product Networks. Doctoral thesis, Texas A&M University, 2006. UMI Order Number AAI3219144.
[2] Padhye, J., Firoiu, V., Towsley, D., and Kurose, J. Modeling TCP Throughput: A Simple Model and its Empirical Validation. ACM SIGCOMM, Sept. 1998.
[3] Floyd, S., and Fall, K. Promoting the Use of End-to-End Congestion Control in the Internet. IEEE/ACM Transactions on Networking, Aug. 1999.
[4] Briscoe, B., Moncaster, T., and Burness, L. (BT). http://tools.ietf.org/html/draft-briscoe-tsvwg-relax-fairness-01
[5] Weaver, N. "Bulk Data P2P: Cost Shifting, not Cost Savings" (presentation), http://www.icsi.berkeley.edu/~nweaver/p2pi_shifting.ppt; Weaver, N., position paper, P2PI Workshop, http://www.funchords.com/p2pi/1 p2pi-weaver.txt
[6] Tuffin, B., and Maillé, P. How Many Parallel TCP Sessions to Open: A Pricing Perspective. ICQT 2006, pp. 2-12.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l