On 30.11.2015 20:47, Ryan Lane wrote:
On Sun, Nov 29, 2015 at 8:18 PM, Yeongjin Jang <yeongjinjanggrad(a)gmail.com> wrote:
I recall seeing a financial statement from the WMF stating that around $2.3M was spent on Internet Hosting. I am not sure whether or not it includes the management cost for computing resources (server clusters such as eqiad).
That's the cost for datacenters, hardware, bandwidth, etc.
I am not sure whether the following simple calculation works: 117 TB per day, for 365 days, at $0.05 per GB, comes to around $2.2M. Maybe it would be more accurate if I contacted the analytics team directly.
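That back-of-envelope figure can be checked with a few lines of arithmetic (a rough sketch; the $0.05/GB consumer rate is the hypothetical figure from the message above, not an actual WMF transit price):

```python
# Rough check of the back-of-envelope figure: 117 TB/day of traffic,
# priced at a hypothetical flat rate of $0.05 per GB.
TB_PER_DAY = 117
DAYS = 365
USD_PER_GB = 0.05

gb_per_year = TB_PER_DAY * 1024 * DAYS  # binary units: 1 TB = 1024 GB
annual_cost = gb_per_year * USD_PER_GB

print(f"~${annual_cost / 1e6:.1f}M per year")  # → ~$2.2M per year
```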
That calculation doesn't work because it doesn't take into account peering agreements, or donated (or heavily discounted) transit contracts. Bandwidth is one of the cheaper overall costs.
Something your design doesn't take into account for bandwidth costs is that the world is trending toward mobile, and mobile bandwidth costs are generally very high. It's likely this p2p approach will be many orders of magnitude more expensive than the current approach.
A decentralized approach doesn't benefit from economies of scale. Instead of being able to negotiate transit pricing and eliminate cost through peering, you're externalizing the cost at the consumer rate, which is the highest possible rate.
While that is often true, there are notable exceptions, growing both in scale and number.
a) We have campus situations where a large university, company, or public agency with tens or hundreds of thousands of peers runs a network that they pay for anyway, that is needed for peers to connect to Wiki* anyway, and that is available to peers at no (additional) cost. While external traffic costs are of relatively little concern there, quick response times often are, especially in classroom situations where up to several hundred students may look at the same articles virtually at once.
b) There is a fast-growing international movement for open wireless radio networks (Freifunk) of comparatively low bandwidth in neighborhoods. Those can benefit a lot from local peering, imho.
Purodha
The other scalability concern would be for obscure articles. I haven't really looked at your code, so maybe you cover it, but Wikipedia has over 5 million articles (and a lot more when you count non-content pages). The group of peers is presumably going to have high churn (since they go away when you browse somewhere else). I'd worry that the overhead of keeping track of which peer knows what, especially given how fast the peers change, would be a lot. I also expect that for lots of articles, only a very small number of peers will know them.
That's true. Dynamically registering / un-registering entries in the lookup table imposes high overhead on the servers (in both computation and memory usage). Distributed solutions like a DHT exist, but we think there could be a trade-off in lookup time between a centralized architecture (managing the lookup table on the server) and a fully distributed architecture (a DHT).
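To make the trade-off concrete, here is a minimal consistent-hashing sketch, a common DHT building block (an illustrative assumption on my part, not the project's actual implementation): any peer can compute an article's owner locally with no central table, but in a real overlay each lookup costs extra network hops instead.

```python
import bisect
import hashlib

def ring_pos(key: str) -> int:
    """Map a string onto a 32-bit hash ring position."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2**32)

class HashRing:
    """Minimal consistent-hashing ring: each article key is owned by
    the first peer clockwise from the key's ring position."""

    def __init__(self, peers):
        self._ring = sorted((ring_pos(p), p) for p in peers)
        self._points = [pt for pt, _ in self._ring]

    def owner(self, article: str) -> str:
        # Wrap around to the first peer when past the last ring point.
        i = bisect.bisect_right(self._points, ring_pos(article)) % len(self._ring)
        return self._ring[i][1]

# No central lookup table: every node computes the same answer locally.
ring = HashRing([f"peer{i}" for i in range(10)])
print(ring.owner("Alan_Turing"))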
In our prior "naive" implementation, if each user has 5K pages cached (with around 50K images), then with 10K concurrent users present it consumes around 35 GB of memory, and each registration incurs 500 KB of network traffic. We thought this was not that useful, so now we are trying to come up with a more lightweight implementation. We hope to have a practically meaningful micro-benchmark result on the new implementation.
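For what it's worth, those numbers imply a per-entry cost in the tens of bytes, which is what a leaner encoding would need to shrink (a rough consistency check, assuming one lookup-table entry per cached page or image):

```python
# Back-of-envelope check of the "naive" implementation's memory figure.
USERS = 10_000
PAGES_PER_USER = 5_000
IMAGES_PER_USER = 50_000

entries = USERS * (PAGES_PER_USER + IMAGES_PER_USER)  # 550 million entries
bytes_per_entry = 35e9 / entries  # 35 GB spread over all entries

print(f"{entries:,} entries, ~{bytes_per_entry:.0f} bytes/entry")
# → 550,000,000 entries, ~64 bytes/entry
```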
Just the metadata for articles, images, and revisions is going to be massive. That data itself will need to be distributed too. The network costs associated with just lookups are going to be quite expensive for peers. It seems your project assumes that bandwidth is unlimited and unmetered, which for many parts of the world isn't true.
I don't mean to dissuade you. The idea of a p2p Wikipedia is an interesting project, and at some point in the future, if bandwidth is free and unmetered everywhere, this may be a reasonable way to provide a method of access in case of a major disaster affecting Wikipedia itself. This idea has been brought up numerous times in the past, though, and in general the potential gains are never better than the latency, cost, and complexity associated with it.
- Ryan
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l