This is something we really wanted to have at One Laptop Per Child, and I'm glad you're looking into it!
In our case, we wanted to be able to provide schools with a complete copy of Wikipedia, but could only afford to dedicate about 100MB/student to the cause (at the time, I think our entire machine had 4GB of storage). We ended up using a highly compressed Wikipedia "slice" combining the most popular articles with a set of articles spidered from a list of educational topics (so you'd be sure to get articles on all the elements, all the planets, etc). But what we really *wanted* to do was to split up the content among the kids' machines, as you've done, so that between them all they could have access to a much broader slice of Wikipedia.
Nowadays, with lower storage costs, it's possible we could give students a much broader slice of the *text* of Wikipedia articles, and still use the peer-to-peer approach to serve the *media* associated with the articles (which is much larger than the text content).
The other side of this coin is supporting editability. We were always dissatisfied with our read-only slices of the wiki -- true empowerment means being able to add and edit content, not just passively consume it. Of course, collaboratively editing Wikipedia in a peer-to-peer fashion is a very interesting research project. I wonder whether you think this sort of thing is in scope for your work. --scott
On Sat, Nov 28, 2015 at 1:45 AM, Yeongjin Jang yeongjinjanggrad@gmail.com wrote:
Hi,
I am Yeongjin Jang, a Ph.D. Student at Georgia Tech.
In our lab (SSLab, https://sslab.gtisc.gatech.edu/), we are working on a project called B2BWiki, which enables users to share the contents of Wikipedia through WebRTC (peer-to-peer sharing).
The website is here: http://b2bwiki.cc.gatech.edu/
The project aims to help Wikipedia by donating computing resources from the community: users can donate their traffic (via P2P communication) and storage (IndexedDB) to reduce the load on Wikipedia's servers. Larger organizations, e.g. schools or companies with many local users, can donate a mirror server, similar to GNU FTP mirrors, to bootstrap peer sharing.
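To make the storage donation concrete, here is a minimal sketch of caching a fetched article in IndexedDB so it can later be served to peers. The database, store, and function names are illustrative only, not our actual implementation:

    // Sketch: cache an article in IndexedDB so it can be served to peers.
    // "b2bwiki-cache" / "pages" are illustrative names.
    function openCache(): Promise<IDBDatabase> {
      return new Promise((resolve, reject) => {
        const req = indexedDB.open("b2bwiki-cache", 1);
        req.onupgradeneeded = () =>
          req.result.createObjectStore("pages", { keyPath: "title" });
        req.onsuccess = () => resolve(req.result);
        req.onerror = () => reject(req.error);
      });
    }

    async function cachePage(title: string, html: string): Promise<void> {
      const db = await openCache();
      const tx = db.transaction("pages", "readwrite");
      tx.objectStore("pages").put({ title, html, fetchedAt: Date.now() });
      await new Promise<void>((resolve, reject) => {
        tx.oncomplete = () => resolve();
        tx.onerror = () => reject(tx.error);
      });
    }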
The potential benefits we see are the following:
- Users can easily donate their resources to the community:
  just visit the website.
- Users get a performance benefit when a page is loaded from
  multiple local peers or a local mirror (page load time improves).
- Wikipedia can reduce its server workload, network traffic, etc.
- Local network operators can reduce transit traffic
  (e.g. the cost of carrying traffic outside the local network).
While we work on enhancing the implementation, we would like to ask for the opinions of actual Wikipedia developers. For example, we want to know whether our direction is correct (will it actually reduce the load?), or whether there are concerns we missed that could prevent this system from working as intended. We really want to do meaningful work that actually helps run Wikipedia!
Please feel free to give us any suggestions, comments, etc. If you would rather express your opinion privately, please contact sslab@cc.gatech.edu.
Thanks,
--- Appendix ---
I have added some more detailed information about B2BWiki below.
# Accessing data
When accessing a page on B2BWiki, the browser queries peers first.
1) If there are peers that hold the content, a peer-to-peer download happens.
2) Otherwise, if there is no peer, the client downloads the content from the mirror server.
3) If the mirror server does not have the content, it downloads it from the Wikipedia server (one access for the first download, plus updates).
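Below is a rough sketch of this fallback order in TypeScript. peersHolding, downloadFromPeer, and MIRROR_URL are assumed helper names for illustration, not our actual API:

    // Assumed helpers, declared only for illustration:
    declare function peersHolding(title: string): Promise<string[]>;
    declare function downloadFromPeer(peerId: string, title: string): Promise<string>;
    declare const MIRROR_URL: string;

    async function loadPage(title: string): Promise<string> {
      // 1) Prefer peers that currently hold the page (WebRTC download).
      for (const peer of await peersHolding(title)) {
        try {
          return await downloadFromPeer(peer, title);
        } catch {
          // peer unreachable or stale; try the next one
        }
      }
      // 2) No usable peer: fall back to the mirror server. 3) If the
      // mirror lacks the page too, it fetches it once from Wikipedia
      // before serving it to us.
      const res = await fetch(MIRROR_URL + "/page/" + encodeURIComponent(title));
      if (!res.ok) throw new Error("failed to load " + title);
      return res.text();
    }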
# Peer lookup
To enable content lookup, we run a lookup server that holds a page_name-to-peer map. A client (a user's browser) can query the list of peers that currently hold the content and select a peer by freshness: each entry carries a hash/timestamp of the content, the top two octets of the peer's IP address (to tell whether the peer is local), etc.
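For illustration, a lookup-table entry and a freshness/locality-based peer choice could look like the following (field names are ours, not necessarily the real schema):

    // Illustrative shape of one lookup-server entry for a page.
    interface PeerEntry {
      peerId: string;
      sha1: string;       // checksum of the copy this peer holds
      timestamp: number;  // when that copy was fetched
      ipPrefix: string;   // top two octets, e.g. "143.215", for locality
    }

    // Prefer peers on the local network, then the freshest copy.
    function choosePeer(entries: PeerEntry[], localPrefix: string): PeerEntry | undefined {
      return [...entries].sort((a, b) => {
        const aLocal = a.ipPrefix === localPrefix ? 1 : 0;
        const bLocal = b.ipPrefix === localPrefix ? 1 : 0;
        if (aLocal !== bLocal) return bLocal - aLocal; // local peers first
        return b.timestamp - a.timestamp;              // then freshness
      })[0];
    }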
# Update and integrity check
The mirror server updates its content once per day (this can be configured, e.g. hourly). The update check uses the If-Modified-Since header against the Wikipedia server. When retrieving content from Wikipedia, the mirror server stamps it with a timestamp and a SHA-1 checksum to ensure the data's freshness and integrity. When a client looks up and downloads content from peers, it compares the SHA-1 checksum of the data with the checksum from the lookup server.
In this setting, users may get older data (they can configure how much staleness to tolerate, e.g. one day, three days, one week), and integrity is guaranteed by the mirror/lookup server.
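The client-side check can be done with the browser's standard Web Crypto API. A minimal sketch, where expectedHex is the checksum obtained from the lookup server:

    // Verify that content downloaded from a peer matches the SHA-1
    // checksum advertised by the lookup server.
    async function verifySha1(content: string, expectedHex: string): Promise<boolean> {
      const data = new TextEncoder().encode(content);
      const digest = await crypto.subtle.digest("SHA-1", data);
      const hex = Array.from(new Uint8Array(digest))
        .map((b) => b.toString(16).padStart(2, "0"))
        .join("");
      return hex === expectedHex.toLowerCase();
    }

    // A client would additionally reject copies whose timestamp exceeds
    // its configured staleness tolerance (e.g. one day).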
More detailed information can be obtained from the following website.
http://goo.gl/pSNrjR (URL redirects to SSLab@gatech website)
Please feel free to give us any suggestions, comments, etc.
Thanks,
Yeongjin Jang