http://advogato.org/article/994.html
Peer-to-peer git repositories. Imagine a MediaWiki with the data stored in git, and updates distributed peer-to-peer.
"Imagine if Wikipedia could be mirrored locally, run on a local mirror, where content was pushed and pulled, GPG-Digitally-signed; content shared via peer-to-peer instead of overloading the Wikipedia servers."
This would certainly go some way to solving the "a good dump is all but impossible" problem ...
(so, anyone hacked up a git backend for MediaWiki revisions rather than MySQL? :-) )
- d.
On Thu, Dec 4, 2008 at 6:20 PM, David Gerard dgerard@gmail.com wrote:
Peer-to-peer git repositories. Imagine a MediaWiki with the data stored in git, and updates distributed peer-to-peer.
http://www.foo.be/cgi-bin/wiki.pl/2007-11-10_Dreaming_Of_Mediawiki_Using_GIT
It takes about 10 minutes to create a mwdump-to-git converter using git-fast-import. I did this for amusement once in order to run git-blame on articles. I'm not so clear on what else to do with it once it's in git. One advantage is that the storage requirements are reasonably modest.
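For illustration, a minimal sketch of what such a converter could look like, piped into `git fast-import' inside an empty repository. The one-file-per-article layout, the faked committer addresses and the single branch are arbitrary choices here, not anything Gregory described:

    # mwdump2git.py -- usage: python mwdump2git.py pages-history.xml | git fast-import
    import sys
    import calendar, time
    import xml.etree.ElementTree as ET

    out = sys.stdout.buffer

    def emit(title, text, user, ts):
        # One commit per revision; each article lives in a single file named
        # after its title, all on refs/heads/master.
        path = title.replace(' ', '_').replace('/', '%2F') + '.wiki'
        when = calendar.timegm(time.strptime(ts, '%Y-%m-%dT%H:%M:%SZ'))
        user = user or 'anonymous'
        ident = '%s <%s@wiki.invalid> %d +0000' % (user, user.replace(' ', '_'), when)
        msg = ('Edit to %s by %s' % (title, user)).encode('utf-8')
        body = text.encode('utf-8')
        out.write(b'commit refs/heads/master\n')
        out.write(('committer %s\n' % ident).encode('utf-8'))
        out.write(('data %d\n' % len(msg)).encode('utf-8') + msg + b'\n')
        out.write(('M 100644 inline %s\n' % path).encode('utf-8'))
        out.write(('data %d\n' % len(body)).encode('utf-8') + body + b'\n')

    def main(dumpfile):
        title = user = ts = None
        text = ''
        for event, elem in ET.iterparse(dumpfile, events=('end',)):
            tag = elem.tag.rsplit('}', 1)[-1]   # strip the export namespace
            if tag == 'title':
                title = elem.text
            elif tag == 'timestamp':
                ts = elem.text
            elif tag in ('username', 'ip'):
                user = elem.text
            elif tag == 'text':
                text = elem.text or ''
            elif tag == 'revision':
                emit(title, text, user, ts)
                user = None
            elif tag == 'page':
                elem.clear()                    # keep memory bounded on large dumps

    if __name__ == '__main__':
        main(sys.argv[1])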
It would be nice to have more advanced SCM features in the wiki... but between the technical challenges and the learning curves (even most CVS and SVN users barely know how to do more than check out and check in), I wouldn't expect it anytime soon.
On Thu, Dec 04, 2008 at 07:09:36PM -0500, Gregory Maxwell wrote:
It takes about 10 minutes to create a mwdump-to-git converter using git-fast-import. I did this for amusement once in order to run git-blame on articles.
How fast was the git import? How many articles did you try to import? What were the storage requirements? How effective was the git blame, since it would only work at the line (paragraph) level?
I considered doing this but I got sidetracked doing a word-level blame function (see http://hewgill.com/journal/entries/461-wikipedia-blame) and never got back to the git import.
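The general idea behind word-level blame is simple enough to sketch. This is a toy illustration, not Greg's actual implementation: diff successive revisions at word granularity and carry attribution forward for unchanged words.

    from difflib import SequenceMatcher

    def word_blame(revisions):
        """revisions: list of (author, wikitext) pairs, oldest first.
        Returns (word, author) pairs for the newest revision."""
        blamed = []                              # attribution for the current text
        for author, text in revisions:
            words = text.split()
            if not blamed:                       # first revision: its author gets everything
                blamed = [(w, author) for w in words]
                continue
            old_words = [w for w, _ in blamed]
            new_blamed = []
            for op, i1, i2, j1, j2 in SequenceMatcher(None, old_words, words).get_opcodes():
                if op == 'equal':
                    new_blamed.extend(blamed[i1:i2])            # keep earlier attribution
                elif op in ('replace', 'insert'):
                    new_blamed.extend((w, author) for w in words[j1:j2])
                # 'delete': those words are gone, nothing to carry forward
            blamed = new_blamed
        return blamed

    # word_blame([("Alice", "London is a city"),
    #             ("Bob",   "London is a large city in England")])
    # -> London/is/a/city blamed on Alice; large/in/England on Bob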
I would like to see a properly maintained copy of wikipedia in git, particularly so I could clone and keep it up to date.
Greg Hewgill http://hewgill.com
I've looked into this a little, though still in quite a pie-in-the-sky way. I made the SQLite db layer with the idea that it would be simpler to incorporate into a client-based "MediaWikiLite" app, and I made some notes at these articles: http://www.organicdesign.co.nz/MediaWikiLite http://www.organicdesign.co.nz/PeerPedia A lite MediaWiki could then work as a peer, and SQLite could integrate with a distributed storage system such as a DHT, or with Git. Perhaps interwiki prefixes could be used as an addressing scheme to separate different wikis within the common distributed storage space?
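As a purely hypothetical illustration of that addressing idea, the interwiki prefix plus the page title could be hashed into a shared DHT keyspace. The names below are made up for the sketch; nothing like this exists in MediaWikiLite today:

    import hashlib

    def dht_key(interwiki, title):
        """Map (wiki, title) to a 160-bit key in a Kademlia-style keyspace."""
        name = "%s:%s" % (interwiki, title.replace(" ", "_"))
        return hashlib.sha1(name.encode("utf-8")).hexdigest()

    # Pages from different wikis share one keyspace without colliding:
    # dht_key("wikipedia", "Comic opera")       -> one 40-hex-digit key
    # dht_key("organicdesign", "MediaWikiLite") -> another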
Hoi, As I have indicated in the past, a team at the Vrije Universiteit Amsterdam (the team that includes Andrew Tanenbaum) has been working on creating a peer-to-peer MediaWiki. Their goal is to be able to support a wiki like the English-language Wikipedia. They have developed algorithms that should address issues such as keeping the data close to the readers, propagating changes, and resolving conflicts between those changes.
The problem they have faced, and which has not resolved itself, is getting the traffic data that would allow them to test their algorithms against the real world. In the past I have tried to get people's attention, to no avail. I think the VU is still interested; it would be cool if this serious attempt at a peer-to-peer Wikipedia got at least some attention. There are few people like Andrew Tanenbaum who could be trusted to understand the issues involved. Thanks, GerardM
2008/12/5 Gerard Meijssen gerard.meijssen@gmail.com:
The problem they have faced, and which has not resolved itself, is getting the traffic data that would allow them to test their algorithms against the real world.
They might want to talk to Wikileaks, then - they were interested in a distributed database and they certainly get the traffic.
- d.
Hoi, The English Wikipedia is used all over the world; the distribution of requests for content can be regional, and in certain cases will prove to be. The chance of edit conflicts is of a different order on en.wikipedia. Wikileaks is unlikely to approximate the traffic we have on the English Wikipedia.
You do not need to have all the data to evaluate the functionality that has been developed, but the data has to be statistically relevant enough to understand the issues as they occur.
My point is that the VU has a need for data, and they have so far been ignored even though I made my attempts to get them connected. Thanks, GerardM
Gerard Meijssen wrote:
My point is that the VU has a need for data, and they have so far been ignored even though I made my attempts to get them connected.
Doesn't Domas collect all kinds of statistics at http://dammit.lt/wikistats/ ? Maybe those stats could be used and/or extended to fit the VU's needs.
Roan Kattouw (Catrope)
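For reference, the hourly files there can be crunched with a few lines of code. This assumes the usual "project page_title view_count bytes_sent" line format of the pagecounts dumps; adjust if the actual format differs:

    import gzip
    from collections import Counter

    def top_pages(path, project="en", n=10):
        """Most-viewed titles for one project in one hourly pagecounts file."""
        views = Counter()
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
            for line in f:
                parts = line.split(" ")
                if len(parts) == 4 and parts[0] == project:
                    views[parts[1]] += int(parts[2])
        return views.most_common(n)

    # top_pages("pagecounts-20081205-180000.gz")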
The problem they have faced, and which has not resolved itself, is getting the traffic data that would allow them to test their algorithms against the real world. In the past I have tried to get people's attention, to no avail.
Can you please stop whining? We've been sending data to VU for ages.
Anyway, I love p2p threads!
Hoi, I brought Domas and Guillaume into contact based on this reply; Guillaume has already answered, and I hope that we finally get some resolution. Thanks, GerardM
David Gerard wrote:
"Imagine if Wikipedia could be mirrored locally, run on a local mirror, where content was pushed and pulled, GPG-Digitally-signed; content shared via peer-to-peer instead of overloading the Wikipedia servers."
The idea of P2P distribution is good. The idea of using Git for this is not.
This is exactly what Git is *not* optimized for: lots of individual files with almost no relation to each other. It is the same reason you are advised not to use Git to version-control your home directory.
Git handles trees, not individual files. In a wiki like MediaWiki, each article has its own history and revision control. A merge to one article shouldn't require that the whole tree (many gigabytes big) be handled in the same tree-wide commit. The user must be able to commit changes against the most recent revision of, for example, [[Los Angeles, California]], and push those commits while still holding an outdated revision of some unrelated article, like [[Comic opera]].
The rate of changes on en.Wikipedia can be measured in edits per second, and these edits are only related to each other (in an ancestor-descendant relationship) within each individual article. Moving to a model where the whole of Wikipedia is a single repository with tree-wide revisions would severely hurt its efficiency.
Ironically, the per-file revision control model employed by now-obsolescent VCSes like CVS and RCS would fit Wikipedia better than Git (emphasis on revision control *model*, not software).
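To make the "trees, not files" point concrete: every commit object names exactly one root tree covering the whole repository, which git's plumbing shows directly. A throwaway demonstration, with invented file names:

    import os, subprocess, tempfile

    def git(*args, cwd):
        return subprocess.run(["git"] + list(args), cwd=cwd, check=True,
                              capture_output=True, text=True).stdout

    repo = tempfile.mkdtemp()
    git("init", "-q", cwd=repo)
    git("config", "user.email", "demo@example.invalid", cwd=repo)
    git("config", "user.name", "Demo", cwd=repo)
    for title in ("Los_Angeles,_California.wiki", "Comic_opera.wiki"):
        with open(os.path.join(repo, title), "w") as f:
            f.write("stub article text\n")
    git("add", ".", cwd=repo)
    git("commit", "-q", "-m", "import two articles", cwd=repo)

    # Edit only one article ...
    with open(os.path.join(repo, "Comic_opera.wiki"), "a") as f:
        f.write("a small edit\n")
    git("commit", "-q", "-am", "edit one article", cwd=repo)

    # ... yet the new commit object still records a single whole-repository tree:
    print(git("cat-file", "-p", "HEAD", cwd=repo))

(Git does share unchanged blobs and subtrees between commits, so the cost of an edit is not literally gigabytes, but the commit granularity is still the whole tree rather than a single page.)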
A good idea, but it cannot be put into practice.
-- Jackey Tse | skjackey_tse | Web Developer | 在.hk (http://xn--3ds.hk)
On Fri, Dec 5, 2008 at 1:36 PM, Juliano F. Ravasi ml@juliano.info wrote:
Ironically, the per-file revision control model employed by now-obsolescent VCSes like CVS and RCS would fit Wikipedia better than Git (emphasis on revision control *model*, not software).
...because RCS tracks one file at a time while git tracks whole trees, as you point out. However, git's shortcomings when used for a wiki could also be addressed by having a separate repository for each article. You wouldn't get many of the more interesting git features, and you couldn't do `git pull' to update the whole wiki. But it would be interesting to compare one-git/mercurial/whatever-repo-per-article to one-RCS-file-per-article, or to the one-history-per-article that MediaWiki currently keeps in its database.
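A rough sketch of what the one-repository-per-article layout could look like; the storage root, file name and helper below are invented for illustration, not an existing MediaWiki backend:

    import os, subprocess

    WIKI_ROOT = "/var/wiki-repos"          # hypothetical storage root

    def save_revision(title, text, author):
        """Commit one revision of one article into that article's own repository."""
        repo = os.path.join(WIKI_ROOT, title.replace(" ", "_").replace("/", "%2F"))
        if not os.path.isdir(os.path.join(repo, ".git")):
            os.makedirs(repo, exist_ok=True)
            subprocess.run(["git", "init", "-q"], cwd=repo, check=True)
            subprocess.run(["git", "config", "user.email", "wiki@example.invalid"],
                           cwd=repo, check=True)
            subprocess.run(["git", "config", "user.name", "wiki"], cwd=repo, check=True)
        with open(os.path.join(repo, "article.wiki"), "w", encoding="utf-8") as f:
            f.write(text)
        subprocess.run(["git", "add", "article.wiki"], cwd=repo, check=True)
        subprocess.run(["git", "commit", "-q", "-m", "edit by " + author,
                        "--author", "%s <%s@wiki.invalid>" % (author, author)],
                       cwd=repo, check=True)

    # save_revision("Comic opera", "''Comic opera'' is a genre of opera.\n", "Example_user")

Pulling one article's history then becomes a plain `git clone' of that article's repository, at the cost of losing any whole-wiki operations.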