http://advogato.org/article/994.html
Peer-to-peer git repositories. Imagine a MediaWiki with the data stored in git, and updates distributed peer-to-peer.
"Imagine if Wikipedia could be mirrored locally, run on a local mirror, where content was pushed and pulled, GPG-Digitally-signed; content shared via peer-to-peer instead of overloading the Wikipedia servers."
This would certainly go some way to solving the "a good dump is all but impossible" problem ...
(so, anyone hacked up a git backend for MediaWiki revisions rather than MySQL? :-) )
- d.
On Thu, Dec 4, 2008 at 6:20 PM, David Gerard dgerard@gmail.com wrote:
Peer-to-peer git repositories. Imagine a MediaWiki with the data stored in git, and updates distributed peer-to-peer.
http://www.foo.be/cgi-bin/wiki.pl/2007-11-10_Dreaming_Of_Mediawiki_Using_GIT
It takes about 10 minutes to create a mwdump-to-git converter using git-fast-import. I did this for amusement once in order to run git-blame on articles. I'm not so clear on what else to do with it once it's in git. One advantage is that the storage requirements are reasonably modest.
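For illustration, a minimal sketch of what such a converter could look like, piped into `git fast-import' inside an empty repository. The one-file-per-article layout, the faked committer addresses and the single branch are arbitrary choices here, not anything Gregory described:

    # mwdump2git.py -- usage: python mwdump2git.py pages-history.xml | git fast-import
    import sys
    import calendar, time
    import xml.etree.ElementTree as ET

    out = sys.stdout.buffer

    def emit(title, text, user, ts):
        # One commit per revision; each article lives in a single file named
        # after its title, all on refs/heads/master.
        path = title.replace(' ', '_').replace('/', '%2F') + '.wiki'
        when = calendar.timegm(time.strptime(ts, '%Y-%m-%dT%H:%M:%SZ'))
        user = user or 'anonymous'
        ident = '%s <%s@wiki.invalid> %d +0000' % (user, user.replace(' ', '_'), when)
        msg = ('Edit to %s by %s' % (title, user)).encode('utf-8')
        body = text.encode('utf-8')
        out.write(b'commit refs/heads/master\n')
        out.write(('committer %s\n' % ident).encode('utf-8'))
        out.write(('data %d\n' % len(msg)).encode('utf-8') + msg + b'\n')
        out.write(('M 100644 inline %s\n' % path).encode('utf-8'))
        out.write(('data %d\n' % len(body)).encode('utf-8') + body + b'\n')

    def main(dumpfile):
        title = user = ts = None
        text = ''
        for event, elem in ET.iterparse(dumpfile, events=('end',)):
            tag = elem.tag.rsplit('}', 1)[-1]   # strip the export namespace
            if tag == 'title':
                title = elem.text
            elif tag == 'timestamp':
                ts = elem.text
            elif tag in ('username', 'ip'):
                user = elem.text
            elif tag == 'text':
                text = elem.text or ''
            elif tag == 'revision':
                emit(title, text, user, ts)
                user = None
            elif tag == 'page':
                elem.clear()                    # keep memory bounded on large dumps

    if __name__ == '__main__':
        main(sys.argv[1])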
It would be nice to have more advanced SCM features in the wiki... but between the technical challenges and the learning curves (even most CVS and SVN users barely know how to do more than check out and check in), I wouldn't expect it anytime soon.
On Thu, Dec 04, 2008 at 07:09:36PM -0500, Gregory Maxwell wrote:
It takes about 10 minutes to create a mwdump-to-git converter using git-fast-import. I did this for amusement once in order to run git-blame on articles.
How fast was the git import? How many articles did you try to import? What were the storage requirements? How effective was the git blame, since it would only work at the line (paragraph) level?
I considered doing this but I got sidetracked doing a word-level blame function (see http://hewgill.com/journal/entries/461-wikipedia-blame) and never got back to the git import.
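The general idea behind word-level blame is simple enough to sketch. This is a toy illustration, not Greg's actual implementation: diff successive revisions at word granularity and carry attribution forward for unchanged words.

    from difflib import SequenceMatcher

    def word_blame(revisions):
        """revisions: list of (author, wikitext) pairs, oldest first.
        Returns (word, author) pairs for the newest revision."""
        blamed = []                              # attribution for the current text
        for author, text in revisions:
            words = text.split()
            if not blamed:                       # first revision: its author gets everything
                blamed = [(w, author) for w in words]
                continue
            old_words = [w for w, _ in blamed]
            new_blamed = []
            for op, i1, i2, j1, j2 in SequenceMatcher(None, old_words, words).get_opcodes():
                if op == 'equal':
                    new_blamed.extend(blamed[i1:i2])            # keep earlier attribution
                elif op in ('replace', 'insert'):
                    new_blamed.extend((w, author) for w in words[j1:j2])
                # 'delete': those words are gone, nothing to carry forward
            blamed = new_blamed
        return blamed

    # word_blame([("Alice", "London is a city"),
    #             ("Bob",   "London is a large city in England")])
    # -> London/is/a/city blamed on Alice; large/in/England on Bob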
I would like to see a properly maintained copy of wikipedia in git, particularly so I could clone and keep it up to date.
Greg Hewgill http://hewgill.com
I've looked into this a little, though still in quite a pie-in-the-sky way. I made the SQLite db layer with the idea that it would be simpler to incorporate into a client-based "MediaWikiLite" app, and I made some notes at these articles: http://www.organicdesign.co.nz/MediaWikiLite http://www.organicdesign.co.nz/PeerPedia A lite MediaWiki could then work as a peer, and SQLite could integrate with a distributed storage system such as a DHT, or with Git. Perhaps interwiki prefixes could be used as an addressing scheme to separate different wikis within the common distributed storage space?
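As a purely hypothetical illustration of that addressing idea, the interwiki prefix plus the page title could be hashed into a shared DHT keyspace. The names below are made up for the sketch; nothing like this exists in MediaWikiLite today:

    import hashlib

    def dht_key(interwiki, title):
        """Map (wiki, title) to a 160-bit key in a Kademlia-style keyspace."""
        name = "%s:%s" % (interwiki, title.replace(" ", "_"))
        return hashlib.sha1(name.encode("utf-8")).hexdigest()

    # Pages from different wikis share one keyspace without colliding:
    # dht_key("wikipedia", "Comic opera")       -> one 40-hex-digit key
    # dht_key("organicdesign", "MediaWikiLite") -> another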
Hoi, As I have indicated in the past, a team at the Vrije Universiteit Amsterdam (the team that includes Andrew Tanenbaum) has been working on creating a peer-to-peer MediaWiki. Their goal is to be able to support a wiki like the English-language Wikipedia. They have developed algorithms that should address issues such as keeping the data close to the readers, propagating changes, and resolving conflicts between those changes.
The problem they have faced, and which has not resolved itself, is getting the traffic data that would allow them to test their algorithms against the real world. In the past I have tried to get people's attention, to no avail. I think the VU is still interested; it would be cool if this serious attempt at a peer-to-peer Wikipedia got at least some attention. There are few people like Andrew Tanenbaum who could be trusted to understand the issues involved. Thanks, GerardM
2008/12/5 Gerard Meijssen gerard.meijssen@gmail.com:
The problem they have faced, and which has not resolved itself, is getting the traffic data that would allow them to test their algorithms against the real world.
They might want to talk to Wikileaks, then - they were interested in a distributed database and they certainly get the traffic.
- d.
Hoi, The English Wikipedia is used all over the world; the distribution of requests for content can be regional, and in certain cases will prove to be. The chance of edit conflicts is of a different order on en.wikipedia. Wikileaks is unlikely to approximate the traffic we have on the English Wikipedia.
You do not need to have all the data to evaluate the functionality that has been developed, but the data has to be statistically relevant enough to understand the issues as they occur.
My point is that the VU has a need for data, and they have so far been ignored even though I made my attempts to get them connected. Thanks, GerardM
Gerard Meijssen wrote:
My point is that the VU has a need for data, and they have so far been ignored even though I made my attempts to get them connected.
Doesn't Domas collect all kinds of statistics at http://dammit.lt/wikistats/ ? Maybe those stats could be used and/or extended to fit the VU's needs.
Roan Kattouw (Catrope)
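For reference, the hourly files there can be crunched with a few lines of code. This assumes the usual "project page_title view_count bytes_sent" line format of the pagecounts dumps; adjust if the actual format differs:

    import gzip
    from collections import Counter

    def top_pages(path, project="en", n=10):
        """Most-viewed titles for one project in one hourly pagecounts file."""
        views = Counter()
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
            for line in f:
                parts = line.split(" ")
                if len(parts) == 4 and parts[0] == project:
                    views[parts[1]] += int(parts[2])
        return views.most_common(n)

    # top_pages("pagecounts-20081205-180000.gz")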
The problem they have faced, and which has not resolved itself, is getting the traffic data that would allow them to test their algorithms against the real world. In the past I have tried to get people's attention, to no avail.
Can you please stop whining? We've been sending data to VU for ages.
Anyway, I love p2p threads!
Hoi, I brought Domas and Guillaume into contact based on this reply; Guillaume has already answered, and I hope that we finally get some resolution. Thanks, GerardM
David Gerard wrote:
"Imagine if Wikipedia could be mirrored locally, run on a local mirror, where content was pushed and pulled, GPG-Digitally-signed; content shared via peer-to-peer instead of overloading the Wikipedia servers."
The idea of P2P distribution is good. The idea of using Git for this is not.
This is exactly what Git is *not* optimized for: lots of individual files with almost no relation to each other. It is the same reason you are advised not to use Git to version-control your home directory.
Git handles trees, not individual files. In a wiki like MediaWiki, each article has its own history and revision control. A merge to one article shouldn't require that the whole tree (many gigabytes big) be handled in the same tree-wide commit. The user must be able to commit changes against the most recent revision of, for example, [[Los Angeles, California]], and push those commits while still holding an outdated revision of some unrelated article, like [[Comic opera]].
The rate of changes on en.Wikipedia can be measured in edits per second, and these edits are only related to each other (in an ancestor-descendant relationship) within each individual article. Moving to a model where the whole of Wikipedia is a single repository with tree-wide revisions would severely hurt its efficiency.
Ironically, the per-file revision control model employed by now-obsolescent VCSes like CVS and RCS would fit Wikipedia better than Git (emphasis on revision control *model*, not software).
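To make the "trees, not files" point concrete: every commit object names exactly one root tree covering the whole repository, which git's plumbing shows directly. A throwaway demonstration, with invented file names:

    import os, subprocess, tempfile

    def git(*args, cwd):
        return subprocess.run(["git"] + list(args), cwd=cwd, check=True,
                              capture_output=True, text=True).stdout

    repo = tempfile.mkdtemp()
    git("init", "-q", cwd=repo)
    git("config", "user.email", "demo@example.invalid", cwd=repo)
    git("config", "user.name", "Demo", cwd=repo)
    for title in ("Los_Angeles,_California.wiki", "Comic_opera.wiki"):
        with open(os.path.join(repo, title), "w") as f:
            f.write("stub article text\n")
    git("add", ".", cwd=repo)
    git("commit", "-q", "-m", "import two articles", cwd=repo)

    # Edit only one article ...
    with open(os.path.join(repo, "Comic_opera.wiki"), "a") as f:
        f.write("a small edit\n")
    git("commit", "-q", "-am", "edit one article", cwd=repo)

    # ... yet the new commit object still records a single whole-repository tree:
    print(git("cat-file", "-p", "HEAD", cwd=repo))

(Git does share unchanged blobs and subtrees between commits, so the cost of an edit is not literally gigabytes, but the commit granularity is still the whole tree rather than a single page.)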
A good idea, but it cannot be put into practice.
-- Jackey Tse | skjackey_tse | Web Developer | 在.hk (http://xn--3ds.hk)
On Fri, Dec 5, 2008 at 1:36 PM, Juliano F. Ravasi ml@juliano.info wrote:
Ironically, the per-file revision control model employed by now-obsolescent VCSes like CVS and RCS would fit Wikipedia better than Git (emphasis on revision control *model*, not software).
...because RCS tracks one file at a time while git tracks whole trees, as you point out. However, git's shortcomings when used for a wiki could also be addressed by having a separate repository for each article. You wouldn't get many of the more interesting git features, and you couldn't do `git pull' to update the whole wiki. But it would be interesting to compare one-git/mercurial/whatever-repo-per-article to one-RCS-file-per-article, or to the one-history-per-article that MediaWiki currently keeps in its database.
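A rough sketch of what the one-repository-per-article layout could look like; the storage root, file name and helper below are invented for illustration, not an existing MediaWiki backend:

    import os, subprocess

    WIKI_ROOT = "/var/wiki-repos"          # hypothetical storage root

    def save_revision(title, text, author):
        """Commit one revision of one article into that article's own repository."""
        repo = os.path.join(WIKI_ROOT, title.replace(" ", "_").replace("/", "%2F"))
        if not os.path.isdir(os.path.join(repo, ".git")):
            os.makedirs(repo, exist_ok=True)
            subprocess.run(["git", "init", "-q"], cwd=repo, check=True)
            subprocess.run(["git", "config", "user.email", "wiki@example.invalid"],
                           cwd=repo, check=True)
            subprocess.run(["git", "config", "user.name", "wiki"], cwd=repo, check=True)
        with open(os.path.join(repo, "article.wiki"), "w", encoding="utf-8") as f:
            f.write(text)
        subprocess.run(["git", "add", "article.wiki"], cwd=repo, check=True)
        subprocess.run(["git", "commit", "-q", "-m", "edit by " + author,
                        "--author", "%s <%s@wiki.invalid>" % (author, author)],
                       cwd=repo, check=True)

    # save_revision("Comic opera", "''Comic opera'' is a genre of opera.\n", "Example_user")

Pulling one article's history then becomes a plain `git clone' of that article's repository, at the cost of losing any whole-wiki operations.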