[posted to foundation-l and wikitech-l, thread fork of a discussion elsewhere]
THESIS: Our inadvertent monopoly is *bad*. We need to make it easy to fork the projects, so as to preserve them.
This is the single point of failure problem. The reasons for it having happened are obvious, but it's still a problem. Blog posts (please excuse me linking these yet again):
* http://davidgerard.co.uk/notes/2007/04/10/disaster-recovery-planning/
* http://davidgerard.co.uk/notes/2011/01/19/single-point-of-failure/
I dream of the encyclopedia being meaningfully backed up. This will require technical attention specifically to making the projects - particularly that huge encyclopedia in English - meaningfully forkable.
Yes, we should be making ourselves forkable. That way people don't *have* to trust us.
We're digital natives - we know the most effective way to keep something safe is to make sure there's lots of copies around.
How easy is it to set up a copy of English Wikipedia - all text, all pictures, all software, all extensions and customisations to the software? What bits are hard? If a sizable chunk of the community wanted to fork, how can we make it *easy* for them to do so?
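For concreteness, the text part is the easy bit; a rough sketch of grabbing the public dump (Python, filenames are placeholders; the URL follows the usual dumps.wikimedia.org layout):

    # Illustrative only: fetch the latest English Wikipedia article-text dump.
    # The URL follows the usual dumps.wikimedia.org layout; the local
    # filename is just a placeholder.
    import urllib.request

    DUMP_URL = ("https://dumps.wikimedia.org/enwiki/latest/"
                "enwiki-latest-pages-articles.xml.bz2")
    LOCAL_FILE = "enwiki-latest-pages-articles.xml.bz2"

    urllib.request.urlretrieve(DUMP_URL, LOCAL_FILE)

    # Importing into a local MediaWiki is then roughly:
    #   bunzip2 enwiki-latest-pages-articles.xml.bz2
    #   php maintenance/importDump.php enwiki-latest-pages-articles.xml
    # ...and that only covers the text: images, extensions and site
    # configuration are separate (and harder) problems.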
And I ask all this knowing that we don't have the paid tech resources to look into it - tech is a huge chunk of the WMF budget and we're still flat-out just keeping the lights on. But I do think it needs serious consideration for long-term preservation of all this work.
- d.
On 12/08/2011 8:55 PM, David Gerard wrote:
THESIS: Our inadvertent monopoly is *bad*. We need to make it easy to fork the projects, so as to preserve them.
I have an idea that might be practical and go some way toward solving your problem.
Wikipedia is an impressive undertaking, and as you mentioned on your blog it has become part of the background as a venerable institution; however, it is still dwarfed by the institution that is the World Wide Web (which, by the way, runs on web standards like HTML5 :).
To give a little context concerning the state of the art: a bit over a week ago I decided to start a club. Within a matter of days I had a fully functioning web site for my club, with two CMSes (a wiki and a blog) and a number of other administrative facilities, all due to the power and availability of open-source software. As time goes by there are only going to be more, not fewer, people like me - people who have the capacity to run their own content management systems out of their own garages (mine's actually in a slicehost.net datacenter, but it *used* to be in my garage, and by rights it could be, except that I don't actually *have* a garage any more, but that's another story).
The thing about me is that there can be hundreds of thousands of people like me, and when you add up all our contributions, you have a formidable force. I can't host Wikipedia, but there could be facilities in place for me to easily mirror the parts of it that are relevant to me. For instance, on my Network administration page, I have a number of links to other sites, several of which are links to Wikipedia:
http://www.progclub.org/wiki/Network_administration#Links
Links such as:
http://en.wikipedia.org/wiki/Subversion
Now by rights there could be a registry in my MediaWiki installation that recorded en.wikipedia.org as being another wiki with a particular content distribution policy, such as a policy permitting local mirroring. MediaWiki, when it noticed that I had linked to such a facility, could replace the link, changing it to a link on my local system, e.g.
http://www.progclub.org/wiki/Wikipedia:Subversion
There could then be a facility in place to periodically update the mirrored copies on my own system. Attribution for these copies would be given to a 'system user', such as the 'Interwiki Update Service'. The edit history for the version on my system would only show a revision for each time the update service had updated the content. Links for the 'edit' button could be wired up so that when someone tried to edit,
http://www.progclub.org/wiki/Wikipedia:Subversion
on my server, they were redirected to the Wikipedia edit facility, assuming that such a facility was still available. In the case that Wikipedia was no more, it would be possible to turn off mirroring, and in that case the 'edit' facility would allow for edits of the local content.
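None of that registry machinery exists in MediaWiki today as far as I know, but the 'Interwiki Update Service' half could be little more than a cron job against the public API. A rough sketch (Python; the page registry and the local save step are placeholders; the remote calls are ordinary api.php action=query requests):

    # Sketch of a periodic "Interwiki Update Service": re-fetch the mirrored
    # pages from the remote wiki and refresh the local copies, attributed to
    # a system user. The API calls are the standard public ones; the page
    # registry and save_local_revision() are placeholders.
    import json
    import urllib.parse
    import urllib.request

    REMOTE_API = "https://en.wikipedia.org/w/api.php"
    MIRRORED_PAGES = ["Subversion"]              # placeholder registry
    SYSTEM_USER = "Interwiki Update Service"

    def fetch_wikitext(title):
        params = urllib.parse.urlencode({
            "action": "query",
            "prop": "revisions",
            "rvprop": "content",
            "titles": title,
            "format": "json",
        })
        with urllib.request.urlopen(REMOTE_API + "?" + params) as resp:
            data = json.load(resp)
        page = next(iter(data["query"]["pages"].values()))
        return page["revisions"][0]["*"]         # wikitext of the current revision

    def save_local_revision(title, text, user):
        # Placeholder: write `text` into the local wiki (via its own api.php
        # action=edit, or a maintenance script), attributed to `user`.
        print("would refresh [[Wikipedia:%s]] as %s (%d bytes)" % (title, user, len(text)))

    for title in MIRRORED_PAGES:
        save_local_revision(title, fetch_wikitext(title), SYSTEM_USER)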
That's probably a far more practical approach to take than, say, something like distributing the entire English database via BitTorrent. By all means do that too, but I'd suggest that if you're looking for an anarchically scalable, distributed hypermedia solution, you won't have to look much past the web.
John.
John Elliot (2011-08-12 13:36):
[...] The thing about me, is that there can be hundreds of thousands of people like me, and when you add up all our contributions, you have a formidable force. I can't host Wikipedia, but there could be facilities in place for me to be able to easily mirror the parts of it that are relevant to me. For instance, on my Network administration page, I have a number of links to other sites, several of which are links to Wikipedia:
http://www.progclub.org/wiki/Network_administration#Links
Links such as:
http://en.wikipedia.org/wiki/Subversion
Now by rights there could be a registry in my MediaWiki installation that recorded en.wikipedia.org as being another wiki with a particular content distribution policy, such as a policy permitting local mirroring. MediaWiki, when it noticed that I had linked to such a facility, could replace the link, changing it to a link on my local system, e.g.
http://www.progclub.org/wiki/Wikipedia:Subversion
...
That's a very interesting idea... And it shouldn't be that hard to do.
Let's say you linked the Subversion article, and you've set things up so that the address http://en.wikipedia.org/wiki/$1 is hosted as http://www.progclub.org/wiki/en-wiki:...
Now each time one of your users clicks on such a link, it gets registered in your installation as "to be downloaded", and after a given number of clicks and/or a given number of resources and/or at a given time, the content gets downloaded to your site.
The tricky part is that you need not only the article itself but also its templates, and that can be quite a lot for the first articles you get. Furthermore, this extension would probably need to allow users to opt out of downloading images, and maybe, instead of getting wikicode, just host rendered HTML so that you don't really need to host templates.
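The rendered-HTML variant is already possible through the public API's action=parse, so a sketch of that half might look like this (Python; the local storage and refresh policy are placeholders):

    # Sketch: mirror the *rendered* HTML of a page instead of its wikitext,
    # so templates never need to be copied locally. action=parse is the
    # standard MediaWiki API call; what you do with the HTML is a placeholder.
    import json
    import urllib.parse
    import urllib.request

    API = "https://en.wikipedia.org/w/api.php"

    def fetch_rendered_html(title):
        params = urllib.parse.urlencode({
            "action": "parse",
            "page": title,
            "prop": "text",
            "format": "json",
        })
        with urllib.request.urlopen(API + "?" + params) as resp:
            data = json.load(resp)
        return data["parse"]["text"]["*"]        # HTML of the parsed page

    html = fetch_rendered_html("Subversion")
    # Placeholder: store the HTML locally, serve it under something like
    # /wiki/en-wiki:Subversion, and queue it for refresh after N clicks
    # or on a timer, per the scheme above.
    print(len(html), "bytes of rendered HTML")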
And speaking of images - the problem with any of these solutions is: who would really want to spend the money to host all this data? There were times when Wikipedia had frequent outages, but these days I feel your own server is more likely to choke on the data than the Wikipedia servers are. Maybe ads added to self-hosted articles would make it worth it, but I rather doubt anyone would want to host images unless they had to.
BTW, I think a dynamic fork has already been made by France Telecom. They fork the Polish Wikipedia and update articles within minutes (or at least they did the last time I checked - they even hosted talk pages, so it was easy to test). You can see the fork here: http://wikipedia.wp.pl/
Note that they don't host images, though they do host image pages.
Regards, Nux.
Man, Gerard is thinking about new methods to fork (in an easy way) single articles, sets of articles or complete wikipedias, and people reply about setting up servers/mediawiki/importing_databases and other geeky weekend parties. That is why there are no successful forks. Forking Wikipedia is _hard_.
People need a button to create a branch of an article or a set of articles, and to be allowed to rewrite and work on it the way they want. Of course, the resulting articles can't be saved/shown alongside the Wikipedia articles, but on a new platform. It would be an interesting experiment.
2011/8/12 David Gerard dgerard@gmail.com
[posted to foundation-l and wikitech-l, thread fork of a discussion elsewhere]
THESIS: Our inadvertent monopoly is *bad*. We need to make it easy to fork the projects, so as to preserve them.
[...]
On Sat, Aug 13, 2011 at 4:53 AM, emijrp emijrp@gmail.com wrote:
[...] People need a button to create a branch of an article or a set of articles, and to be allowed to rewrite and work on it the way they want. Of course, the resulting articles can't be saved/shown alongside the Wikipedia articles, but on a new platform. It would be an interesting experiment.
Something like this.. ?
http://wikimedia.org.au/wiki/Proposal:PersonalWikiTool
Yes, that tool looks similar to the idea I described. Other approaches may be possible too.
2011/8/13 John Vandenberg jayvdb@gmail.com
On Sat, Aug 13, 2011 at 4:53 AM, emijrp emijrp@gmail.com wrote:
[...]
Something like this.. ?
http://wikimedia.org.au/wiki/Proposal:PersonalWikiTool
-- John Vandenberg
Hi all,
I've read most of the previous mails so far. I'd like to clear up some confusion (just in case). Please do correct me if I'm wrong and got caught by the confusion myself:
The thread is about one of the following:
* .. the ability to clone a MediaWiki install and upload it to your own domain to continue making edits, writing articles etc.
* .. getting better dumps of Wikimedia wikis in particular (i.e. Wikipedia)
* .. being able to install MediaWiki more easily or even online (like new wikis on Wikia.com)
* .. making it easy for developers to fork the MediaWiki source code repository.
-- Krinkle
On 14 August 2011 13:46, Krinkle krinklemail@gmail.com wrote:
The thread is about one of the following:
- .. the ability to clone a MediaWiki install and upload it to your own domain to continue making edits, writing articles etc.
- .. getting better dumps of Wikimedia wikis in particular (i.e. Wikipedia)
- .. being able to install MediaWiki more easily or even online (like new wikis on Wikia.com)
- .. making it easy for developers to fork the MediaWiki source code repository.
I was thinking of content and community forks specifically.
MediaWiki is ridiculously easy to set up and install. Setting up a copy to fully function like Wikipedia is somewhat more difficult.
Forking the MediaWiki codebase is not hard, but probably not a good idea. (The two cases I can think of are Citizendium and Wikia, and both now work with and on the mainline and put their local stuff in an extension.)
- d.
2011/8/14 Krinkle krinklemail@gmail.com
The thread is about one of the following:
- .. the ability to clone a MediaWiki install and upload it to your own domain to continue making edits, writing articles etc.
Installing MediaWiki yourself is easy if you're a geek. The only solution for newbies is using wikifarms.
- .. getting better dumps of Wikimedia wikis in particular (i.e. Wikipedia)
A ten-year-old ongoing task.
- .. being able to install MediaWiki more easily or even online (like new wikis on Wikia.com)
That's a MediaWiki developers' issue.
- .. making it easy for developers to fork the MediaWiki source code repository.
Trivial. Any developer can set up a repository with a source code snapshot.
In the first post, Gerard was speaking about 1) forks and 2) digital preservation.
Forking single articles is easy: you just copy/paste (with histories you have to use import/export). Forking a set of articles is just a bit more difficult. Forking the whole of Wikipedia is _hard_: you need good infrastructure and skills.
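For the single-article-with-history case, the existing Special:Export / Special:Import pair already covers it; a rough sketch (Python; the article title is a placeholder, and the export parameters are written from memory, so double-check them):

    # Sketch: export one article with its full history as XML, ready for
    # Special:Import or maintenance/importDump.php on the destination wiki.
    # Parameter names are the usual Special:Export ones, written from memory.
    import urllib.parse
    import urllib.request

    EXPORT_URL = "https://en.wikipedia.org/w/index.php"
    params = urllib.parse.urlencode({
        "title": "Special:Export",
        "pages": "Subversion",   # placeholder article title
        "history": "1",          # include the full revision history
        "templates": "1",        # also include transcluded templates
    })

    with urllib.request.urlopen(EXPORT_URL + "?" + params) as resp:
        xml = resp.read()

    with open("Subversion-export.xml", "wb") as f:
        f.write(xml)
    # Then, on the destination wiki: Special:Import, or
    #   php maintenance/importDump.php Subversion-export.xml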
Digital preservation is a big problem in computer science. It is not solved yet, but if you make backups frequently and keep them in several places, you have a good chance of keeping the data safe.
To fork, you first need the data to have been preserved, which links back to the dump-generation problem above.
I think people are getting nervous about Wikipedia (me too), in the same way people are getting worried about Google having control of their whole online life (Gmail, Google Reader, Google Calendar, Google+, etc.). If Google closes your account, your online life vanishes. If Google dies, so does your online life. Of course you can export all your e-mail, contacts, etc., but you lose the @gmail.com address, all the links in search engines to your data are broken, and so on. Google has a good policy about exporting data; most Internet services don't.
Mankind is compiling all human knowledge into an encyclopedia, which is hosted on faulty metal platters spinning thousands of times per minute, managed by faulty humans, and located in only one or two places in the world (Florida, the land of hurricanes, and San Francisco, the land of earthquakes).
Making fun of Wikipedia is so 2007. Playing with Wikipedia is so 2001. Losing knowledge is so 48 BC. This is the most important mission the human race has ever undertaken.
Regards, emijrp
On 08/15/2011 06:29 PM, emijrp wrote:
Mankind is compiling all human knowledge into an encyclopedia, which is hosted on faulty metal platters spinning thousands of times per minute, managed by faulty humans, and located in only one or two places in the world (Florida, the land of hurricanes, and San Francisco, the land of earthquakes).
Reminder: Florida, northern California, and Virginia. http://www.mediawiki.org/wiki/WMF_Projects/Data_Center_Virginia & http://blog.wikimedia.org/2011/07/01/engineering-june-2011-report/