We need to make it easy to fork and leave

List overview All Threads
Download

newer

older

sidebar statistics

FIXMEs for 1.18

David Gerard

12 Aug 2011 12 Aug '11

1:55 p.m.

[posted to foundation-l and wikitech-l, thread fork of a discussion elsewhere]

THESIS: Our inadvertent monopoly is *bad*. We need to make it easy to fork the projects, so as to preserve them.

This is the single point of failure problem. The reasons for it having happened are obvious, but it's still a problem. Blog posts (please excuse me linking these yet again):

* http://davidgerard.co.uk/notes/2007/04/10/disaster-recovery-planning/ * http://davidgerard.co.uk/notes/2011/01/19/single-point-of-failure/

I dream of the encyclopedia being meaningfully backed up. This will require technical attention specifically to making the projects - particularly that huge encyclopedia in English - meaningfully forkable.

Yes, we should be making ourselves forkable. That way people don't *have* to trust us.

We're digital natives - we know the most effective way to keep something safe is to make sure there's lots of copies around.

How easy is it to set up a copy of English Wikipedia - all text, all pictures, all software, all extensions and customisations to the software? What bits are hard? If a sizable chunk of the community wanted to fork, how can we make it *easy* for them to do so?

And I ask all this knowing that we don't have the paid tech resources to look into it - tech is a huge chunk of the WMF budget and we're still flat-out just keeping the lights on. But I do think it needs serious consideration for long-term preservation of all this work.

- d.

Show replies by date

John Elliot

12 Aug 12 Aug

2:36 p.m.

On 12/08/2011 8:55 PM, David Gerard wrote:

...

THESIS: Our inadvertent monopoly is *bad*. We need to make it easy to fork the projects, so as to preserve them.

I have an idea that might be practical and go some way toward solving your problem.

Wikipedia is an impressive undertaking, and as you mentioned on your blog it has become part of the background as a venerable institution, however it is still dwarfed by the institution that is the World Wide Web (which, by the way, runs on web-standards like HTML5 :).

To give a little context concerning the start of the art, a bit over a week ago I decided to start a club. Within a matter of days I had a fully functioning web-site for my club, with two CRM systems (a wiki and a blog), and a number of other administrative facilities, all due to the power and availability of open-source software. As time goes by there are only going to be more, not less, people like me. People who have the capacity to run their own content management systems out of their own garages (mine's actually in a slicehost.net datacenter, but it *used* to be in my garage, and by rights it could be, except that I don't actually *have* a garage any more, but that's another story).

The thing about me, is that there can be hundreds of thousands of people like me, and when you add up all our contributions, you have a formidable force. I can't host Wikipedia, but there could be facilities in place for me to be able to easily mirror the parts of it that are relevant to me. For instance, on my Network administration page, I have a number of links to other sites, several of which are links to Wikipedia:

http://www.progclub.org/wiki/Network_administration#Links

Links such as:

http://en.wikipedia.org/wiki/Subversion

Now by rights there could be a registry in my MediaWiki installation that recorded en.wikipedia.org as being another wiki with a particular content distribution policy, such as a policy permitting local mirroring. MediaWiki, when it noticed that I had linked to such a facility, could replace the link, changing it to a link on my local system, e.g.

http://www.progclub.org/wiki/Wikepedia:Subversion

There could then be a facility in place to periodically update the mirrored copies in my own system. Attribution for these copies would be given to a 'system user', such as the 'Interwiki Update Service'. The edit history for the version on my system would only show versions for each time the update service had updated the content. Links for the 'edit' button could be wired up so that when someone tried to edit,

http://www.progclub.org/wiki/Wikipedia:Subversion

on my server, they were redirected to the Wikipedia edit facility, assuming that such a facility was still available. In the case that Wikipedia was no more, it would be possible to turn off mirroring, and in that case the 'edit' facility would allow for edits of the local content.

That's probably a far more practical approach to take than say, something like distributing the entire English database via BitTorrent. By all means do that too, but I'd suggest that if you're looking for an anarchically-scalable distributed hypermedia solution, you won't have to look much past the web.

John.

Maciej Jaros

10:20 p.m.

John Elliot (2011-08-12 13:36):

...

[...] The thing about me, is that there can be hundreds of thousands of people like me, and when you add up all our contributions, you have a formidable force. I can't host Wikipedia, but there could be facilities in place for me to be able to easily mirror the parts of it that are relevant to me. For instance, on my Network administration page, I have a number of links to other sites, several of which are links to Wikipedia:

http://www.progclub.org/wiki/Network_administration#Links

Links such as:

http://en.wikipedia.org/wiki/Subversion

Now by rights there could be a registry in my MediaWiki installation that recorded en.wikipedia.org as being another wiki with a particular content distribution policy, such as a policy permitting local mirroring. MediaWiki, when it noticed that I had linked to such a facility, could replace the link, changing it to a link on my local system, e.g.

http://www.progclub.org/wiki/Wikepedia:Subversion

...

That's a very interesting idea... And it should be really hard to do.

Let's say you linked the Subversion article and you've set up that the address: http://en.wikipedia.org/wiki/$1 To be hosted as: http://www.progclub.org/wiki/en-wiki:...

Now each time your user clicks on a link everything gets registered in your installation as "to be downloaded" and upon given number of clicks and/or given number of resources and/or at given time to be downloaded to your site.

The tricky part would be that you not only need the article itself, but also it's templates and that can be quite a lot with first articles you get. Further more this extension would probably need to allow users to opt-out of downloading images and maybe instead of getting wikicode just host rendered HTML so that you don't really need to host templates.

And speaking of images - the problem with any of the solutions is - who would really want to spend money to host all this data? There were times when Wikipedia had many hold ups, but now I feel there are more chances that your own server would choke on the data rather then Wikipedia servers. Maybe ads added to self hosted articles would be worth it, but I kinda doubt anyone would want to host images unless they had to.

BTW. I think a dynamic fork was already made by France Telecom. They fork Polish Wikipedia and update articles in a matter of minutes (or at least they did last time I've checked - they even hosted talk pages so it was easy to test). You can see the fork here: http://wikipedia.wp.pl/

Note that they don't host images though they host image pages.

Regards, Nux.

emijrp

9:53 p.m.

New subject: [Foundation-l] We need to make it easy to fork and leave

Man, Gerard is thinking about new methods to fork (in an easy way) single articles, sets of articles or complete wikipedias, and people reply about setting up servers/mediawiki/importing_databases and other geeky weekend parties. That is why there is no successful forks. Forking Wikipedia is _hard_.

People need a button to create a branch of an article or sets of articles, and be allowed to re-write and work in the way they want. Of course, the resulting articles can't be saved/showed close to the Wikipedia articles, but in a new plataform. It would be an interesting experiment.

2011/8/12 David Gerard dgerard@gmail.com

...

[posted to foundation-l and wikitech-l, thread fork of a discussion elsewhere]

THESIS: Our inadvertent monopoly is *bad*. We need to make it easy to fork the projects, so as to preserve them.

This is the single point of failure problem. The reasons for it having happened are obvious, but it's still a problem. Blog posts (please excuse me linking these yet again):

http://davidgerard.co.uk/notes/2007/04/10/disaster-recovery-planning/

http://davidgerard.co.uk/notes/2011/01/19/single-point-of-failure/

I dream of the encyclopedia being meaningfully backed up. This will require technical attention specifically to making the projects - particularly that huge encyclopedia in English - meaningfully forkable.

Yes, we should be making ourselves forkable. That way people don't *have* to trust us.

We're digital natives - we know the most effective way to keep something safe is to make sure there's lots of copies around.

How easy is it to set up a copy of English Wikipedia - all text, all pictures, all software, all extensions and customisations to the software? What bits are hard? If a sizable chunk of the community wanted to fork, how can we make it *easy* for them to do so?

And I ask all this knowing that we don't have the paid tech resources to look into it - tech is a huge chunk of the WMF budget and we're still flat-out just keeping the lights on. But I do think it needs serious consideration for long-term preservation of all this work.

d.

foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l

John Vandenberg

13 Aug 13 Aug

10:50 a.m.

New subject: [Foundation-l] We need to make it easy to fork and leave

On Sat, Aug 13, 2011 at 4:53 AM, emijrp emijrp@gmail.com wrote:

...

Man, Gerard is thinking about new methods to fork (in an easy way) single articles, sets of articles or complete wikipedias, and people reply about setting up servers/mediawiki/importing_databases and other geeky weekend parties. That is why there is no successful forks. Forking Wikipedia is _hard_.

People need a button to create a branch of an article or sets of articles, and be allowed to re-write and work in the way they want. Of course, the resulting articles can't be saved/showed close to the Wikipedia articles, but in a new plataform. It would be an interesting experiment.

Something like this.. ?

http://wikimedia.org.au/wiki/Proposal:PersonalWikiTool

-- John Vandenberg

emijrp

11:44 a.m.

New subject: [Foundation-l] We need to make it easy to fork and leave

Yes, that tool looks similar to the idea I wrote. Other approaches may be possible too.

2011/8/13 John Vandenberg jayvdb@gmail.com

...

On Sat, Aug 13, 2011 at 4:53 AM, emijrp emijrp@gmail.com wrote:

...
Man, Gerard is thinking about new methods to fork (in an easy way) single articles, sets of articles or complete wikipedias, and people reply about setting up servers/mediawiki/importing_databases and other geeky weekend parties. That is why there is no successful forks. Forking Wikipedia is _hard_.

People need a button to create a branch of an article or sets of

articles,

...
and be allowed to re-write and work in the way they want. Of course, the resulting articles can't be saved/showed close to the Wikipedia articles, but in a new plataform. It would be an interesting experiment.

Something like this.. ?

http://wikimedia.org.au/wiki/Proposal:PersonalWikiTool

-- John Vandenberg

foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l

Krinkle

14 Aug 14 Aug

3:46 p.m.

New subject: [Foundation-l] We need to make it easy to fork and leave

Hi all,

've read most of the previous mails so far. I'd like to clear some confusion (just in case). Please do correct me if I'm wrong and got caught by the confusion myself:

The thread is about one of the following: * .. the ability to clone a MediaWiki install and upload it to your own domain to continue making edits, writing articles etc. * .. getting better dumps of Wikimedia wikis in particular (ie. Wikipedia) * .. being able to install MediaWiki easier or even online (like new wikis on Wikia.com) * .. making it easy for developers to fork the MediaWiki source code repository.

-- Krinkle

David Gerard

3:52 p.m.

New subject: [Foundation-l] We need to make it easy to fork and leave

On 14 August 2011 13:46, Krinkle krinklemail@gmail.com wrote:

...

The thread is about one of the following:

.. the ability to clone a MediaWiki install and upload it to your own domain

to continue making edits, writing articles etc.

.. getting better dumps of Wikimedia wikis in particular (ie. Wikipedia)

.. being able to install MediaWiki easier or even online (like new wikis on

Wikia.com)

.. making it easy for developers to fork the MediaWiki source code repository.

I was thinking of content and community forks specifically.

MediaWiki is ridiculously easy to set up and install. Setting up a copy to fully function like Wikipedia is somewhat more difficult.

Forking the MediaWiki codebase is not hard, but probably not a good idea. (The two cases I can think of are Citizendium and Wikia, and both now work with and on the mainline and put their local stuff in an extension.)

- d.

emijrp

16 Aug 16 Aug

1:29 a.m.

New subject: [Foundation-l] We need to make it easy to fork and leave

2011/8/14 Krinkle krinklemail@gmail.com

...

Hi all,

've read most of the previous mails so far. I'd like to clear some confusion (just in case). Please do correct me if I'm wrong and got caught by the confusion myself:

The thread is about one of the following:

.. the ability to clone a MediaWiki install and upload it to your own

domain to continue making edits, writing articles etc.

Installing MediaWiki for you is easy for geeks. The only solution for newbies is using wikifarms.

...

.. getting better dumps of Wikimedia wikis in particular (ie. Wikipedia)

A ten years old on-going task.

...

.. being able to install MediaWiki easier or even online (like new wikis

on Wikia.com)

MediaWiki developers issue.

...

.. making it easy for developers to fork the MediaWiki source code

repository.

Trivial. Any developer can set up a repository with a source code snapshot.

Gerard in the first post was speaking about 1) forks, 2) digital preserving

Forking single articles is easy, you just copy/paste (with histories you have to use import/export). Forking a set of articles is just a bit more difficult. Forking the whole Wikipedia is _hard_, you need a good infrastructure and skills.

Digital preserving is a big problem in computer science. It is not solved yet, but if you make backups frequently and in several places, you have a high security to save the data.

To fork you need first the data being preserved, and this links with the dumps generation problem above.

I think people is getting nervous with Wikipedia (and me too), in the same way people is getting worried with Google having control of all your online life (Gmail, Google Reader, Google Calendar, Google+, etc). If Google closes your account, your online life vanishes. If Google dies, your online life too. Of course you can export all your e-mail, contacts, etc, but you lose the @gmail.com address, all links in search engines to your data is broken, etc. Google has a good policy about exporting data, most Internet services don't.

The mankind is compiling all human knowledge in an encyclopedia, which is hosted in faulty metal plates spinning thousand times per minute, managed by faulty humans and located only in one or two locations in the world (Florida, the land of hurricanes and San Francisco, the land of earthquakes).

Making fun of Wikipedia is so 2007. Playing with Wikipedia is so 2001. Losing knowledge is so 48 BC. This is the most important mission human race has ever achieve.

Regards, emijrp

...

Krinkle _______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Sumana Harihareswara

7:43 p.m.

New subject: [Foundation-l] We need to make it easy to fork and leave

On 08/15/2011 06:29 PM, emijrp wrote:

...

The mankind is compiling all human knowledge in an encyclopedia, which is hosted in faulty metal plates spinning thousand times per minute, managed by faulty humans and located only in one or two locations in the world (Florida, the land of hurricanes and San Francisco, the land of earthquakes).

Reminder: Florida, northern California, and Virginia. http://www.mediawiki.org/wiki/WMF_Projects/Data_Center_Virginia & http://blog.wikimedia.org/2011/07/01/engineering-june-2011-report/

-- Sumana Harihareswara Volunteer Development Coordinator Wikimedia Foundation

4853

Age (days ago)

4857

Last active (days ago)

wikitech-l@lists.wikimedia.org

9 comments

7 participants

tags (0)

participants (7)

David Gerard
emijrp
John Elliot
John Vandenberg
Krinkle
Maciej Jaros
Sumana Harihareswara