[Foundation-l] thoughts on leakages

Anthony wikimail at inbox.org
Sun Jan 13 13:56:39 UTC 2008


On Jan 13, 2008 6:51 AM, Robert Rohde <rarohde at gmail.com> wrote:
> On 1/13/08, David Gerard <dgerard at gmail.com> wrote:
> >
> > <snip>
> > One of the best protections we have against the Foundation being taken
> > over by insane space aliens is good database dumps.
>
> And how long has it been since we had good database dumps?
>
> We haven't had an image dump in ages, and most of the major projects
> (enwiki, dewiki, frwiki, commons) routinely fail to generate full history
> dumps.
>
> I assume it's not intentional, but at the moment it would be very difficult
> to fork the major projects in anything approaching a comprehensive way.
>
You don't really need the full database dump to fork.  All you need is
the current database dump and the stub dump with the list of authors.
You'd lose some textual information this way, but not really that
much.  And with the money and time you'd have to put into creating a
viable fork it wouldn't be hard to get the rest through scraping
and/or a live feed purchase anyway.

The lack of an image dump is a bigger problem, but again through
scraping you could get them all in a few weeks.  And in terms of
images, having them in a tarball really wouldn't speed up the process
that much compared with just scraping them.  You might say what if you
get blocked, but just scraping the images in a single thread (or small
number of threads) probably wouldn't even be noticed, let alone
blocked.

The legal problems would probably be the biggest, though.  Complying
with the GFDL, eliminating the copyright violations (DMCA probably
wouldn't apply to a fork), eliminating the libel and other legal
violations (though Section 230 *might* apply to a fork), making sure
you don't violate the WMF's trademarks...  And then, there's getting
rid of all the self-references.

Forking Wikipedia is by no means impossible; it's just a lot more
expensive and risky than it should be.

> In the absence of dumps, I also wonder how much would survive if a meteorite
> (or similar implausible catastrophe) destroyed the main data center.
>
I thought the raw database itself was backed up privately and
remotely; it just wasn't released to the public because it contains
all the non-public information.  But maybe I'm just assuming an
inordinate level of competence.



