On 3/2/06, Fastfission <fastfission(a)gmail.com> wrote:
On 3/2/06, Anthony DiPierro <wikilegal(a)inbox.org> wrote:
You could always mix the two. Relicense some
parts and rewrite the
rest. Of course, figuring out which parts are relicensed and which
aren't is almost as hard as just rewriting the thing.
There might be automated ways to do it -- massive database crunches to
see who wrote what in an existing article, whether they were under the
new scheme, and what exactly would need to be rewritten/removed/whatever.
But that's a little out of my league, technically speaking -- I don't
know if it is feasible in terms of processing power, the amount of
time it would take to code the whole thing, etc. (it probably isn't).
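Just as an illustration of what that database crunch might boil down to (the function name and data shapes here are hypothetical, not anything MediaWiki actually provides): given per-sentence authorship recovered from the edit history, and the set of contributors who agreed to the new license, you'd flag everything written by anyone else.

```python
def sentences_to_rewrite(sentence_authors, relicensed_authors):
    """Hypothetical sketch of the relicensing crunch.

    sentence_authors: list of (sentence, author) pairs, where the
        author is whoever introduced that sentence in the edit history.
    relicensed_authors: set of contributors who agreed to the new scheme.

    Returns the sentences that would need to be rewritten or removed.
    """
    return [s for s, a in sentence_authors if a not in relicensed_authors]
```

The hard part, of course, is producing `sentence_authors` in the first place -- attributing each sentence to an editor across thousands of revisions is where the processing power goes.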
It'd be hard enough to do even if it weren't for the fact that certain
pieces of text are copied from one article to another (and even a few
bits copied from outside GFDL sources). At the least you'd have to
analyse the sentences to try to figure out which parts are copies from
the same source.
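That copy-detection step is at least well understood in principle -- a minimal sketch (my own illustration, not anything anyone has actually built for this) would be word n-gram "shingling": two passages that share most of their shingles almost certainly share a source.

```python
def shingles(text, n=5):
    """Break text into the set of its overlapping word n-grams."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(a, b, n=5):
    """Jaccard similarity of two texts' shingle sets.

    A score near 1.0 suggests one passage was copied from the other
    (or both from a common source); near 0.0 suggests independence.
    """
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

You'd still have to run it over every pair of suspect passages, which is where the feasibility question comes back in.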
If you're going to go through all that effort, might as well parse the
actual text itself. Now I know this type of natural language
processing is always touted as not being very far away, but personally
I feel the technology is probably already there (between academia and
the search engines, especially the "answers engines"). It's just a
matter of applying the technology (along with some hints as to the
wiki syntax). And the wiki syntax as well as the edit history only
serve to make this parsing easier. So if you're going to go
automated, that's the way I'd go (hell, if I ever manage to get out of
my 9-5 job that's the way I *will* go).
The first approach is of course viable. At the very worst it could be
done with the same amount of time it took to create the current
version. But in practice it'd be easier. Plus, you could fact check
and reference while you're at it.
I imagine it would take a lot less time -- the number of editors has
grown dramatically since then, so hypothetically there are already
huge resources available. Hypothetically.
FF
Yeah, I really should have used the word "effort" instead of "time".
It's a highly scalable process, double the volunteers pretty much
means half the time. But I doubt there are enough volunteers willing
to put in the effort, just to escape the GFDL. The GFDL just doesn't
matter that much to most people. If a rewrite of Wikipedia is going
to happen (and personally I feel it's inevitable), it'll probably come
from private industry.
Google could probably do it with no problem, but they're probably not
really that interested. Answers Corporation, though, is in a sense
already doing it: they added original content called AnswerNotes a
while ago, and their recent purchase of Brainboost gives them the
natural language technology to do it a lot better.
Of course that's kind of off-topic, because Answers Corporation almost
surely isn't going to release its original content under a free
license.
Anthony