On 3/2/06, Fastfission <fastfission@gmail.com> wrote:
On 3/2/06, Anthony DiPierro <wikilegal@inbox.org> wrote:
You could always mix the two. Relicense some parts and rewrite the rest. Of course, figuring out which parts are relicensed and which aren't is almost as hard as just rewriting the thing.
There might be automated ways to do it -- massive database crunches to see who wrote what in an existing article, whether they had agreed to the new scheme, and what exactly would need to be rewritten/removed/whatever. But that's a little out of my league, technically speaking -- I don't know if it's feasible in terms of processing power, how long the whole thing would take to code, etc. (it probably isn't).
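To make the idea concrete, the crunch might look something like this in Python -- purely a sketch, where the Revision objects and the agreed_to_relicense lookup are made up for illustration (MediaWiki's real schema is nothing this tidy):

import difflib

class Revision:
    """Hypothetical stand-in for one revision of an article."""
    def __init__(self, editor, text):
        self.editor = editor  # username of the contributor
        self.text = text      # full article text at this revision

def attribute_lines(revisions):
    """Attribute each line of the newest revision to the editor who first added it."""
    authors = {}  # line text -> editor who first introduced it
    prev = []
    for rev in revisions:  # oldest to newest
        cur = rev.text.splitlines()
        matcher = difflib.SequenceMatcher(None, prev, cur)
        for op, _i1, _i2, j1, j2 in matcher.get_opcodes():
            if op in ("insert", "replace"):
                for line in cur[j1:j2]:
                    authors.setdefault(line, rev.editor)
        prev = cur
    return {line: authors.get(line) for line in prev}

def lines_needing_rewrite(revisions, agreed_to_relicense):
    """Lines whose attributed author never agreed to the new scheme."""
    return [line for line, editor in attribute_lines(revisions).items()
            if editor not in agreed_to_relicense]

Line-level attribution is obviously crude -- heavy copyediting would misattribute plenty -- which is part of why I suspect the full version isn't feasible.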
It'd be hard enough even if it weren't for the fact that certain pieces of text are copied from one article to another (and even a few bits are copied in from outside GFDL sources). At the least you'd have to analyse the sentences to figure out which parts are copies of the same source.
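That first pass could probably be roughed out with something as dumb as sentence fingerprinting -- here's a sketch, where the articles dict is just a stand-in for however you'd actually pull text out of a dump:

import re
from collections import defaultdict

def sentences(text):
    # Crude sentence splitter -- a real pass would strip wiki markup first.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def shared_sentences(articles):
    """articles: {title: text}. Returns sentences appearing in 2+ articles."""
    seen = defaultdict(set)
    for title, text in articles.items():
        for sent in sentences(text):
            seen[sent].add(title)
    return {s: titles for s, titles in seen.items() if len(titles) > 1}

Anything flagged that way would still need a human to decide which article was the original, or whether both lifted it from somewhere outside.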
If you're going to go through all that effort, you might as well parse the actual text itself. I know this type of natural language processing is always touted as being just around the corner, but personally I feel the technology is probably already there (between academia and the search engines, especially the "answers engines"). It's just a matter of applying it, and the wiki syntax and the edit history only make the parsing easier. So if you're going to go automated, that's the way I'd go (hell, if I ever manage to get out of my 9-5 job, that's the way I *will* go).
The first approach is of course viable. At the very worst it would take the same amount of time it took to create the current version, though in practice it'd be easier. Plus, you could fact-check and add references while you're at it.
I imagine it would take a lot less time -- the number of editors has grown dramatically since then, so hypothetically there are already huge resources available. Hypothetically.
FF
Yeah, I really should have said "effort" instead of "time". It's a highly scalable process: double the volunteers and you pretty much halve the time. But I doubt there are enough volunteers willing to put in the effort just to escape the GFDL; the GFDL just doesn't matter that much to most people. If a rewrite of Wikipedia is going to happen (and personally I feel it's inevitable), it'll probably come from private industry.
Google could probably do it with no problem, but they're probably not all that interested. Answers Corporation, though, is in a sense already doing it (they added original content called AnswerNotes a while ago), and their recent purchase of Brainboost gives them the natural language technology to do it a lot better.
Of course that's kind of off-topic, because Answers Corporation almost surely isn't going to release its original content under a free license.
Anthony