Hello folks,
in the Chinese Wikipedia an administrator deleted part of an article's history. He deleted the article completely and then restored only some of the historical versions. He did this in good faith, because he was removing vandalism edits and copyright-violating content. I was of the opinion that this is not a good idea: first, the GFDL requires the edit history (including the vandalism edits), and second, this way we lose part of the record of the vandals' edits.
What is the right way here?
Thanks for any advice.
Ting
2008/9/12 Ting Chen wing.philopp@gmx.de:
in the Chinese Wikipedia an administrator deleted part of an article's history. He deleted the article completely and then restored only some of the historical versions. He did this in good faith, because he was removing vandalism edits and copyright-violating content. I was of the opinion that this is not a good idea: first, the GFDL requires the edit history (including the vandalism edits), and second, this way we lose part of the record of the vandals' edits. What is the right way here?
That's the way we usually clean up copyvios on en:wp. Severely libelous, illegal or personally dangerous content (home phone numbers, etc) may even be oversighted, so not even admins can see it. As long as all actual present content has its source attributed, the Foundation hasn't so far considered this problematic in GFDL terms.
- d.
Hello,
On Fri, Sep 12, 2008 at 5:19 PM, David Gerard dgerard@gmail.com wrote:
That's the way we usually clean up copyvios on en:wp. Severely libelous, illegal or personally dangerous content (home phone numbers, etc) may even be oversighted, so not even admins can see it. As long as all actual present content has its source attributed, the Foundation hasn't so far considered this problematic in GFDL terms.
The policy on fr.wikipedia is to keep all revisions but those that (may) cause legal issues (copyvio, slander, etc.: routine) or that require an oversight action (pretty rare).
I've never seen edit histories deleted for copyright violation, but I suppose it's not a crazy misuse of the admin tools. I've seen histories deleted for BLP violations, and spam links that go to stuff like porn, and, if I recall correctly, personal attacks (esp. those that "out" our contributors). So I guess there are circumstances where removing destructive edits doesn't qualify as a violation of the GFDL.
Ford MF
foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
2008/9/12 David Moran fordmadoxfraud@gmail.com:
I've never seen edit histories deleted for copyright violation, but I suppose it's not a crazy misuse of the admin tools. I've seen histories deleted for BLP violations, and spam links that go to stuff like porn, and, if I recall correctly, personal attacks (esp. those that "out" our contributors). So I guess there are circumstances where removing destructive edits doesn't qualify as a violation of the GFDL.
It should be applied very conservatively, but there is a place for it.
- d.
On Fri, Sep 12, 2008 at 5:13 PM, Ting Chen wing.philopp@gmx.de wrote:
in the Chinese Wikipedia an administrator deleted part of an article's history. He deleted the article completely and then restored only some of the historical versions. He did this in good faith, because he was removing vandalism edits and copyright-violating content. I was of the opinion that this is not a good idea: first, the GFDL requires the edit history (including the vandalism edits), and second, this way we lose part of the record of the vandals' edits.
In my opinion, only the edits from which the current version is derived must be attributed. Thus, if an edit has been completely removed, deleting both the edit and its revert from the history is not objectionable from a licensing point of view.
On it.wiki we hide copyvio revisions by deleting them; if a legitimate edit gets deleted along with them, we record its history entry on the article's talk page. We also have a gadget for that: http://it.wikipedia.org/wiki/MediaWiki:Gadget-tb-formatHistory.js .
Nemo
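A sketch of the kind of talk-page note described above, assuming each legitimate revision swept up in a copyvio deletion is available as a simple record. The `Revision` class, the function name, and the wikitext layout are hypothetical illustrations; this is not the code or the output of the it.wiki gadget linked above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Revision:
    timestamp: datetime
    user: str
    summary: str  # edit summary; may be empty


def format_history_for_talk(revisions: list[Revision]) -> str:
    """Render deleted-but-legitimate revisions as a wikitext bullet list,
    oldest first (hypothetical layout, not the gadget's real output)."""
    lines = []
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        stamp = rev.timestamp.strftime("%H:%M, %d %B %Y")
        summary = f" ({rev.summary})" if rev.summary else ""
        lines.append(f"* {stamp} [[User:{rev.user}|{rev.user}]]{summary}")
    return "\n".join(lines)
```

Pasting the result on the talk page keeps the "who" and "when" of each contributor visible even though the revisions themselves are deleted.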
Ok, thank you very much for your quick response. You are great and helped us a lot :)
Ting
2008/9/12 Ting Chen wing.philopp@gmx.de:
What is the right way here?
If the edit has been reverted I see no reason why it would need to be attributed to anyone since it isn't there any more. The GFDL requires us to attribute anything we use but we don't have to use everything that's posted.
On Fri, Sep 12, 2008 at 1:28 PM, Thomas Dalton thomas.dalton@gmail.com wrote:
If the edit has been reverted I see no reason why it would need to be attributed to anyone since it isn't there any more. The GFDL requires us to attribute anything we use but we don't have to use everything that's posted.
This is tricky ground, though. Somebody could decide to selectively delete meaningful contributions from which the current version of a page is derived. The only edits you can selectively delete under the GFDL are those which did not serve as a basis for later derivatives.
--Andrew Whitworth
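The criterion above (an edit may go only if nothing in later versions derives from it) can be approximated mechanically for the simplest case, identity reverts: if a revision's text is byte-identical to an earlier revision, everything strictly between the two was undone. A minimal sketch under that assumption; the function name is hypothetical and SHA-1 is used only as a convenient fingerprint of the text. It finds candidates only: partial reverts, and content that a later re-revert brought back (for example `["A", "B", "A", "B"]` flags revision 1 even though "B" is current again), still need a human check.

```python
import hashlib


def reverted_revisions(texts: list[str]) -> set[int]:
    """Return indices of revisions undone by a later identity revert,
    i.e. a revision whose text exactly matches an earlier one."""
    last_seen: dict[str, int] = {}
    reverted: set[int] = set()
    for i, text in enumerate(texts):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in last_seen:
            # Everything strictly between the earlier copy and this one was undone.
            reverted.update(range(last_seen[digest] + 1, i))
        last_seen[digest] = i
    return reverted
```

For `["A", "A vandal", "A", "A B"]` only revision 1 is flagged; the revert itself and the later good edit stay.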
2008/9/12 Andrew Whitworth wknight8111@gmail.com:
This is tricky ground, though. Somebody could decide to selectively delete meaningful contributions from which the current version of a page is derived. The only edits you can selectively delete under the GFDL are those which did not serve as a basis for later derivatives.
Yeah, like I said, if the edit has been reverted there's no problem. If the edit hasn't been reverted then obviously you can't just delete it, but nobody has claimed you can.
On Fri, Sep 12, 2008 at 2:02 PM, Thomas Dalton thomas.dalton@gmail.com wrote:
Yeah, like I said, if the edit has been reverted there's no problem. If the edit hasn't been reverted then obviously you can't just delete it, but nobody has claimed you can.
Well, there are many instances in the English Wikipedia where this has been done.
2008/9/12 Andrew Whitworth wknight8111@gmail.com:
This is tricky ground, though. Somebody could decide to selectively delete meaningful contributions from which the current version of a page is derived. The only edits you can selectively delete under the GFDL are those which did not serve as a basis for later derivatives.
--Andrew Whitworth
Yup, which means a copyvio will generally mean deleting back to the last clean version, which can be a bit of a pain.
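The rule above (delete everything from the first infringing revision onward and restore the last clean one) is easy to state as code. A rough sketch, assuming the copyvio can be detected as a literal substring, which is a simplification; real copyvio assessment needs human judgement. Note that it also flags later revisions where the passage no longer appears, which is exactly the case Ray objects to further down the thread.

```python
from typing import Optional


def copyvio_cleanup(texts: list[str], passage: str) -> tuple[Optional[int], list[int]]:
    """Return (index of the last clean revision, indices to delete), treating
    every revision from the first one containing `passage` onward as derivative."""
    for i, text in enumerate(texts):
        if passage in text:
            last_clean = i - 1 if i > 0 else None  # None: no clean version exists
            return last_clean, list(range(i, len(texts)))
    # Passage never appears: nothing to delete, the latest revision is clean.
    return (len(texts) - 1 if texts else None), []
```

For `["intro", "intro COPIED", "intro COPIED more", "intro more"]` this returns `(0, [1, 2, 3])`: revision 3 goes too, even though the copied text was already removed there.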
geni wrote:
Yup, which means a copyvio will generally mean deleting back to the last clean version, which can be a bit of a pain.
I know that when there's been a large number of edits requiring deletion, the best option is to roll back to the last known good version and notify everyone we can who made good edits after that point that they have to redo them... a real pain...
-- Cary Bass, Volunteer Coordinator
Your continued donations keep Wikipedia running! Support the Wikimedia Foundation today: http://donate.wikimedia.org Wikimedia Foundation, Inc. Phone: 415.839.6885 x 601 Fax: 415.882.0495
E-Mail: cary@wikimedia.org
Cary Bass wrote:
I know that when there's been a large number of edits requiring deletion, the best option is to roll back to the last known good version and notify everyone we can who made good edits after that point that they have to redo them... a real pain...
This is surely excessive. Edits since the copyvio may very well have bent the copyvio passages far out of recognition. Remember too that copyright does not apply to ideas, but to the expression of those ideas.
Ec
2008/9/14 Ray Saintonge saintonge@telus.net:
This is surely excessive. Edits since the copyvio may very well have bent the copyvio passages far out of recognition. Remember too that copyright does not apply to ideas, but to the expression of those ideas.
Ec
Problem is that they are derivative works of a copyvio, and if you try to remove the problematic intermediate edits you hit GFDL issues. This is why it is important to catch copyvios fast.
Question: Didn't the practice of oversighting and deleting selected revisions previously cause problems with misattributing who created the adjacent contributions, by making the "removed" edits functionally invisible? Is that still an issue? If so, wouldn't it be better for oversight or deletion to still list the offending edits visibly as "existing", but completely unclickable except for those who have permission to view them? That is, the final product would look like this:
Edit 10: Oversighter oversights Edit #4 (no clickable link)
Edit 9: some user adds fine content: <clickable link for anyone>
Edit 8: some user adds fine content: <clickable link for anyone>
Edit 7: someone adds bad stuff again, but even worse now: <clickable link only for Oversighters; shows just date/timestamp/contributor name or IP; shows name of Oversighter as having blocked this edit>
Edit 6: some user adds fine content: <clickable link for anyone>
Edit 5: some user adds fine content: <clickable link for anyone>
Edit 4: admin deletes page, restores without Edit #2: <clickable link for anyone>
Edit 3: some user adds fine content: <clickable link for anyone>
Edit 2: someone adds defamation or whatever "bad" content: <clickable link only for admins; shows just date/timestamp/contributor name or IP; shows name of admin who last deleted this>
Edit 1: new page: <clickable link for anyone>
- Joe
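The proposed history view can be sketched under stated assumptions: each entry carries its contributor and timestamp, and hidden entries record who hid them and which right (`"admin"` or `"oversight"`, hypothetical names) is needed to open them. Everyone sees that the edit exists; only the link is withheld. Class and function names are illustrative, not MediaWiki's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HistoryEntry:
    number: int
    contributor: str                      # username or IP
    timestamp: str
    hidden_by: Optional[str] = None       # who deleted/oversighted this revision
    required_right: Optional[str] = None  # right needed to open it (hypothetical)


def render_history(entries: list[HistoryEntry], viewer_rights: set[str]) -> list[str]:
    """List every edit, newest first; hidden edits keep their date and
    contributor but only get a link for viewers holding the required right."""
    lines = []
    for e in sorted(entries, key=lambda e: e.number, reverse=True):
        line = f"Edit {e.number}: {e.timestamp} {e.contributor}"
        if e.hidden_by is None:
            line += " [link]"
        elif e.required_right in viewer_rights:
            line += f" [link] (hidden by {e.hidden_by})"
        else:
            line += f" (hidden by {e.hidden_by})"
        lines.append(line)
    return lines
```

Because hidden entries never disappear from the list, the edits on either side of them are not misattributed.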
2008/9/15 Joe Szilagyi szilagyi@gmail.com:
Question. Didn't the practice of oversighting and deleting selective versions previously cause problems with misattributing who created the adjacent contributions, by virtue of making the "removed" edits functionally invisible?
That's only an issue if the edit wasn't reverted. No edit should be oversighted/deleted without being reverted first.
On Mon, Sep 15, 2008 at 2:09 PM, Thomas Dalton thomas.dalton@gmail.com wrote:
That's only an issue if the edit wasn't reverted. No edit should be oversighted/deleted without being reverted first.
This "should" you speak of - are you referring to an enforced rule on all WMF projects or your own personal opinion? Because, things *are* oversighted/deleted without being reverted first. It happens, and it will most likely continue to happen. Allowing it to happen in a way that doesn't violate the GFDL would be a good thing.
On Mon, Sep 15, 2008 at 2:52 PM, Anthony wikimail@inbox.org wrote:
This "should" you speak of - are you referring to an enforced rule on all WMF projects or your own personal opinion? Because, things *are* oversighted/deleted without being reverted first. It happens, and it will most likely continue to happen. Allowing it to happen in a way that doesn't violate the GFDL would be a good thing.
By the way, an example of a time when an edit *should* be oversighted/deleted without being reverted first:
User A creates a BLP. User B adds confidential information about the subject of the biography. Users C, D, E, F, G, H, I, and J make positive contributions to the BLP.
Then the confidential information is discovered. To delete the confidential information you have to delete the revisions created by users B, C, D, E, F, G, H, I, and J. You could do this by reverting to the version by User A, but why in the world *should* you be forced to do that?
Anthony
By the way, an example of a time when an edit *should* be oversighted/deleted without being reverted first:
User A creates a BLP. User B adds confidential information about the subject of the biography. Users C, D, E, F, G, H, I, and J make positive contributions to the BLP.
Then the confidential information is discovered. To delete the confidential information you have to delete the revisions created by users B, C, D, E, F, G, H, I, and J. You could do this by reverting to the version by User A, but why in the world *should* you be forced to do that?
Fine, "undo", then. It doesn't matter what technically happens, what's important is that no part of that edit is still in the current version.
Thomas Dalton wrote:
Fine, "undo", then. It doesn't matter what technically happens, what's important is that no part of that edit is still in the current version.
So just remove the confidential information. The subsequent edits can still be judged on their own merits.
Ec
On Mon, Sep 15, 2008 at 8:53 PM, Ray Saintonge saintonge@telus.net wrote:
So just remove the confidential information. The subsequent edits can still be judged on their own merits.
The confidential information is in all the revisions. The software doesn't allow you to "just remove the confidential information".
By the way, an example of a time when an edit *should* be oversighted/deleted without being reverted first:
User A creates a BLP. User B adds confidential information about the subject of the biography. Users C, D, E, F, G, H, I, and J make positive contributions to the BLP.
Then the confidential information is discovered. To delete the confidential information you have to delete the revisions created by users B, C, D, E, F, G, H, I, and J. You could do this by reverting to the version by User A, but why in the world *should* you be forced to do that?
And which solution do you consider to be better? If you just remove the wrong part of the article and then oversight/delete the revisions B, C, D, E, F, G, H, I, J, the history would look like _you_ added all the positive contributions (added by users B–J). Which is, among other problems, a copyright violation.
-- [[cs:User:Mormegil | Petr Kadlec]]
On 9/15/08, Anthony wikimail@inbox.org wrote:
By the way, an example of a time when an edit *should* be oversighted/deleted without being reverted first:
User A creates a BLP. User B adds confidential information about the subject of the biography. Users C, D, E, F, G, H, I, and J make positive contributions to the BLP.
Then the confidential information is discovered. To delete the confidential information you have to delete the revisions created by users B, C, D, E, F, G, H, I, and J. You could do this by reverting to the version by User A, but why in the world *should* you be forced to do that?
One approach (which is already being discussed if I read correctly) would be to keep the "positive contributions" listed in the edit history but not viewable except by oversighters (because they incidentally contain badstuff).
This would preserve GFDL attribution without needing to add any non-standard (non-machine-readable and most likely to be ignored by mirrors/re-users of the content) addenda on the talk page or elsewhere.
Attribution info required by the GFDL:
*Who (username; can be forcibly renamed if it causes problems)
*When (year; the full timestamp is actually optional, but cannot possibly cause problems)
Optional information can be deactivated if it creates problems:
*What (text of each revision)
*Why (edit summary)
*How ("...using AWB", etc.)
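The split between required and optional information can be illustrated with a small sketch: the attribution list is built only from "who" and "when", so hiding the text of a revision (the optional "what") does not change it. The `Rev` type and the output format are hypothetical; this is not how MediaWiki or any mirror actually produces credits.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Rev:
    user: str
    timestamp: datetime
    text_hidden: bool = False  # the "what" is suppressed; deliberately ignored below


def attribution(revs: list[Rev]) -> list[str]:
    """List each contributor with the years they edited, whether or not
    the text of their revisions is still publicly viewable."""
    years: dict[str, set[int]] = {}
    for r in revs:
        years.setdefault(r.user, set()).add(r.timestamp.year)
    return [f"{user} ({', '.join(str(y) for y in sorted(ys))})"
            for user, ys in sorted(years.items())]
```

Users B and C still get credit here even though the text of their revisions is hidden.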
Also, those who care about their edit count would avoid being penalized for picking the wrong article to work on. :P
Caveat: Making it obvious that something has been removed makes it easy to obtain from a database dump, if one was successfully produced after the badstuff was added but before it was removed. It is my understanding that successful database dumps are becoming increasingly rare.
Has anybody ever thought about doing split dumps instead? If there is an emergency where a few lines need to be urgently removed from a database dump it would be more efficient to work with smaller files rather than one big one which has everything since the dawn of time, which very few people will want, and which is greater than the combined GB of all hard drives I've owned.
And anybody who has an ongoing need for ALL PAGE REVISIONS would probably rather use incremental dumps than delete and download everything from scratch.
—C.W.
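A minimal sketch of the two dump ideas above, incremental dumps (only revisions newer than the previous dump) and split dumps (one file per range of pages), assuming revisions are available as plain dicts. The dict keys and function names are illustrative, not the actual dump schema or tooling.

```python
def incremental(revisions: list[dict], last_max_rev_id: int) -> list[dict]:
    """Keep only revisions newer than the highest rev_id in the previous dump."""
    return [r for r in revisions if r["rev_id"] > last_max_rev_id]


def split_by_page(revisions: list[dict], pages_per_file: int) -> dict[int, list[dict]]:
    """Group revisions into files by page_id range, so one small file can be
    regenerated if a few revisions must be removed urgently."""
    files: dict[int, list[dict]] = {}
    for r in revisions:
        files.setdefault(r["page_id"] // pages_per_file, []).append(r)
    return files
```

With the split layout, an emergency removal touches only the file whose page range contains the affected page.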
On Tue, Sep 16, 2008 at 10:15 AM, Charlotte Webb charlottethewebb@gmail.com wrote:
Has anybody ever thought about doing split dumps instead?
Yes, this has been discussed to death by lots of people in various different forums. It's not really clear that it would be a significant enough benefit to be worth the (significant) effort.
Having spent the last 48 hours or so importing one of the smaller dump files (enwiki-20080312-page.sql.gz) into MySQL, I'd say the bigger benefit would be derived by creating a set of dump files which are already indexed (could be in addition to the dumps already made). Preferably something which could be accessed in-place while still bzipped (which is actually feasible, and something I'm about halfway finished writing myself). I spend way more time uncompressing and/or importing and/or indexing the dumps than I do downloading them, and I just don't have the terabytes of free disk space needed to keep a full dump around uncompressed.
Once I have everything imported into MySQL, I can just download the new stub dumps and download the new revisions one at a time. As a bonus, I won't have to worry about the history dump failing.
I guess I should just pony up a few hundred dollars for a terabyte hard drive or two. It should be easy to store the text in 900K bzip chunks (which I can then index), but only if I have the drive space to expand everything first and then recompress it. Anyone want to lend me a couple terabyte hard drives for a month in exchange for a copy of anything I manage to produce?
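Storing text as independently compressed bzip2 chunks plus an index of byte offsets lets a reader seek to one chunk and decompress only that. A self-contained sketch using Python's standard `bz2` module; the length-prefixed record layout and the `(first_record, offset, length)` index are assumptions made for illustration, not the format of any Wikimedia dump. The 900,000-byte default mirrors the 900K figure above.

```python
import bisect
import bz2
import io
from typing import BinaryIO


def write_chunked(records: list[bytes], out: BinaryIO,
                  chunk_size: int = 900_000) -> list[tuple[int, int, int]]:
    """Write records as independent bzip2 streams of roughly chunk_size
    uncompressed bytes each. Returns (first_record, offset, length) per chunk."""
    index = []
    start = 0
    while start < len(records):
        end, size = start, 0
        # Take at least one record per chunk so an oversized record still fits.
        while end < len(records) and (end == start or size + len(records[end]) <= chunk_size):
            size += len(records[end])
            end += 1
        # Length-prefix each record so it can be located after decompression.
        payload = b"".join(len(r).to_bytes(4, "big") + r for r in records[start:end])
        blob = bz2.compress(payload, compresslevel=9)
        index.append((start, out.tell(), len(blob)))
        out.write(blob)
        start = end
    return index


def read_record(n: int, src: BinaryIO, index: list[tuple[int, int, int]]) -> bytes:
    """Fetch record n by seeking to and decompressing only its chunk."""
    firsts = [first for first, _, _ in index]
    first, offset, length = index[bisect.bisect_right(firsts, n) - 1]
    src.seek(offset)
    data = bz2.decompress(src.read(length))
    pos = 0
    for _ in range(n - first):  # skip earlier records in this chunk
        pos += 4 + int.from_bytes(data[pos:pos + 4], "big")
    size = int.from_bytes(data[pos:pos + 4], "big")
    return data[pos + 4:pos + 4 + size]


if __name__ == "__main__":
    buf = io.BytesIO()  # a real file opened in binary mode works the same way
    revs = [f"revision {i} ".encode() * 50 for i in range(100)]
    idx = write_chunked(revs, buf, chunk_size=2000)
    assert read_record(57, buf, idx) == revs[57]
    print(len(idx), "chunks")
```

The index is small enough to keep in memory, so random access costs one seek plus decompressing a single chunk.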
On 9/16/08, Anthony wikimail@inbox.org wrote:
Anyone want to lend me a couple terabyte hard drives for a month in exchange for a copy of anything I manage to produce?
Depends, what are you using this for?
—C.W.
On Wed, Sep 17, 2008 at 10:35 AM, Charlotte Webb charlottethewebb@gmail.com wrote:
Depends, what are you using this for?
To get a recent, usable, full history English Wikipedia dump which can be accessed randomly (read only) while bzipped.
I just looked at the prices, and I'm just going to go ahead and buy it myself. Then when I'm done I'll try to sell the drive with the data already on it to try and at least get my money back.
Joe Szilagyi szilagyi@gmail.com wrote:
If so, wouldn't it be better for oversight or deletion to still list the offensive edits visibly as "existing", but completely unclickable save for those that have permissions to view them?
Yes; see the upcoming bitfields feature. http://www.mediawiki.org/wiki/Bitfields_for_rev_deleted http://www.mediawiki.org/wiki/Image:Revision_deletion_history_view.png
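A sketch of how such a bitfield can encode which parts of a revision are hidden. The values 1, 2, 4 and 8 follow MediaWiki's `DELETED_TEXT`, `DELETED_COMMENT`, `DELETED_USER` and `DELETED_RESTRICTED` constants as far as I know, but the permission logic here is a simplified assumption (admins see ordinary hidden fields, only oversighters see restricted ones, timestamps always stay visible); check the linked pages for the actual behavior.

```python
from enum import IntFlag


class RevDeleted(IntFlag):
    # Bit values modeled on MediaWiki's rev_deleted constants (assumed).
    TEXT = 1
    COMMENT = 2
    USER = 4
    RESTRICTED = 8  # hidden fields viewable by oversighters only (simplified)


def visible_fields(flags: RevDeleted, is_admin: bool, is_oversighter: bool) -> set[str]:
    """Return which parts of a revision a viewer may see (simplified rules)."""
    fields = {"timestamp"}  # assumption: the date itself is never hidden
    can_see_hidden = is_oversighter or (is_admin and not (flags & RevDeleted.RESTRICTED))
    for name, bit in (("text", RevDeleted.TEXT),
                      ("comment", RevDeleted.COMMENT),
                      ("user", RevDeleted.USER)):
        if not (flags & bit) or can_see_hidden:
            fields.add(name)
    return fields
```

A revision with only `TEXT` set still shows its contributor to everyone, which is what keeps attribution intact.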
geni wrote:
Problem is that they are derivative works of a copyvio, and if you try to remove the problematic intermediate edits you hit GFDL issues. This is why it is important to catch copyvios fast.
Sure, it's self-evident that we should catch them early, but the longer the copyvio goes undetected the less likely it is that the copyvio will remain on the current edition. Many of the intermediate edits may even be to completely unrelated parts of the article.
Ec
wikimedia-l@lists.wikimedia.org