I counted these by hand, so there could be mistakes.
[[Category:against policy]] has nearly 1000 items. These are items that should be speedy deleted as obvious copyright violations. Literally, these are supposed to be deleted on sight.
[[Category:Unknown - March 2006]] has over 1000 items. These are items that were tagged as missing source information in March. They should be speedy deleted after 7 days.
[[Category:Unknown - April 2006]] has around 1400 items.
[[Category:Unknown - May 2006]] has an astounding 2600-2800 items for deletion. (You know you're in trouble when you count "1000" and the file names are still at "D")
[[Category:Unknown - June 2006]] has something incredible like over 4000 items for deletion...and it's only June 9th!!! OK actually all the categories 1 June - 8 June have less than 200 items each, so something weird is going on. I don't know where all those extra items are coming from. Hm it looks like some are people misusing the 'Unknown' template and they somehow get added to the current month. Ah... yes, that is it.
[[Category:Incomplete license]] has about 400 items. (For some, their 7 days might not have expired yet.)
[[Category:Images with no copyright tag]] has about 500 items. I think this is all Orgullobot's work (we only recently had a bot start tagging all new uploads that have no license tag).
[[Category:Duplicate]] has about 800 items. These are non-essential deletions.
[[Template:Delete]] links to some 2000 items (although some of those are policy pages, for example). But even then I think there must be a gap between that template and [[Commons:Deletion requests]].
Just at a rough guess, there are probably about 200 items listed on [[Commons:Deletion requests]]. And these are supposed to be deletions requiring discussion (and for some reason today I see another 43 (!) "delete this because there's an SVG" nominations just from one user :/ ). These are supposed to be important cases that can set precedents. How can admins even find these debates let alone take part in them? This page (the template) is 247kb. No wonder <10 admins (of over 130) regularly look at it.
I don't think it is an exaggeration to say, it doesn't matter how many admins we have, or how hard they work, we cannot realistically reduce this backlog by using the current methods we follow.
On the 8th June less than 250 items were deleted (including categories and articles, etc). Same on the 7th June.
.................................... ????????????????????? ....................................
Is this something we should not worry about? Or how can we ever solve it? Over 12,000 images waiting to be deleted.
A severely depressed, Brianna /[[user:pfctdayelise]]
I'm not so sure that any one person's efforts can make THAT much of a difference, but I'll be happy to help with the deletion backlog.
-Matt
Solution A: Increase the number of admins, which will increase the number of deletions.
Solution B: Have a bot automatically delete files in certain categories that have been there since X days.
Solution C: I would expect that most problematic uploads come from new users. Disable the creation of new users (or their ability to upload) while the total backlog is >X files (no pun intended). Have a message displayed prominently to put social pressure on the admins ;-)
(disclaimer: I am not an admin on commons)
Magnus
On 09/06/06, Magnus Manske magnus.manske@web.de wrote:
Solution A: Increase the number of admins, which will increase the number of deletions.
Probably not enough to have a big impact any time soon, and if we were to increase the numbers so that it did, it would be dangerous. And also we don't even have the interested candidates to nominate, so...
Solution B: Have a bot automatically delete files in certain categories that have been there since X days.
Could be possible for the category that is only used by Orgullobot, but you wouldn't want to use it for most categories as people can (and do) tag things maliciously or mistakenly. Manual checking is required IMO. Sad but true.
Also I don't think anyone would be too happy with a bot that had admin powers, doesn't seem to go down well. You could have an admin run a daily script though, I guess. Their talk page would probably be a world of pain as redlinks jumped up in X wikis. CommonsTicker wouldn't help except for images tagged in the last couple of weeks.
Solution C: I would expect that most problematic uploads come from new users. Disable the creation of new users (or their ability to upload) while the total backlog is >X files (no pun intended). Have a message displayed prominently to put social pressure on the admins ;-)
What do you mean, social pressure? To encourage them to delete stuff?
You are right that most problematic uploads come from newbies (or oldbies no one has caught yet - there are many of those :( ).
Yet we now have automatic welcoming of all new active accounts, and automatic logging of those accounts ( http://commons.wikimedia.org/wiki/User:Orgullobot/Welcome_log ). And we can't even muster up enough people to check these - a week and there is already a backlog. You don't even have to be an admin to do this.
Disabling newbie uploads is one idea, or forcing them to pass some kind of copyright test. But there would be such backlash we could never do it, plus there's always too many exceptions, different languages...it's a nightmare. Also someone might be new to the Commons but well familiar with copyright issues from Wikipedia or similar.
Any more ideas?
I could be happy for Jimbo to run through these with a razor, at least it will save the Commons from criticism were I to do the same thing :)
Brianna
Brianna Laugher schrieb:
Any more ideas?
Always ;-)
How about a tool (outside commons, maybe on the toolserver) that finds (supposedly) evil-tagged images that have not been touched for the mandatory seven-days period (and the talk page hasn't been touched either). All images on the tool page would meet the condition "these can be deleted, unless it is obvious to a human that they shouldn't".
The page could display the (thumbnailed) image, the text, maybe also the date(s) involved, and a "delete" button which will open the delete page on commons, prefilled (assuming you're logged in as admin). Maybe it could be tied into the CommonsTicker somehow, to show "this image is currently in use on en and de".
This might increase the speed of decision making and execution (!) significantly. A "disassembly line" for admins, so to speak.
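The selection rule proposed above could be sketched roughly as follows in Python. This is only an illustration: the `TaggedImage` record and its field names are hypothetical, and nothing here reflects how an actual toolserver tool would read its data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

WAITING_PERIOD = timedelta(days=7)  # the mandatory seven-day period from the proposal


@dataclass
class TaggedImage:
    # Hypothetical record; the field names are illustrative only.
    title: str
    page_last_touched: datetime
    talk_last_touched: Optional[datetime]  # None if there is no talk page


def is_deletion_candidate(image: TaggedImage, now: datetime) -> bool:
    """True if neither the image page nor its talk page was touched within the waiting period."""
    cutoff = now - WAITING_PERIOD
    if image.page_last_touched > cutoff:
        return False
    if image.talk_last_touched is not None and image.talk_last_touched > cutoff:
        return False
    return True


def deletion_candidates(images, now):
    """Return the images that would be shown to an admin for a final human check."""
    return [img for img in images if is_deletion_candidate(img, now)]
```

The function only narrows the list; the decision to delete stays with the admin looking at the page.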
Magnus
Brianna Laugher napisał(a):
[...] Also someone might be new to the Commons but well familiar with copyright issues from Wikipedia or similar.
Single login would take care of this particular issue quite well, I think. Not to say that turning off newbie uploads is a good idea. Or a bad one. Carry on. ;)
Magnus Manske wrote:
Solution A: Increase the number of admins, which will increase the number of deletions.
True, but also increases the number of controversial deletions; on a normal project, any time you delete something, you risk being yelled at by the people on that project. On Commons, you risk being yelled at by everybody involved with Wikimedia. That's a lot of pressure to put on someone, especially when they can't easily leave a "Hey, I noticed [image] seems to need deletion, but you're using it" message (because we have hundreds of languages, and nobody can hope to speak them all).
Solution B: Have a bot automatically delete files in certain categories that have been there since X days.
The vandals will love this one: "Hey, want to really wreak havoc? Go start tagging images with this category. Do it to images with descriptions in small languages, where there's not likely to be anyone who can tell you're actually vandalizing, or anyone watching the image to know to untag it. Even better if you do it to a hundred at a time; they'll assume you're a good contributor, and won't even ask you questions! Then, this bot will come along and delete it for you, and they won't be able to get it back!"
Solution C: I would expect that most problematic uploads come from new users. Disable the creation of new users (or their ability to upload) while the total backlog is >X files (no pun intended).
Mmmm, DOS opportunity from the same vandals: Tag all kinds of images, no matter what they are, with deletion notices, it will stop new user creation [or uploads by new users] until they can get it back down. Don't worry, even if they revert you, you'll still DOS them for a while, waste their time, and by the time they've caught you and started reverting, you'll have another account and be doing it again!
Have a message displayed prominently to put social pressure on the admins ;-)
Attention, admins: Quickly, resign adminship, because no matter what you're doing, you're going to be blamed for the problem! Oh, and note to vandals: All the admins are resigning, now's the time to hit, because there's even less of them to stop you!
(disclaimer: I am not an admin on commons)
Magnus
Now, don't get me wrong, my responses above are comically over-dramatized to make the point. They're all good ideas (except that last one, don't blame the admins!), they're just prone to abuse, and with the language-barriers and lack of staff on Commons, it'd be an open and waiting target.
I am a Commons admin, and the thing that scares me most about Commons admin work is deletion: Almost everything on Commons, if deleted, cannot be recovered, and Commons materials are used on all Wikimedia projects. There are good tools, like the one that checks usage on other projects, and I'm glad we have them, but we still have the problem that there is a major language barrier, and very low participation from projects.
However, the thing that would make me feel most comfortable deleting would be a way of getting things back. If deleted images could be recovered, I'd have no problem whatsoever with deleting things in those categories left and right. And, it solves the problems raised above with the idea of deleting by bot: As long as we know we can get it back, why not?
I'm sure that the reason images aren't recoverable is an issue of space: We don't have the server space to store tens of thousands of images that were deemed unwanted. Likewise, we don't want to be holding on to copies of images that could get us in trouble, like copyvios. However, it would be incredibly useful to be able to restore an image that shouldn't have been deleted. I wonder if it wouldn't be possible to hold on to deleted images for a period of time, and to then have them be automatically purged. Though I've never gone to a deleted page and been unable to restore content, I'm told that it is possible for very old deleted pages to be purged off entirely, so I wonder if something similar couldn't be done for images.
It seems to me that an image being deleted is something you would probably notice right away (at least within, say, 7 days) if the image was really important; would it be possible to make it so that deleted images were kept for 7 days, and then purged after that? Even if it was only done on Commons (and it strikes me, it would be a feature that other wikis would want as well, to deal with the possibility of a compromised/rogue admin account) it would help tremendously, because we would no longer have to worry about being in hot water if we deleted something that was mistagged. It might produce a space problem for the first week, while we were all cutting through the 12,000 image backlog, but after that, it seems to me the space requirement of just holding on to things for a week wouldn't be that much. (I'm not a programmer, but common sense suggests moving an image from a to b doesn't really increase the amount of space required by the image, it just adds a log entry, which should be fairly cheap.)
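As a rough illustration of the "keep for 7 days, then purge" idea, here is a small Python sketch. It assumes a hypothetical mapping from file name to deletion time; it is not how MediaWiki stores or purges anything, just the bookkeeping the suggestion implies.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # the seven-day window suggested above


def split_by_retention(deleted_at_by_name, now):
    """Split deleted files into (still restorable, due for purge).

    deleted_at_by_name is a hypothetical dict of file name -> deletion time.
    """
    restorable, purge = [], []
    for name, deleted_at in sorted(deleted_at_by_name.items()):
        if now - deleted_at < RETENTION:
            restorable.append(name)
        else:
            purge.append(name)
    return restorable, purge
```

Storage then only ever has to hold about one week's worth of deleted files, which is the point made above about the space requirement.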
Since we have a developer on the list, thinking about possible solutions <lays out a carrot under a box to trap Magnus> perhaps he could provide us with some more information on what would be needed to make this happen, and perhaps even provide the needed code to make it happen.
I'll pledge, personally, to delete at least 1,000 of the backlogged images if this is implemented.
Essjay
On 6/9/06, Essjay essjaywiki@gmail.com wrote:
I'm sure that the reason images aren't recoverable is an issue of space: We don't have the server space to store tens of thousands of images that were deemed unwanted. Likewise, we don't want to be holding on to copies of images that could get us in trouble, like copyvios.
I'd guess it's more of a historical accident. Undeletion of images just wasn't very important during the early years of the project, and no one got around to implementing it. There's at least one enhancement request in bugzilla, http://bugzilla.wikimedia.org/show_bug.cgi?id=2099.
Since we have a developer on the list, thinking about possible solutions <lays out a carrot under a box to trap Magnus> perhaps he could provide us with some more information on what would be needed to make this happen, and perhaps even provide the needed code to make it happen.
I'll pledge, personally, to delete at least 1,000 of the backlogged images if this is implemented.
Essjay
Looking at the bug report, someone else has pledged $50 on top of that :).
Anthony
On 10/06/06, Essjay essjaywiki@gmail.com wrote:
Magnus Manske wrote:
Solution A: Increase the number of admins, which will increase the number of deletions.
True, but also increases the number of controversial deletions; on a normal project, any time you delete something, you risk being yelled at by the people on that project. On Commons, you risk being yelled at by everybody involved with Wikimedia. That's a lot of pressure to put on someone, especially when they can't easily leave a "Hey, I noticed [image] seems to need deletion, but you're using it" message (because we have hundreds of languages, and nobody can hope to speak them all).
Essjay!!!!!!!!!! http://commons.wikimedia.org/wiki/User:Pfctdayelise/Translations
Apparently our policies officially say that copyvios (+ maybe NSD/NLD? I can't remember) can be deleted without removing the image from use. At least Arnomane and Duesentrieb argue this line and presumably carry it out. But myself, Fred, Bastique, Angr...don't know who else... we fear the wrath of 100+ WM projects too much, so we follow this removing method for all image deletion.
Also: we have at least one functioning "Delinker Bot", but we're not allowed to use it (even with translated messages) because local projects get shitty about unregistered/anonymous bots. Can you imagine registering a bot on some 200 projects?
I hope with the universal login (which was coming "soon" in January, sigh), we can offer an ultimatum and say, "Let our bot work on your project or else DEAL with redlinks", because the situation is too ridiculous.
I am a Commons admin, and the thing that scares me most about Commons admin work is deletion: Almost everything on Commons, if deleted, cannot be recovered, and Commons materials are used on all Wikimedia projects.
But think: most of the things that need deleting are random things that people got off the internet. I figure if they got it off the internet once, if it's really important, odds are they can go and do it again, especially if deletion is close to upload date.
And if it's their own work (or they claim it is ;)), then of course they would have a copy on their own computer. No one would upload something precious to the Commons and then delete it off their own computer! Right?
Brianna
Brianna Laugher wrote:
On 10/06/06, Essjay essjaywiki@gmail.com wrote:
Magnus Manske wrote:
Solution A: Increase the number of admins, which will increase the number of deletions.
True, but also increases the number of controversial deletions; on a normal project, any time you delete something, you risk being yelled at by the people on that project. On Commons, you risk being yelled at by everybody involved with Wikimedia. That's a lot of pressure to put on someone, especially when they can't easily leave a "Hey, I noticed [image] seems to need deletion, but you're using it" message (because we have hundreds of languages, and nobody can hope to speak them all).
Essjay!!!!!!!!!! http://commons.wikimedia.org/wiki/User:Pfctdayelise/Translations
I'll have to look at that first thing tomorrow; I'm off to bed right now
Apparently our policies officially say that copyvios (+ maybe NSD/NLD? I can't remember) can be deleted without removing the image from use. At least Arnomane and Duesentrieb argue this line and presumably carry it out. But myself, Fred, Bastique, Angr...don't know who else... we fear the wrath of 100+ WM projects too much, so we follow this removing method for all image deletion.
Also: we have at least one functioning "Delinker Bot", but we're not allowed to use it (even with translated messages) because local projects get shitty about unregistered/anonymous bots. Can you imagine registering a bot on some 200 projects?
I hope with the universal login (which was coming "soon" in January, sigh), we can offer an ultimatum and say, "Let our bot work on your project or else DEAL with redlinks", because the situation is too ridiculous.
See? This is what I was talking about. Everybody wants to benefit, but nobody wants to accept the responsibility that comes with using what we offer. If you're going to use Commons images, you've got to accept that you're using something under a different set of policies, with a different set of procedures, and adjust to that. Unfortunately, most projects won't.
I am a Commons admin, and the thing that scares me most about Commons admin work is deletion: Almost everything on Commons, if deleted, cannot be recovered, and Commons materials are used on all Wikimedia projects.
But think: most of the things that need deleting are random things that people got off the internet. I figure if they got it off the internet once, if it's really important, odds are they can go and do it again, especially if deletion is close to upload date.
And if it's their own work (or they claim it is ;)), then of course they would have a copy on their own computer. No one would upload something precious to the Commons and then delete it off their own computer! Right?
You're more optimistic than I am; it would not surprise me for a second to hear "I uploaded it here, so why keep it!" And when someone is screaming at a Commons admin for deleting an image in process, they're probably not in a place where a rational argument like "You didn't follow process" is going to work.
I still think the best thing to have would be a way to recover deleted images for a short period of time.
Essjay
Essjay:
See? This is what I was talking about. Everybody wants to benefit, but nobody wants to accept the responsibility that comes with using what we offer. If you're going to use Commons images, you've got to accept that you're using something under a different set of policies, with a different set of procedures, and adjust to that. Unfortunately, most projects won't.
Hm, I disagree that it is a different set of policies. It is just what the policy should be on all Wikimedia projects, if we want to be serious about avoiding copyvios and creating reliable, truly free content. There are not that many projects that allow fair use so for most projects I would guess the policies should be the same.
Different procedures, yes, they vary wildly from wiki to wiki (and I would rather use ours than en.wp's any day of the week... but I am biased :)) so it would be pretty much impossible for that to be standard. Nothing unusual though, I don't think; templates, warnings, 7 days notice, etc.
I still think the best thing to have would be a way to recover deleted images for a short period of time.
Well, possibly, but arguing about the best process rarely (sorry, NEVER) translates into the technical solutions required. You can argue again on http://bugzilla.wikimedia.org/show_bug.cgi?id=2099 if you like.
For a comprehensive list of how technical issues impact on the Commons, you may like to visit http://commons.wikimedia.org/wiki/Commons:Bugs :)
cheers, Brianna
Essjay wrote:
<snip> lots of replies </snip>
I am aware that none of my "solutions" is perfect, but I'm sure they could be useful. People might be more sympathetic to admins deleting files if "the system" prevents new uploads until the backlog of bad old ones is cleared up. Likewise, once the backlog is down considerably, an automatic deletion after X days would be reasonable. If the backlog grows (vandals), it could pause. Etc.
I am a Commons admin, and the thing that scares me most about Commons admin work is deletion: Almost everything on Commons, if deleted, cannot be recovered, and Commons materials are used on all Wikimedia projects. There are good tools, like the one that checks usage on other projects, and I'm glad we have them, but we still have the problem that there is a major language barrier, and very low participation from projects.
However, the thing that would make me feel most comfortable deleting would be a way of getting things back. If deleted images could be recovered, I'd have no problem whatsoever with deleting things in those categories left and right. And, it solves the problems raised above with the idea of deleting by bot: As long as we know we can get it back, why not?
Is there a rule that says this has to be done *inside* the commons? The images are accessible from everywhere, so deleted images could be copied to other places prior to deletion. The toolserver might be a suitable place, or we could even go for a distributed solution that uses the toolserver as an organizing hub. Similar to the recycle bin on desktops, old deleted images (where no one complained about the deletion) could be deleted permanently if space gets low. Images would not be accessible to the public, so no problem arises for copyvios.
I'd be willing to code that. Note that this deletion would require going through a script on the toolserver (to copy the image), which would then invoke the Commons deletion form.
Magnus
OK, working example for safely deleting images on the commons:
http://tools.wikimedia.de/~magnus/safe_delete_commons.php?title=Image:THE_IM...
It will
* Check the image text for last edit older than 14 days (can be adapted if necessary)
* Check the image text for "delete me" templates; list of templates has to be expanded, please tell me which to use
If all that is OK, it will
* Copy the image to the toolserver under a randomized name
* Give the randomized name for you to copy, so the image can be restored
* Generate a button that leads to page deletion (I'm not a commons admin, so someone please check if that's working)
* Give a preview of the existing and copied image to check that copying was successful
The deleting admin *has* to copy the new image url line, otherwise the image can never be found again! That way, the image is stored away from public eyes but still restorable through the deleting admin.
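The checks and the randomized backup name described above might look something like this in Python. It is a sketch under stated assumptions, not the code behind safe_delete_commons.php: the template list is a placeholder (the thread itself says the real list still has to be expanded), and the function names are made up for illustration.

```python
import secrets
from datetime import datetime, timedelta
from pathlib import PurePosixPath

MIN_AGE = timedelta(days=14)  # threshold stated above; adjustable
# Placeholder list only: the tool's actual template list is not given in the thread.
DELETE_TEMPLATES = ("{{delete", "{{unknown")


def may_safe_delete(page_text, last_edit, now):
    """Both checks must pass: last edit older than MIN_AGE and a deletion template present."""
    old_enough = now - last_edit > MIN_AGE
    lowered = page_text.lower()
    tagged = any(template in lowered for template in DELETE_TEMPLATES)
    return old_enough and tagged


def randomized_backup_name(original_name):
    """Return ten random hex characters plus the original file extension."""
    suffix = PurePosixPath(original_name).suffix.lower()
    return secrets.token_hex(5) + suffix
```

Because the backup name is random, it cannot be derived from the original title, which is why the deleting admin has to record it somewhere.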
Comments, please.
Magnus
I tried it on http://commons.wikimedia.org/wiki/Image:Alcib%C3%ADades.jpg. Note the weird character in the filename.
Image:Alcib�ades.jpg is now stored as http://tools.wikimedia.de/~magnus/commons_images/6ae2eda2d5.jpg
You should copy the above line into a file on your local machine, as it is the only way to recover the file at a later date!
Delete file on commons
The image below should be identical to the one in the upper right corner. If not, copying was unsuccessful, and you should not delete the file on the commons!
Neither of the images would load so I couldn't check if this was true.
OK, let's try one without a tricky name: Image:Barrett.jpg
Last edited 78 days ago... bingo!
The delete link worked fine.
OK, interesting.
- Check the image text for "delete me" templates; list of templates has to be expanded, please tell me which to use
which ones do you have so far?
The deleting admin *has* to copy the new image url line, otherwise the image can never be found again! That way, the image is stored away from public eyes but still restorable through the deleting admin.
I think instead it should make an edit to the image page saying "BACK-UP COPY AT (url)". Because the image is about to be deleted anyway. If time proves the image should be undeleted, you can just undelete the image page, recover the URL and go from there. That seems much easier than storing the URL on my local machine for example. It would also save one manual step ;)
Also...maybe this will encourage admins to delete more stuff, which would be good. But IMO they have no reason for hesitation where the image is unsourced for a long time and the uploader was notified. No hesitation at all. I personally would only use this for images where the case was contested, such as Deletion requests. So not that many cases overall. But if it helps admins feel more secure to go on deletion sprees of unsourced stuff then I support it. :)
I didn't get one that was actually used, though, so I'm not sure what the check-usage part of the interface will look like.
Do TPTB approve of this use of the toolserver? If commons admins get into it, it seems like it could be reasonably intensive...
cheers, Brianna
Also, would it be possible to get a list of all images in [[Category:Unknown - June 2006]] that have not been edited since before June (these would be the images with malformed/deprecated 'unknown' tags)... and that are not used on any projects? (I will check en.wp by hand for the moment :/ )
Note that that category has 6000+-200 entries. (I just recounted it.)
I know I'm not the only admin that would do a lot more deleting if there was an easy way to check which ones were "easier" (non-easy = having to remove from use). And when you delete an image that no one's using, you're far less likely to incite angry mobs :)
They're still not easy because half the time the tag is wrong, or the uploader was never notified or something stupid... but they're easIER.
Brianna
Brianna Laugher schrieb:
Also, would it be possible to get a list of all images in [[Category:Unknown - June 2006]] that have not been edited since before June (these would be the images with malformed/deprecated 'unknown' tags)... and that are not used on any projects? (I will check en.wp by hand for the moment :/ )
Note that that category has 6000+-200 entries. (I just recounted it.)
I know I'm not the only admin that would do a lot more deleting if there was an easy way to check which ones were "easier" (non-easy = having to remove from use). And when you delete an image that no one's using, you're far less likely to incite angry mobs :)
They're still not easy because half the time the tag is wrong, or the uploader was never notified or something stupid... but they're easIER.
http://tools.wikimedia.de/~magnus/bad_old_ones.php
http://tools.wikimedia.de/~magnus/bad_old_ones.php?category=Unknown%20-%20Ju...
(the last one will take some time ;-), better use subcategories)
Magnus
They're still not easy because half the time the tag is wrong, or the uploader was never notified or something stupid... but they're easIER.
Note CatScan aborts after 1000 entries, so it won't be very useful for this evil huge category. (I think we will change it.)
Hot damn, Magnus...I thought I had found my idol and his name was Duesentrieb...now I question my faith ;)
Seriously, this is really nice. Really. Nice. We've got to package these better to the existing admins.
We are getting closer and closer to the easy life! (Tag, Warn, Delete) Those 12,000 don't feel so insurmountable now.
Did I mention "thank-you"? :)
(also, any time you feel like a challenge... most things on [[Commons:Bugs]] would like a workaround :))
Brianna
Brianna Laugher schrieb:
Note CatScan aborts after 1000 entries, so it won't be very useful for this evil huge category. (I think we will change it.)
So you'll have to fix the first 1000 before you get the next batch :-)
Hot damn, Magnus...I thought I had found my idol and his name was Duesentrieb...now I question my faith ;)
Not sure about Duesentrieb, but I for one don't insist on monotheism ;-)
(also, any time you feel like a challenge... most things on [[Commons:Bugs]] would like a workaround :))
I've actually looked at a few of these, trying to find a fix for them in MediaWiki (better than workarounds;-) but the ones I tried are actually a lot more complicated than they seem...
Magnus
Magnus Manske wrote:
Brianna Laugher schrieb:
Also, would it be possible to get a list of all images in [[Category:Unknown - June 2006]] that have not been edited since before June (these would be the images with malformed/deprecated 'unknown' tags)... and that are not used on any projects? (I will check en.wp by hand for the moment :/ )
Note that that category has 6000+-200 entries. (I just recounted it.)
I know I'm not the only admin that would do a lot more deleting if there was an easy way to check which ones were "easier" (non-easy = having to remove from use). And when you delete an image that no one's using, you're far less likely to incite angry mobs :)
They're still not easy because half the time the tag is wrong, or the uploader was never notified or something stupid... but they're easIER.
Um... is it possible to get {{albumcover}}, {{bookcover}} and the like added to the list of "suitable deletion templates", so we can go through http://tools.wikimedia.de/~magnus/bad_old_ones.php?category=Against%20policy like a hot knife through butter? :)
Alphax (Wikipedia email) schrieb:
Um... is it possible to get {{albumcover}}, {{bookcover}} and the like added to the list of "suitable deletion templates", so we can go through http://tools.wikimedia.de/~magnus/bad_old_ones.php?category=Against%20policy like a hot knife through butter? :)
I've done even better: safe_delete now works for all requests coming from bad_old_ones. It even states the "source category" in the reason.
Magnus
I wonder if the recent activity on bugzilla:2099 was prompted by your partial solution? If we wait a little bit longer will this toolserver solution be unnecessary? I hope so.
bad_old_ones should still be very useful though. After exams I hope I can spend a day or two to sit down with it.
Brianna
<snip>Wonderful things that I'm so excited happened </snip>
Magnus: A giant hug, a bottle of whatever you like to drink, and the firstborn child of someone who has children. You're awesome!
Could someone catch me up on exactly where we are with this? I've been gone since Saturday (my last post here was the last thing I did) and there has obviously been great progress while I was away. Could someone send me a summary of exactly where we are (especially, where and how to use Magnus' tool)? Offlist is fine, though it might be helpful to have an on-list post to point people to as well. I'd greatly appreciate it, and I promise, as soon as I know how to use what Magnus has coded, I'll delete those 1000 images I promised to nuke. :D
Essjay
On 14/06/06, Essjay essjaywiki@gmail.com wrote:
Could someone catch me up on exactly where we are with this? I've been gone since Saturday (my last post here was the last thing I did) and there has obviously been great progress while I was away. Could someone send me a summary of exactly where we are (especially, where and how to use Magnus' tool)?
This is how I would use them: "Hmm, I feel like deleting masses of files, I'll go to Category:Unknown and find a suitable subcategory..." e.g. http://commons.wikimedia.org/wiki/Category:Images_with_unknown_source_as_of_... (best to work with a medium-sized category, < 200 items)
so then I would hook that category name up to bad_old_ones to get a URL like http://tools.wikimedia.de/~magnus/bad_old_ones.php?category=Images%20with%20... (just copy and paste the cat name, it will convert the spaces to "%20") (at some stage we would hook this into the interface)
then this page gives us a lovely overview, so we can see the image itself, its summary, its usage, and quick links to delete or safe delete.
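For anyone who would rather script the URL than paste it, the step above amounts to percent-encoding the category name and appending it to the tool's address. A minimal Python sketch, where the helper name is made up and only the base URL and the `category` parameter come from the links in this thread:

```python
from urllib.parse import quote

# Base URL as given in the thread.
BASE_URL = "http://tools.wikimedia.de/~magnus/bad_old_ones.php"


def bad_old_ones_url(category_name):
    """Build the bad_old_ones link for a category; spaces become %20."""
    return BASE_URL + "?category=" + quote(category_name)


print(bad_old_ones_url("Against policy"))
# -> http://tools.wikimedia.de/~magnus/bad_old_ones.php?category=Against%20policy
```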
So the beauty of this is that you can pick out "easy" cases to delete, i.e. images that aren't being used. (Unfortunately you still need to check en.wp by hand, can't wait until that's fixed.) You can also pick "obvious" copyvios (band promo shots, logos, etc.) and it's also reasonably easy to see where the uploader has themselves tagged the image NLD (by choosing "License unknown" in the license selector at Special:Upload).
Just give the safe_delete a try, it's pretty straightforward I reckon. I also wouldn't use it for most of these cases, but if you're squeamish, go ahead. :)
BTW to Magnus: having a link directly to the en.wp page would be really handy (we have it as one of our 'extra tabs' now).
Brianna
On 14/06/06, Magnus Manske magnus.manske@web.de wrote:
Brianna Laugher wrote:
BTW to Magnus: having a link directly to the en.wp page would be really handy (we have it as one of our 'extra tabs' now).
Sorry, what en page? On commons [[category:Unknown]], I see "tree" and "catscan" tabs. You mean one of these?
No, you will see an "en" tab on any Image: namespace page. It's just a link to the same image name at English Wikipedia, because that's the page we need to check to see if the image is being used in that wiki.
I also have another idea :) which will be fantastic for [[category:duplicate]] and [[category:incorrectly named]] (which is still not being used much, hm). images in these categories should have templates like this:
{{duplicate|Image:Otherimage.jpg}} {{bad name|Image:Bettername.jpg}} (or {{badname|...}}, there's a RDR)
So it would be awesome if we had something similar to bad_old_ones where it pulled up the duplicate and put them side by side (so you can confirm that they are the same image), and also gave you usage stats on the old one. That would make deleting them super easy and super safe.
I guess this is much more prone to user mistakes, but I had a quick look and all the ones I looked at had been done correctly.
The other thing is it would be good to have res stats (300x400) below each one so you could make sure they weren't nominating a thumbnail for deletion. And also a warning if the file types are different, since this template shouldn't really be used (sometimes it's ok, but if it's PNG > SVG usually they should use {{superseded}} which is not a deletion tag).
Just some thoughts to keep you on your toes ;)
Brianna
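To make the idea above concrete, here is a rough Python sketch of the two checks that need no image data: pulling the target image out of a {{duplicate|...}}, {{bad name|...}} or {{badname|...}} tag, and warning when the file types differ. The function names are invented for illustration and the regular expression only handles the simple one-parameter form shown above. The resolution check is left out because it needs the actual image dimensions.

```python
import re
from typing import Optional

# Template names mentioned in this thread; matching is case-insensitive.
TEMPLATE_NAMES = ("duplicate", "bad name", "badname")

_TEMPLATE_RE = re.compile(
    r"\{\{\s*(?:"
    + "|".join(re.escape(name) for name in TEMPLATE_NAMES)
    + r")\s*\|\s*([^}|]+?)\s*\}\}",
    re.IGNORECASE,
)


def find_target(wikitext: str) -> Optional[str]:
    """Return the image named in the first matching template, or None."""
    match = _TEMPLATE_RE.search(wikitext)
    return match.group(1) if match else None


def extension(image_name: str) -> str:
    """Lower-cased file extension without the dot ('' if there is none)."""
    _, dot, ext = image_name.rpartition(".")
    return ext.lower() if dot else ""


def type_mismatch(tagged: str, target: str) -> bool:
    """True when the tagged file and its claimed replacement differ in file type."""
    return extension(tagged) != extension(target)
```

A PNG tagged as a duplicate of an SVG would be flagged by type_mismatch, which is the case where {{superseded}} is usually the better tag.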
Brianna Laugher wrote:
No, you will see an "en" tab on any Image: namespace page. It's just a link to the same image name at English Wikipedia, because that's the page we need to check to see if the image is being used in that wiki.
Oh ... done :-)
I also have another idea :) which will be fantastic for [[category:duplicate]] and [[category:incorrectly named]] (which is still not being used much, hm).
Done :-) http://127.0.0.1/toys/commons_dupes.php?category=Duplicate
So far looking for templates "duplicate", "bad name", and "badname". Any others?
Just some thoughts to keep you on your toes ;)
Keep 'em coming! ;-)
Magnus
Magnus Manske wrote on Wed, 14 Jun 2006 at 21:01 +0200:
Is it a new trend to link localhost in mailinglist?
Zirland
Zirland wrote:
Magnus Manske wrote on Wed, 14 Jun 2006 at 21:01 +0200:
Is it a new trend to link localhost in mailinglist?
http://tools.wikimedia.de/~magnus/commons_dupes.php?category=Duplicate
for those who can't copy'n'paste ;-)
Magnus
Magnus Manske wrote:
Zirland wrote:
Magnus Manske wrote on Wed, 14 Jun 2006 at 21:01 +0200:
Is it a new trend to link localhost in mailinglist?
http://tools.wikimedia.de/~magnus/commons_dupes.php?category=Duplicate
for those who can't copy'n'paste ;-)
The toolserver is your localhost? Shiny...
OK, a small problem. I created [[template:delete assist]] to put on cat pages to automatically link to Bad Old Ones. I used {{PAGENAMEE}} for the cat name since they all have spaces. It converts them to underscores which BOO doesn't like. If I use {{PAGENAME}} the wikilink won't work properly because of the space. Is there any way to either convert the underscores to %20, or else let BOO recognise underscores as spaces?
Brianna
Brianna Laugher wrote:
OK, a small problem. I created [[template:delete assist]] to put on cat pages to automatically link to Bad Old Ones. I used {{PAGENAMEE}} for the cat name since they all have spaces. It converts them to underscores which BOO doesn't like. If I use {{PAGENAME}} the wikilink won't work properly because of the space. Is there any way to either convert the underscores to %20, or else let BOO recognise underscores as spaces?
Strange, [[template:delete assist]] worked for me just fine. Anyway, I've patched BOO just to be sure.
If it still causes problems, please give me an example page (category).
Magnus
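For illustration only, this is one way a tool could accept both spellings of a category name. It is a Python sketch, not the actual patch to BOO (which is not shown in this thread), and the function name is invented. It relies on the fact that MediaWiki treats underscores and spaces in page titles as equivalent.

```python
from urllib.parse import unquote


def normalize_category(raw: str) -> str:
    """Map 'Foo_bar', 'Foo%20bar' and 'Foo bar' to the same name, 'Foo bar'."""
    # Decode percent-escapes first, then treat underscores as spaces.
    return unquote(raw).replace("_", " ").strip()
```

With this in place a link built from {{PAGENAMEE}} (underscores) and a pasted name (spaces or %20) end up at the same category.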
On 6/14/06, Magnus Manske magnus.manske@web.de wrote:
Brianna Laugher wrote:
No, you will see an "en" tab on any Image: namespace page. It's just a link to the same image name at English Wikipedia, because that's the page we need to check to see if the image is being used in that wiki.
Oh ... done :-)
I also have another idea :) which will be fantastic for [[category:duplicate]] and [[category:incorrectly named]] (which is still not being used much, hm).
Done :-) http://127.0.0.1/toys/commons_dupes.php?category=Duplicate
So far looking for templates "duplicate", "bad name", and "badname". Any others?
Just some thoughts to keep you on your toes ;)
Keep 'em coming! ;-)
Fantastic.
A link to editing the description would also be useful -- that way if you check an image that looks suspicious, but turns out to be fine (or if it turns out to be used on en:wp), you can leave a comment to that effect with one fewer page+image load.
SJ
SJ wrote:
A link to editing the description would also be useful -- that way if you check an image that looks suspicious, but turns out to be fine (or if it turns out to be used on en:wp), you can leave a comment to that effect with one fewer page+image load.
Both bad_old_ones and commons_dupes now have edit links for the commons description page.
Magnus
I also have another idea :) which will be fantastic for [[category:duplicate]] and [[category:incorrectly named]] (which is still not being used much, hm).
Done :-) http://127.0.0.1/toys/commons_dupes.php?category=Duplicate
So far looking for templates "duplicate", "bad name", and "badname". Any others?
Hot damn! That is nice.
It also shows that people appear to be severely abusing {{duplicate}} just to get their unfavourite PNGs deleted. >:|
Don't know what to do about that.
Brianna
Brianna Laugher wrote:
I tried it on http://commons.wikimedia.org/wiki/Image:Alcib%C3%ADades.jpg. Note the weird character in the filename.
Image:Alcibíades.jpg is now stored as http://tools.wikimedia.de/~magnus/commons_images/6ae2eda2d5.jpg
You should copy the above line into a file on your local machine, as it is the only way to recover the file at a later date!
Delete file on commons
The image below should be identical to the one in the upper right corner. If not, copying was unsuccessful, and you should not delete the file on the commons!
Neither of the images would load so I couldn't check if this was true.
OK, let's try one without a tricky name: Image:Barrett.jpg
Last edited 78 days ago... bingo!
The delete link worked fine.
OK, interesting.
- Check the image text for "delete me" templates; list of templates has to be expanded, please tell me which to use
which ones do you have so far?
The deleting admin *has* to copy the new image url line, otherwise the image can never be found again! That way, the image is stored away from public eyes but still restorable through the deleting admin.
I think instead it should make an edit to the image page saying "BACK-UP COPY AT (url)". Because the image is about to be deleted anyway. If time proves the image should be undeleted, you can just undelete the image page, recover the URL and go from there. That seems much easier than storing the URL on my local machine for example. It would also save one manual step ;)
Agreed. That way we wouldn't have to worry about where the backup of each image is stored...
Brianna Laugher wrote:
I tried it on http://commons.wikimedia.org/wiki/Image:Alcib%C3%ADades.jpg. Note the weird character in the filename.
The reason for this not working /may/ be that there's a text, but no image to copy ;-)
Anyway, that error is now caught and shown.
- Check the image text for "delete me" templates; list of templates has to be expanded, please tell me which to use
which ones do you have so far?
You can see the current list by clicking on "source of this script" (even if you don't speak code;-)
The deleting admin *has* to copy the new image url line, otherwise the image can never be found again! That way, the image is stored away from public eyes but still restorable through the deleting admin.
I think instead it should make an edit to the image page saying "BACK-UP COPY AT (url)". Because the image is about to be deleted anyway. If time proves the image should be undeleted, you can just undelete the image page, recover the URL and go from there. That seems much easier than storing the URL on my local machine for example. It would also save one manual step ;)
I added a button which will open the edit page and append that message (also fill in the summary). You'll have to click it (and "Save") manually, though...
I didn't get one that was actually used, though, so I'm not sure what the check-usage part of the interface will look like.
One line for each wikimedia project that uses the image; in this line, project name and how many articles, talk pages, project pages, etc. use it. For details (which pages actually use the image), click on "details" ;-)
Do TPTB approve of this use of the toolserver? If commons admins get into it, it seems like it could be reasonably intensive...
I posted it on toolserver-l, and so far, no one complained. Tim Starling made a good suggestion; I'll talk to him about this.
At some point, I'll have to really delete these images, no matter the disk space available. I'd assume it's safe to /really/ delete them if they were not resurrected within three months or so.
Magnus
Hi Magnus.
What was the purpose of having a 14 day time limit? Would there be any reason why there must be a time limit at all?
Currently, I can't use the tool on any image where anyone has made a small contribution (such as changing a category).
/ Fred
Magnus, is there any plan to deal with files that have multiple revisions? Two possibilities:
1) Save all the revisions to a randomized directory name (e.g. http://tools.wikimedia.de/~magnus/commons_images/ea6e6a5a74/) which is either listable with all the versions, or has an auto-generated index.html file that lists the versions
or
2) At least warn about this case, since currently the old versions aren't backed up.
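Option 1 could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: safe_delete's real storage layout is not shown in this thread, and the function name, the "revNNN_" file prefix and the index format are invented. It assumes the file name is a plain name with no path separators.

```python
import html
import secrets
from pathlib import Path
from typing import Sequence
from urllib.parse import quote


def backup_revisions(root: Path, filename: str, revisions: Sequence[bytes]) -> Path:
    """Store every revision under a random directory and write an index.html.

    `revisions` is ordered oldest first. Returns the newly created directory.
    """
    # 10 hex characters, the same shape as the example directory name above.
    target = root / secrets.token_hex(5)
    target.mkdir(parents=True)

    links = []
    for number, data in enumerate(revisions, start=1):
        name = f"rev{number:03d}_{filename}"
        (target / name).write_bytes(data)
        links.append(f'<li><a href="{quote(name)}">{html.escape(name)}</a></li>')

    # Auto-generated listing, so the directory itself need not be listable.
    index = "<html><body><ul>\n" + "\n".join(links) + "\n</ul></body></html>\n"
    (target / "index.html").write_text(index, encoding="utf-8")
    return target
```

The random directory name keeps the backup away from public eyes in the same way the single-file URL does, and one URL then covers all versions.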
David Benbennick wrote:
Magnus, is there any plan to deal with files that have multiple revisions? Two possibilities:
- Save all the revisions to a randomized directory name (e.g. http://tools.wikimedia.de/~magnus/commons_images/ea6e6a5a74/) which is either listable with all the versions, or has an auto-generated index.html file that lists the versions
or
- At least warn about this case, since currently the old versions aren't backed up.
There's a rather hackish solution to this:
1. Use safe delete and use the auto-edit thing to save the URL.
2. If there are any old revisions left, delete the latest revision only.
3. Goto 1.
Alphax (Wikipedia email) wrote:
David Benbennick wrote:
Magnus, is there any plan to deal with files that have multiple revisions? Two possibilities:
- Save all the revisions to a randomized directory name (e.g. http://tools.wikimedia.de/~magnus/commons_images/ea6e6a5a74/) which is either listable with all the versions, or has an auto-generated index.html file that lists the versions
or
- At least warn about this case, since currently the old versions aren't backed up.
There's a rather hackish solution to this:
1. Use safe delete and use the auto-edit thing to save the URL.
2. If there are any old revisions left, delete the latest revision only.
3. Goto 1.
4) Wait a day or two, and click the regular "delete" tab.
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
4) Wait a day or two, and click the regular "delete" tab.
Yes. The fully integrated solution is of course better than my hackish one (which is better than no solution, respectively;-)
I will not work on safe_delete anymore, and deactivate it once Brion's implementation is live. The safe-deleted files will still be around, though, at least for a few months.
I will continue development on bad_old_ones, of course. Until the stars are right again, that is ;-)
Magnus
Also (I am really warming up to this :)), it would be good to be able to override the "last edit 14 days ago" thing when necessary. For example when I see logos labelled *-self, I have a strong prejudice not to believe this and to delete on sight. If I could delete on sight with a backup handy, just in case on the odd occasion ;) I'm wrong, it would be incredibly handy to calm the screaming hordes down.
Brianna
Brianna Laugher wrote:
Also (I am really warming up to this :)), it would be good to be able to override the "last edit 14 days ago" thing when necessary. For example when I see logos labelled *-self, I have a strong prejudice not to believe this and to delete on sight. If I could delete on sight with a backup handy, just in case on the odd occasion ;) I'm wrong, it would be incredibly handy to calm the screaming hordes down.
Manually add "&days=7" to the URL for 7 days, etc.
Magnus
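If you would rather not edit the URL by hand, the same thing can be done in a few lines of Python. Only the parameter name "days" comes from this thread; the function name is made up, and the URL in the usage line is a placeholder on example.org rather than the tool's real address.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def with_days(url: str, days: int) -> str:
    """Return `url` with its `days` query parameter set to `days`."""
    parts = urlsplit(url)
    # Drop any existing days= value, keep every other parameter in order.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "days"
    ]
    params.append(("days", str(days)))
    # quote_via=quote keeps spaces as %20 rather than "+".
    query = urlencode(params, quote_via=quote)
    return urlunsplit(parts._replace(query=query))


if __name__ == "__main__":
    # Hypothetical URL, used only to show the call.
    print(with_days("http://example.org/tool.php?image=Example.jpg", 7))
```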
On 09/06/06, Magnus Manske magnus.manske@web.de wrote:
Solution C: I would expect that most problematic uploads come from new users. Disable the creation of new users (or their ability to upload) while the total backlog is >X files (no pun intended).
I was thinking about this. What could work quite well is reviewed uploads. So you upload your first five files. Then you can't upload any more files until an admin reviews your uploads. Users stay in 'review mode' until they can upload 5 files in a row without copyright concerns. If they upload 5 images fine then they can go into 'free mode', or 'unreviewed mode', or whatever you want to call it, which is what we have at the moment. It would also be great to be able to put users back INTO review mode!! Especially since we can't ban people from uploading only (bugzilla:4995), which is a great shame. This would be a good alternative.
This would be super nifty. Like all super nifty ideas, it's probably quite hard to solve technically, otherwise someone would've done it by now. :)
The only bad point is that it might lead people to create a new account every 5 images. Might not matter that much though...
Brianna
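The review-mode rules proposed above can be written down as a small state machine. This Python sketch is one possible reading of the proposal, not an existing MediaWiki feature. The class and method names are invented, and it assumes that an upload with copyright concerns resets the count of clean uploads to zero.

```python
from dataclasses import dataclass

REVIEW_BATCH = 5  # the "five files" from the proposal


@dataclass
class Uploader:
    in_review: bool = True  # new accounts start in review mode
    pending: int = 0        # uploads not yet looked at by an admin
    clean_streak: int = 0   # consecutive reviewed uploads without concerns

    def can_upload(self) -> bool:
        # Free-mode users are unrestricted; review-mode users are capped.
        return not self.in_review or self.pending < REVIEW_BATCH

    def upload(self) -> None:
        if not self.can_upload():
            raise PermissionError("waiting for an admin to review earlier uploads")
        if self.in_review:
            self.pending += 1

    def review(self, ok: bool) -> None:
        """An admin reviews one pending upload."""
        if self.pending == 0:
            raise ValueError("nothing to review")
        self.pending -= 1
        self.clean_streak = self.clean_streak + 1 if ok else 0
        if self.clean_streak >= REVIEW_BATCH:
            self.in_review = False

    def send_back_to_review(self) -> None:
        """Put a free-mode user back INTO review mode."""
        self.in_review = True
        self.clean_streak = 0
```

Nothing here stops someone from creating a new account every 5 images, since the state is kept per account.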
Brianna Laugher wrote:
[[Category:Unknown - March 2006]] has over 1000 items. These are items that were tagged as missing source information in March. They should be speedy deleted after 7 days.
This is now cleaned out entirely; as I finished each subcat, I deleted them, and then deleted the main category when it was completely finished. About 698 images, I believe.
Essjay