I've written a little tool [1] that shows file duplicates between a Wikipedia and Commons, as well as internal duplicates. It runs off a static list created from the toolserver databases; currently, German and English are available. I will have to regenerate the data manually for other Wikipedias and for updates.
But for now, there are ~29,000 dupes between en.wp and Commons, as well as ~8,500 between de.wp and Commons, so it might take you guys a while ;-)
A subset of images (default: 25) is selected randomly from the list, so you might run into images that already have {{NowCommons}}.
Cheers, Magnus
[1] http://toolserver.org/~magnus/cgi-bin/duplicate_images_across.pl?lang=en&...
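The message above does not say how the duplicates are detected or how the subset is drawn. As a rough illustration only, the sketch below assumes a content hash (for example SHA-1) is available for every file on both sides, and that the static list is simply the set of name pairs whose hashes match. The function names and the dictionary input shape are hypothetical, not taken from the tool.

```python
import random
from collections import defaultdict


def find_cross_project_dupes(local_files, commons_files):
    """Return (local_name, commons_name) pairs whose content hashes match.

    Both arguments map file name -> content hash (an assumed input shape).
    """
    # Index Commons files by hash so each local file is a single lookup.
    commons_by_hash = defaultdict(list)
    for name, digest in commons_files.items():
        commons_by_hash[digest].append(name)

    pairs = []
    for name, digest in sorted(local_files.items()):
        for commons_name in sorted(commons_by_hash.get(digest, [])):
            pairs.append((name, commons_name))
    return pairs


def pick_subset(pairs, limit=25, rng=None):
    """Pick up to `limit` pairs at random (25 mirrors the stated default)."""
    rng = rng or random.Random()
    return rng.sample(pairs, min(limit, len(pairs)))
```

Because the list is static, a pair drawn this way can be stale: the local file may already have been tagged or deleted since the list was generated, which matches the caveat about images that already carry {{NowCommons}}.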
I'm just looking where to buy the super capacitor

----- Original Message -----
From: "Magnus Manske" magnusmanske@googlemail.com
To: "Wikimedia Commons Discussion List" commons-l@lists.wikimedia.org
Sent: Sunday, September 07, 2008 7:27 PM
Subject: [Commons-l] Cross-project dupes
Commons-l mailing list Commons-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/commons-l
Magnus Manske magnusmanske@googlemail.com wrote on Sun, 7 Sep 2008 18:27:14 +0100:
I've written a little tool [1] that shows file duplicates between a Wikipedia and Commons, as well as internal duplicates. It runs off a static list created from the toolserver databases; currently, German and English are available. I will have to regenerate the data manually for other Wikipedias and for updates.
[1] http://toolserver.org/~magnus/cgi-bin/duplicate_images_across.pl?lang=en&...
Great stuff - as always.
What about a delete link for administrators (maybe with a predefined reason)?
A usage count for the images would also be nice :)
And last but not least: Buttons that generate a preview with {{NowCommons}} directly under the image ...
Sorry to bother you and thanks for this great tool.
Flo
On Tue, Sep 9, 2008 at 5:26 AM, Florian Straub flominator@gmx.net wrote:
Great stuff - as always.
What about a delete link for administrators (maybe with a predefined reason)?
A usage count for the images would also be nice :)
And last but not least: Buttons that generate a preview with {{NowCommons}} directly under the image ...
Sorry to bother you and thanks for this great tool.
And, could this tool support other projects?
Many other projects have copied PD images from en.wp, and those images have since been moved to Commons. It would be a great help to be able to quickly go around the smaller projects and assist them to use the images now on Commons, and remove the duplicates. I was just now doing this over on en.wv, and was thinking that I need a tool to help find the dups!
Cheers, John
And, could this tool support other projects?
I wrote http://toolserver.org/~multichill/nowcommons.php a while ago, less colorful, but should work on every wikipedia.
Many other projects have copied PD images from en.wp, and those images have since been moved to Commons. It would be a great help to be able to quickly go around the smaller projects and assist them to use the images now on Commons, and remove the duplicates. I was just now doing this over on en.wv, and was thinking that I need a tool to help find the dups!
At the nl Wikipedia I created a template a while back to mark images moved from other Wikipedias but not yet transferred to Commons. The template makes the images end up in http://nl.wikipedia.org/wiki/Categorie:Wikipedia:Afbeelding_afkomstig_van_ee... This can be very useful to mark images and later transfer the source images to Commons.
Maarten
On Wed, Sep 10, 2008 at 2:36 AM, Maarten Dammers maarten@mdammers.nl wrote:
And, could this tool support other projects?
I wrote http://toolserver.org/~multichill/nowcommons.php a while ago, less colorful, but should work on every wikipedia.
Great. I have tried it on Indonesian Wikipedia; it works well.
Would it be possible to also support other projects? I am especially interested in English Wikiversity at the moment, as I am doing an image cleanup there.
Many other projects have copied PD images from en.wp, and those images have since been moved to Commons. It would be a great help to be able to quickly go around the smaller projects and assist them to use the images now on Commons, and remove the duplicates. I was just now doing this over on en.wv, and was thinking that I need a tool to help find the dups!
At the nl Wikipedia I created a template a while back to mark images moved from other Wikipedias but not yet transferred to Commons. The template makes the images end up in http://nl.wikipedia.org/wiki/Categorie:Wikipedia:Afbeelding_afkomstig_van_ee... This can be very useful to mark images and later transfer the source images to Commons.
Good idea. I'll start implementing that to help me catalog the images on en.wv so that the oldest copy of the image is copied to commons.
-- John
After some trouble with the tool last night, I've altered a few internal things. As a byproduct, pure Commons duplicates won't show up anymore (there are better tools for this, including my own;-).
Also, I now have en, de, and fr wikipedia (use as 'lang=en'), and en and de wikiversity (use 'lang=en_wv' or de_wv).
There's now a counter for usage of a local image in article namespace. Also, there are delete links where appropriate.
And, the entire thing should be a wee bit faster now ;-)
Cheers, Magnus
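For readers scripting against the tool, the `lang` values listed above could be interpreted roughly as follows. This is only a sketch under assumptions: the five supported values come from the message, but the parsing rule (a bare language code means Wikipedia, a `_wv` suffix means Wikiversity) and the helper names are guesses for illustration, not the tool's actual code.

```python
# Values named in the message: en, de, fr Wikipedia; en and de Wikiversity.
SUPPORTED_LANG_VALUES = {"en", "de", "fr", "en_wv", "de_wv"}

# Assumed mapping from suffix to project; only "_wv" appears in the message.
PROJECT_SUFFIXES = {"wv": "wikiversity"}


def parse_lang(value):
    """Split a lang value such as 'en' or 'en_wv' into (language, project)."""
    language, _, suffix = value.partition("_")
    if not language:
        raise ValueError(f"missing language code in {value!r}")
    if not suffix:
        # No suffix: assume the value refers to a Wikipedia.
        return language, "wikipedia"
    if suffix not in PROJECT_SUFFIXES:
        raise ValueError(f"unknown project suffix in {value!r}")
    return language, PROJECT_SUFFIXES[suffix]


def project_host(value):
    """Return the wiki hostname for a supported lang value."""
    if value not in SUPPORTED_LANG_VALUES:
        raise ValueError(f"{value!r} is not in the list announced so far")
    language, project = parse_lang(value)
    return f"{language}.{project}.org"
```

For example, `project_host("de_wv")` yields `de.wikiversity.org`, while a value outside the announced list is rejected rather than guessed at.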