Hi all,
As most of you probably know, I wrote Flickr upload bot back in May 2007 because there was a lot of demand for uploading free images from Flickr to Commons. People apparently find it useful: as of September 2010, over 80k images have been uploaded via this bot. In addition, over 50k images have been uploaded via a similar bot by Magnus Manske.
Unfortunately, as you may know, those tools break every other day (mine more than Magnus'). Both also have an annoying authentication mechanism that requires extra steps before you can upload (either posting a token to a file page, or using TUSC). Both problems would be solved if there were a MediaWiki extension to handle this task.
I eventually plan to write a MediaWiki extension that does this and get it enabled on Commons. I therefore need to know what you like and dislike about those tools, so that I can take this feedback into account when writing the extension. Don't expect to see something in the short term, but I hope that in the mid to long term we will have such an extension on Commons.
-- Bryan
On Sat, Oct 30, 2010 at 9:29 PM, Bryan Tong Minh bryan.tongminh@gmail.com wrote:
Hi all,
As most of you probably know, I wrote Flickr upload bot back in May 2007 because there was a lot of demand for uploading free images from Flickr to Commons. People apparently find it useful: as of September 2010, over 80k images have been uploaded via this bot. In addition, over 50k images have been uploaded via a similar bot by Magnus Manske.
Unfortunately, as you may know, those tools break every other day (mine more than Magnus'). Both also have an annoying authentication mechanism that requires extra steps before you can upload (either posting a token to a file page, or using TUSC). Both problems would be solved if there were a MediaWiki extension to handle this task.
I eventually plan to write a MediaWiki extension that does this and get it enabled on Commons. I therefore need to know what you like and dislike about those tools, so that I can take this feedback into account when writing the extension. Don't expect to see something in the short term, but I hope that in the mid to long term we will have such an extension on Commons.
An excellent idea. I would like to add two suggestions, since they are within the scope of such a framework, and I would really like to see these done in a single, elegant extension.
First, there are other sites besides Flickr that have license-compatible files we can use. Flickr may be the largest today, but there are many specialized ones like geograph.org.uk (pictures of places) and GIMP-SAVVY, and general ones like Picasa, Ipernity, and pictures owned by the Brazilian government. I support search for those and more at [1] (>670K uses), but upload for most of them is currently manual (download, then upload). It should be comparatively simple to write a more generic "transfer" parent class with derived classes for each of these sites; a single site-specific method (e.g. for an image page on Flickr, give me the URL of the highest-resolution file) might be sufficient for specialization.
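A minimal sketch of the parent-class idea described above. All names here (`TransferSource`, `RemoteFile`, `StaticCatalogSource`) are hypothetical, and the derived class reads from an in-memory table rather than any real site API; it only illustrates how little a per-site subclass might need to provide.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class RemoteFile:
    """What a transfer needs to know about one file (assumed shape)."""
    source_page: str
    file_url: str
    license: str


class TransferSource(ABC):
    """Generic parent: shared logic lives here, site lookups in subclasses."""

    @abstractmethod
    def highest_resolution_url(self, page_url: str) -> str:
        """Return the URL of the largest rendition for an image page."""

    @abstractmethod
    def license_of(self, page_url: str) -> str:
        """Return a license identifier for an image page."""

    def describe(self, page_url: str) -> RemoteFile:
        # Shared step: combine the site-specific answers into one record.
        return RemoteFile(
            source_page=page_url,
            file_url=self.highest_resolution_url(page_url),
            license=self.license_of(page_url),
        )


class StaticCatalogSource(TransferSource):
    """Toy subclass backed by a dict: page_url -> {"license", "sizes"}."""

    def __init__(self, catalog):
        self._catalog = catalog

    def highest_resolution_url(self, page_url):
        sizes = self._catalog[page_url]["sizes"]  # maps pixel width -> URL
        return sizes[max(sizes)]

    def license_of(self, page_url):
        return self._catalog[page_url]["license"]
```

A real Flickr, Picasa, or Geograph subclass would replace the dict lookups with calls to that site's own API.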
Second, there is the special case of transferring from other MediaWiki sites. This includes all Wiki(m|p)edia projects, as I do in [2] with >400K uses (and [3], when it works), but also WikiTravel, and basically any other MediaWiki installation where a license can be determined. While this might seem easy to implement, since we are more familiar with the site behaviour, there is no API for image metadata in MediaWiki, and transcoding the wikitext correctly, from all projects and languages, can be a real b***h, as countless more-or-less botched transfers from my bot show. Alternatives would be parsing the HTML (lossy), or putting more weight on the user to check for correctness.
Even if you do not choose to implement any of these transfer options initially, I believe you should code with these as further additions in mind. IMHO it would be a real shame to "waste" such an opportunity on Flickr alone.
Cheers, Magnus
[1] http://toolserver.org/~magnus/fist.php
[2] http://toolserver.org/~magnus/commonshelper.php
[3] http://toolserver.org/~commonshelper2/?language=en&project=wikipedia&...
On Sat, Oct 30, 2010 at 11:39 PM, Magnus Manske magnusmanske@googlemail.com wrote:
It should be comparatively simple to write a more generic "transfer" parent class, which then would have derived classes for each of these sites; a simple method (e.g. for an image page on flickr, give me the URL of the most high-res file) might be sufficient for specialization.
Good idea, I will take that into account.
Second, the special case of transferring from other MediaWiki sites. This includes all Wiki(m|p)edia projects, as I do in [2] with >400K uses (and [3], when it works), but also WikiTravel, and basically any other MediaWiki installation where a license can be determined. While it might seem to be easy to implement this, as we are more familiar with the site behaviour,
there is no API for image metadata in MediaWiki,
There's a bug for that. https://bugzilla.wikimedia.org/show_bug.cgi?id=25624
and transcoding the wikitext correctly, from all projects and languages, can be a real b***h, as countless more-or-less botched transfers from my bot show. Alternatives would be parsing the HTML (lossy), or putting more weight on the user to check for correctness.
Even if you do not choose to implement any of these transfer options initially, I believe you should code with these as further additions in mind. IMHO it would be a real shame to "waste" such an opportunity on Flickr alone.
I agree, but when I do this, I will make a generic interface for sites that can output metadata in a structured way. This means that Flickr, Picasa and others can be supported easily, but that MediaWiki will only be supported once MediaWiki itself exposes the aforementioned metadata. I believe that MediaWiki will eventually support such features, but by then we will be well into the 1.20s.
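To illustrate what "metadata in a structured way" could buy, here is a rough sketch that turns a structured record into a Commons-style description page. The `ImageMetadata` fields and the function name are assumptions made for this example, and the mapping from a site's license to a Commons license template is taken as given rather than derived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageMetadata:
    """Structured record a source site is assumed to be able to supply."""
    description: str
    author: str
    date: str
    source_url: str
    license_template: str  # e.g. "Cc-by-sa-2.0"; the mapping is assumed


def to_information_wikitext(meta: ImageMetadata) -> str:
    """Render the record as an {{Information}} block plus a license tag."""
    lines = [
        "{{Information",
        "|description=" + meta.description,
        "|date=" + meta.date,
        "|source=" + meta.source_url,
        "|author=" + meta.author,
        "}}",
        "",
        "{{" + meta.license_template + "}}",
    ]
    return "\n".join(lines)
```

With structured input like this, no wikitext from the source site has to be parsed or transcoded, which is the part that makes MediaWiki-to-MediaWiki transfers hard.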
Regards, Bryan
On Sat, Oct 30, 2010 at 10:51 PM, Bryan Tong Minh bryan.tongminh@gmail.com wrote:
I agree, but when I do this, I will make a generic interface for sites that can output metadata in a structured way. This means that Flickr, Picasa and others can be supported easily, but that MediaWiki will only be supported once MediaWiki itself exposes the aforementioned metadata. I believe that MediaWiki will eventually support such features, but by then we will be well into the 1.20s.
That's a very acceptable compromise (and yes, it will probably be 1.20s and 2020s...)
There is a possibility for an intermediate hack, which would keep your extension clean of the parsing madness and still beat the official API solution. A while back, Brianna suggested that I write an API that is compatible with Flickr's but uses Commons instead; the idea being that all tools which can talk to Flickr could then instantly talk to Commons as well, just by changing the base API URL. I started this [1], but then abandoned it (probably under an avalanche of other urgent stuff).
While this would still be a good idea, a similar API (Flickr-compatible or not) could be set up on the toolserver for non-Commons projects (Wikimedia, possibly others) rather quickly (compared to the official MediaWiki-blessed way); your extension could then just ask the toolserver API for the structured data.
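The "just change the base API URL" idea can be sketched as a client whose only site-specific setting is the endpoint. The Flickr REST endpoint and the `flickr.photos.getInfo` method are real; `COMMONS_COMPAT` is a made-up placeholder for a compatible service and does not refer to any existing endpoint. The sketch only builds request URLs and performs no network access.

```python
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"
# Hypothetical Flickr-compatible endpoint, used here only for illustration.
COMMONS_COMPAT = "https://toolserver.example/commons-flickr-api/rest/"


class PhotoApiClient:
    """Builds Flickr-style request URLs against a configurable base URL."""

    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url
        self.api_key = api_key

    def get_info_url(self, photo_id: str) -> str:
        # Same parameters regardless of which service answers the call.
        params = {
            "method": "flickr.photos.getInfo",
            "api_key": self.api_key,
            "photo_id": photo_id,
            "format": "json",
            "nojsoncallback": "1",
        }
        return self.base_url + "?" + urlencode(params)
```

A tool written against `PhotoApiClient(FLICKR_REST, key)` would need no other change to talk to a compatible service: it would construct `PhotoApiClient(COMMONS_COMPAT, key)` instead.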
Cheers, Magnus
Hey Bryan, Did you ever make any progress on getting a Flickr uploading extension going? If not, I would like to go ahead and propose it as a project for the WMF to work on.
Ryan Kaldari
Commons-l mailing list Commons-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/commons-l
On Tue, Jan 4, 2011 at 8:37 PM, Ryan Kaldari rkaldari@wikimedia.org wrote:
Hey Bryan, Did you ever make any progress on getting a Flickr uploading extension going? If not, I would like to go ahead and propose it as a project for the WMF to work on.
Not yet, perhaps February or March. I think it would be good to have it as a WMF project, as long as the community stays involved with it. So make sure that your requirements are public and that the community is involved in evaluating them, and make design decisions public so that volunteer developers can collaborate on the development.
Bryan
As a proponent of agile development, I would actually suggest not worrying about non-Flickr transfers for a first iteration. Most of the non-Flickr cases can be adequately supported by bots and manual uploading in the meantime. If we could get a working Flickr-transfer extension enabled on Commons, that would be a huge step forward and then it could be refactored to support interwiki or generalized file transfer (and to address feedback from the initial version).
To answer your question, the things I like best about the current tools are:
* Automatic license verification
* Being able to use a variety of different URLs and the tool being smart enough to pull the maximum resolution version regardless
* Automatically pulling descriptions/metadata
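Two of the points above (accepting a variety of URLs, and automatic license verification) can be sketched roughly as follows. The URL shapes handled here (a `/photos/<user>/<id>/` page path and an `<id>_<secret>[_<size>].jpg` static file name) are assumptions about how Flickr links commonly look, and `FREE_LICENSES` is an illustrative allowlist, not the list any existing tool uses.

```python
import re
from urllib.parse import urlparse

# Assumed URL shapes; real tools may need to handle more variants.
_PAGE = re.compile(r"^/photos/[^/]+/(\d+)(?:/|$)")
_STATIC = re.compile(r"/(\d+)_[0-9a-f]+(?:_[a-z0-9]+)?\.(?:jpg|png|gif)$")
_DOMAINS = ("flickr.com", "staticflickr.com")

# Illustrative allowlist: attribution and share-alike only, no NC or ND.
FREE_LICENSES = {"cc-by-2.0", "cc-by-sa-2.0"}


def extract_photo_id(url: str):
    """Return the numeric photo ID from a Flickr-like URL, or None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if not any(host == d or host.endswith("." + d) for d in _DOMAINS):
        return None
    match = _PAGE.match(parsed.path) or _STATIC.search(parsed.path)
    return match.group(1) if match else None


def is_free_license(name: str) -> bool:
    """Check a license identifier against the allowlist."""
    return name.lower() in FREE_LICENSES
```

Once every accepted URL form is reduced to a photo ID, the lookup of the largest available size and of the license can go through a single code path.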
It would also be nice to be able to pull an entire set/feed/pool in one go, but that should be for version 2.
Ryan Kaldari
On Thu, Nov 4, 2010 at 12:15 AM, Ryan Kaldari rkaldari@wikimedia.org wrote:
As a proponent of agile development, I would actually suggest not worrying about non-Flickr transfers for a first iteration. Most of the non-Flickr cases can be adequately supported by bots and manual uploading in the meantime. If we could get a working Flickr-transfer extension enabled on Commons, that would be a huge step forward and then it could be refactored to support interwiki or generalized file transfer (and to address feedback from the initial version).
While I fully agree that Flickr transfer would be the most useful and has priority, planning more generic functionality in the early design (e.g. a common function to transfer a file from a generic URL, an abstracted method to determine the highest-resolution URL from an identifier, etc.) would cost very little now but have a huge payoff later. I've seen the "we can refactor this later" approach blow up too often (in terms of development investment). Yes, even for Java/Eclipse...
To answer your question, the things I like best about the current tools are:
- Automatic license verification
- Being able to use a variety of different URLs and the tool being smart enough to pull the maximum resolution version regardless
- Automatically pulling descriptions/metadata
It would also be nice to be able to pull an entire set/feed/pool in one go, but that should be for version 2.
Shameless plug: http://toolserver.org/~magnus/flickr_mass.php
Cheers, Magnus
The AMW has a mode to import assets from fliker directly during the article editing process; it does its best to match the existing description templates, uses the highest quality assets, only searches properly licensed content, etc.
Maybe we could address bug 20512 (it would only work in sync mode until the code is brought up to trunk and synced with Bryan's updated jobqueue version, but 30 seconds should be enough time to import images for now): https://bugzilla.wikimedia.org/show_bug.cgi?id=20512
--michael
Clicked send too fast... AMW refers to Add Media Wizard: http://www.mediawiki.org/wiki/Extension:Add_Media_Wizard and it's Flickr, not fliker ;)
--michael
On 11/03/2010 06:04 PM, Michael Dale wrote:
The AMW has a mode to import assets from fliker directly during the article editing process; it does its best to match the existing description templates, uses the highest quality assets, only searches properly licensed content, etc.
Maybe we could address bug 20512 (it would only work in sync mode until the code is brought up to trunk and synced with Bryan's updated jobqueue version, but 30 seconds should be enough time to import images for now): https://bugzilla.wikimedia.org/show_bug.cgi?id=20512
--michael