Dear devs,
This is a list of things that I consider to be tech/dev priorities for Wikimedia Commons. Thanks for your attention.
(0. SUL)
1. Improved search. Dumping Mayflower in there would be a great start: http://tools.wikimedia.de/~tangotango/mayflower/ I suggest replacing the current search box with a Mayflower search box, and moving the current text search box below the toolbox.
2. Backup. No image dumps, still? This is quite worrying. To say the least.
3. Multilingual tagging system (i.e. categories). I would prefer a system where a category has a 'preferred' or 'display' name that can be defined in each language, plus alternate forms that can also be used to access the category. So a category whose English display name is [[category:Sydney]] might have an English alternate name of [[category:Sydney, Australia]]. Using either to tag an image will work, as will typing either into a search box. But at the bottom of the image page, users will see a link to the category under its display name, [[category:Sydney]], chosen according to their language settings. If nothing else, this is the most important item. We are extremely hampered by not being able to use multilingual names for categories. For a site that wants to be multilingual and serve all languages equally, it's a blow to have no choice but to say 'sorry, you have to do this part in English'.
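To illustrate what I mean, here is a rough sketch in Python -- not the MediaWiki schema, and every name in it is invented -- of the lookup such a system implies: any display name or alternate name resolves to the same category, and the link shown to the reader uses the display name for their language.

CATEGORY_NAMES = {
    "SYDNEY": {  # hypothetical internal category ID
        "display": {"en": "Sydney", "de": "Sydney", "ja": "シドニー"},
        "aliases": {"en": ["Sydney, Australia"], "de": ["Sydney (Australien)"]},
    },
}

# Reverse index: every display name and alias, in any language, resolves
# to the same internal category.
LOOKUP = {}
for cat_id, names in CATEGORY_NAMES.items():
    for display in names["display"].values():
        LOOKUP[display] = cat_id
    for alias_list in names["aliases"].values():
        for alias in alias_list:
            LOOKUP[alias] = cat_id

def resolve(tag):
    """Accept any known name of the category, e.g. from a tag or a search box."""
    return LOOKUP.get(tag)

def display_name(cat_id, user_lang, fallback="en"):
    """Name shown in the category link at the bottom of the image page."""
    display = CATEGORY_NAMES[cat_id]["display"]
    return display.get(user_lang, display[fallback])

# Tagging an image with [[category:Sydney, Australia]] still files it under
# the same category; a Japanese reader sees the link under the Japanese name.
cat = resolve("Sydney, Australia")
print(cat, display_name(cat, "ja"), display_name(cat, "en"))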
4. Structured data. This is basically needed to feed the search engine. I don't know how Mayflower handles it, but I suppose it's not ideal. At the moment we use [[template:information]], but it's not hugely well suited to the task. Fields include Description, Source, Permission, Date, License, Geocode, Other_versions, and so on.
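As a sketch only (Python, with a deliberately simplistic parser; the sample values are made up), pulling the {{Information}} fields out of the wikitext so a search indexer could consume them might look something like this:

import re

FIELD = re.compile(r"^\|\s*(\w+)\s*=\s*(.*)$")

def parse_information(wikitext):
    """Pull flat |Field= value pairs out of a {{Information}} block."""
    fields = {}
    in_template = False
    for line in wikitext.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("{{information"):
            in_template = True
            continue
        if in_template and stripped.startswith("}}"):
            break
        if in_template:
            m = FIELD.match(stripped)
            if m:
                fields[m.group(1).lower()] = m.group(2).strip()
    return fields

sample = """{{Information
|Description= Sydney Opera House at dusk
|Source= own work
|Date= 2007-08-11
|Permission= cc-by-sa-2.5
}}"""
print(parse_information(sample))
# {'description': 'Sydney Opera House at dusk', 'source': 'own work', ...}

Real wikitext has nested templates and multi-line values, so anything production-grade would need a proper parser rather than this line-by-line scan.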
5. Move/rename images function, AKA image redirect capability. As Tim says, "How hard can it be?" :) This is only an issue because it is such a common request.
6. Rating system. I would like a simple YouTube-style five-star rating widget on image pages, for logged-in users. More importantly, it should be possible to sort search results (and maybe categories?) according to rating. This is important because, if you think about Wikipedia, any search query likely has only half a dozen highly relevant results at most; most, I would wager, have only one. But on Commons, querying over individual images, a reasonable search query can and often does have dozens or even hundreds of results. The best way to sift through them would be by rating. There is only so much manual rating we can do (and we are doing it), and as Commons grows this is only going to get worse. This may not seem like a priority simply because we don't have the functionality now, but I have a strong feeling this is the right direction.
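To make the sorting idea concrete, a hedged sketch in Python -- the prior constants, vote counts and file names are all invented -- of ranking results by a damped average, so a file with two lucky votes doesn't outrank one with two hundred solid ones:

PRIOR_MEAN = 3.0    # assumed site-wide average rating (invented)
PRIOR_VOTES = 10    # how many "virtual votes" the prior is worth (invented)

def weighted_rating(votes):
    """Damped average: files with few votes are pulled towards the prior."""
    return (sum(votes) + PRIOR_MEAN * PRIOR_VOTES) / (len(votes) + PRIOR_VOTES)

results = {
    "Sydney_Opera_House_dusk.jpg": [5, 5, 4, 5, 4, 5, 5, 4],
    "Sydney_blurry_snapshot.jpg":  [5, 5],
    "Sydney_harbour_pano.jpg":     [4] * 120 + [5] * 60,
}

for name in sorted(results, key=lambda n: weighted_rating(results[n]), reverse=True):
    print(f"{weighted_rating(results[name]):.2f}  {name}")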
7. Improved category handling. If search were improved enough, it might make categories (and galleries) more or less irrelevant. I can only pray. For the moment, however, categories are often used for navigation. The bug of not showing all subcategories on the first page is a frequent problem, as is the inability to see the total number of items in a category. The lack of ability to sort a category by, e.g., the date an item was added to it or the date it was uploaded is a problem, and so is the lack of ability to 'auto-flatten' a category.
8. Multilingual support. The interface is pretty badly broken for RTL languages, which is very disorienting for those users. The lack of ability for anonymous users to choose a language and have it 'stick' is also limiting. As I mentioned, the inability to display category names in users' native languages is a problem, as is the lack of automatically translated templates (we would like them to work more or less as MediaWiki messages do). We seriously want to be a multilingual wiki, but unfortunately I can't really say we are one at the moment.
------------------------------------------------------------------
OK, those things are all my big-ticket items. These are some lower-priority ones.
9. Playback support. This has been discussed a little. It still needs major improvement.
10. RSS, InstantCommons, and an 'embedding' feature. These items are about making our content accessible while keeping a tie to our site. Keeping the 'tie' gives us some more control, which is necessary since I suspect the lifespan of an image is pretty unpredictable; maybe when image renames work it won't be so bad. It would be nice for admins to be able to define RSS feeds, and also to have RSS feeds of items added to a category. InstantCommons: Wikitravel, Wikia, Wikihow, etc. would love to be able to easily access our content, and we can benefit from their participation. I'm not sure where this is at now. 'Embedding' feature: I'm just thinking about the success of the Flickr API.
11. Global native MW Checkusage, CommonsTicker functionality. Checkusage is an utterly vital tool for us. Since the toolserver seems vastly improved of late, this has dropped down my list, but I don't think we should have such critical tools on the toolserver only. When the toolserver dies, it's as if RC stopped existing for vandal-fighters: really, what can you do? CommonsTicker could be an extension that each local wiki could adjust to its preferences, in terms of where and when it wants notification about images it is using. At the moment it's a bot that leaves messages on the talk pages of articles using an image that is in danger of being deleted on Commons.
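For what it's worth, a client-side sketch of what such a usage check could look like (Python; it assumes an api.php 'imageusage' query module is available, and the wiki list and file name are placeholders -- a real global check would have to cover every project):

import json
import urllib.parse
import urllib.request

# Stand-in wiki list; a real global check would cover every Wikimedia wiki.
WIKIS = ["en.wikipedia.org", "de.wikipedia.org", "fr.wikisource.org"]

def usage_on(wiki, filename):
    """Ask one wiki's API which of its pages embed the given file."""
    params = urllib.parse.urlencode({
        "action": "query",
        "list": "imageusage",          # assumes this query module is available
        "iutitle": "File:" + filename,
        "iulimit": "50",
        "format": "json",
    })
    with urllib.request.urlopen(f"https://{wiki}/w/api.php?{params}") as resp:
        data = json.load(resp)
    return [page["title"] for page in data.get("query", {}).get("imageusage", [])]

for wiki in WIKIS:
    pages = usage_on(wiki, "Sydney_Opera_House_dusk.jpg")  # made-up file name
    if pages:
        print(f"{wiki}: {len(pages)} page(s), e.g. {pages[:3]}")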
12. ImportFreeImages extension (importing images from Flickr). This was high priority six months ago when I opened the bug request (8854). Since nothing happened, we developed community and bot processes to solve the problem of verifying Flickr licenses -- the same problem this extension would solve.
------------------------------------------------------
Lowest priority -- things we currently do with JavaScript or external tools that would be nice to have as native functions.
13. Ability to move images from another Wikimedia wiki to Commons
14. User upload gallery function
15. Bulk upload function
16. Auto-resize galleries according to window size. http://commons.wikimedia.org/wiki/MediaWiki_talk:ResizeGalleries.js
17. Category and gallery 'previews'. Example: http://commons.wikimedia.org/wiki/Image:SWB9470_Puetzstrasse.jpg?withJS=Medi... http://commons.wikimedia.org/wiki/MediaWiki_talk:Gallerypreview.js This has improved usability so much, even for me, an experienced user. It just makes you so much more likely to explore, and that is fantastic for how people should ideally use our site.
17. "Fotonotes". http://commons.wikimedia.org/wiki/MediaWiki:ImageBoxes.js (a start, but I think not totally user friendly yet) Ability to put user annotations directly on an image. Another one of those things that once we start using I'm sure we won't be able to do without.
19. Special:Upload flexibility. We are satisfied with our uselang hack so far, but if it kills the cache and our toy gets taken away, that won't be so cool. So that's your call, I suppose (well, so is everything :)).
BTW I have a feeling these last ones are relatively easy to do, especially since we have existing tools. So they might be nice projects for beginning wannabe devs, or experienced devs who would like a quick break from something more painful, etc. :)
Also some bugs are listed here: http://commons.wikimedia.org/wiki/Commons:Bugs (some are parts of above mentioned issues, some are not)
regards Brianna user:pfctdayelise
PS: I believe Wikisource also has some well-defined problems they consider priorities. The last I know of is this one: http://lists.wikimedia.org/pipermail/wikisource-l/2007-June/000294.html BirgitteSB should be a good contact for a more up-to-date list.
On Sat, 2007-08-11 at 02:02 +1000, Brianna Laugher wrote:
- Backup.
No image dumps, still? This is quite worrying. To say the least.
It would be faster to drive to the datacenter with a stack of hard drives (or at least 1x500gb drive) and have someone just copy the images over to the drive directly, than it would be to try to tar them up and have people retrieve them over http.
I don't know how long it would take to fetch a 300+ gigabyte file over http, but I certainly wouldn't want to do it... not without some distributed method of sharing the packets and bandwidth.
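Some back-of-the-envelope numbers (Python; the link speeds are only examples) showing why a one-shot HTTP fetch of that much data is unappealing:

SIZE_BITS = 300 * 8 * 10**9   # ~300 GB, decimal

for label, mbit in [("1 Mbit/s", 1), ("10 Mbit/s", 10), ("100 Mbit/s", 100)]:
    seconds = SIZE_BITS / (mbit * 10**6)
    print(f"{label:>10}: {seconds / 3600:7.1f} hours  ({seconds / 86400:5.1f} days)")

# -> roughly 28 days at 1 Mbit/s, 2.8 days at 10 Mbit/s, and 6.7 hours at
#    100 Mbit/s, assuming a sustained, uninterrupted transfer with no resume logic.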
On 11/08/07, David A. Desrosiers desrod@gnu-designs.com wrote:
On Sat, 2007-08-11 at 02:02 +1000, Brianna Laugher wrote:
- Backup.
No image dumps, still? This is quite worrying. To say the least.
It would be faster to drive to the datacenter with a stack of hard drives (or at least 1x500gb drive) and have someone just copy the images over to the drive directly, than it would be to try to tar them up and have people retrieve them over http.
I don't know how long it would take to fetch a 300+ gigabyte file over http, but I certainly wouldn't want to do it... not without some distributed method of sharing the packets and bandwidth.
OK, my bad wording. The important thing is the backup. I would really like someone to do that manual backup. I don't care whether it exists on the web; I just care that it exists, somewhere (or some few places), for sure. So, you know. If all that's needed is for someone to rent a car, then let's do that. :)
cheers Brianna
On 10/08/07, Brianna Laugher brianna.laugher@gmail.com wrote:
OK, my bad wording. The important thing is the backup. I would really like someone to do that manual backup. I don't care whether it exists on the web; I just care that it exists, somewhere (or some few places), for sure. So, you know. If all that's needed is for someone to rent a car, then let's do that. :)
It's worth pointing out here that if Wikimedia isn't backing up *all content* in an appropriate fashion, then this needs to be addressed, and the blame lies with the Foundation itself for not establishing or confirming that such things are addressed, and *not* with the existing technical team.
Rob Church
On 8/10/07, Rob Church robchur@gmail.com wrote:
It's worth pointing out here that if Wikimedia isn't backing up *all content* in an appropriate fashion, then this needs to be addressed, and the blame lies with the Foundation itself for not establishing or confirming that such things are addressed, and *not* with the existing technical team.
I don't think Brianna meant to blame anybody.
Bryan
On 10/08/07, Bryan Tong Minh bryan.tongminh@gmail.com wrote:
I don't think Brianna meant to blame anybody.
Of course not, that's not what I meant in the slightest. I apologise if this was how it was perceived.
Rob Church
Hoi, When disaster strikes, it does not matter really who is to blame. What matters is how to get things back into a working state. This will take both the Foundation and the technical team.
Given that some members of the technical team represent the technical ability of the Wikimedia Foundation, the notion of who is to blame is not a productive one anyway. I am sure that everyone will be relieved when there is a proper backup and recovery strategy in place. When there is a workable strategy, it would be nice if it were communicated, because that will make the people involved in Commons less anxious.
Thanks, GerardM
On 8/10/07, Rob Church robchur@gmail.com wrote:
On 10/08/07, Brianna Laugher brianna.laugher@gmail.com wrote:
OK, my bad wording. The important thing is the backup. I would really like someone to do that manual backup. I don't care whether it exists on the web; I just care that it exists, somewhere (or some few places), for sure. So, you know. If all that's needed is for someone to rent a car, then let's do that. :)
It's worth pointing out here that if Wikimedia isn't backing up *all content* in an appropriate fashion, then this needs to be addressed, and the blame lies with the Foundation itself for not establishing or confirming that such things are addressed, and *not* with the existing technical team.
Rob Church
On 10/08/07, GerardM gerard.meijssen@gmail.com wrote:
When disaster strikes, it does not matter really who is to blame. What matters is how to get things back into a working state. This will take both the Foundation and the technical team.
It'll matter to you, though, Gerard, won't it? And it'll matter to all the people who've contributed content to any project, and to anyone who's proud of what we've achieved, if any of that is lost. And rightly so. If something goes wrong, human nature is to find someone to blame.
Given that some members of the technical team represent the technical ability of the Wikimedia Foundation, the notion of who is to blame is not a productive one anyway. I am sure that everyone will be relieved when there is a proper backup and recovery strategy in place. When there is a workable strategy, it would be nice if it were communicated, because that will make the people involved in Commons less anxious.
Iff we don't have a formal backup plan established for all our content, then I blame the Foundation, and by that, I mean the Board. And that's an iff, because for all I know, we could have 100% backup in place, although this isn't clear from existing documentation, and Brianna's email calls it into question.
Rob Church
Hoi, Yes, it would matter to me. I do not really want to pronounce all those Dutch words again. With Shtooka I can do it ten times quicker than with Audacity, but I would hate to do it again. I would certainly grumble, and I do not know if I would pronounce everything again.
The Foundation is responsible for the backup and recovery procedures; this is sensible. It is for this reason that I copy Sue in. It is her job to look after these things and be responsible. Given that it is a relevant issue, I am sure she will give it the priority it requires and ensure that a risk analysis is done that will give us some peace of mind.
Thanks, GerardM
On 8/10/07, Rob Church robchur@gmail.com wrote:
On 10/08/07, GerardM gerard.meijssen@gmail.com wrote:
When disaster strikes, it does not matter really who is to blame. What matters is how to get things back into a working state. This will take both the Foundation and the technical team.
It'll matter to you, though, Gerard, won't it? And it'll matter to all the people who've contributed content to any project, and to anyone who's proud of what we've achieved, if any of that is lost. And rightly so. If something goes wrong, human nature is to find someone to blame.
Given that some members of the technical team represent the technical ability of the Wikimedia Foundation, the notion of who is to blame is not a productive one anyway. I am sure that everyone will be relieved when there is a proper backup and recovery strategy in place. When there is a workable strategy, it would be nice if it were communicated, because that will make the people involved in Commons less anxious.
Iff we don't have a formal backup plan established for all our content, then I blame the Foundation, and by that, I mean the Board. And that's an iff, because for all I know, we could have 100% backup in place, although this isn't clear from existing documentation, and Brianna's email calls it into question.
Rob Church
On 8/10/07, Brianna Laugher brianna.laugher@gmail.com wrote:
On 11/08/07, David A. Desrosiers desrod@gnu-designs.com wrote:
On Sat, 2007-08-11 at 02:02 +1000, Brianna Laugher wrote:
- Backup.
No image dumps, still? This is quite worrying. To say the least.
It would be faster to drive to the datacenter with a stack of hard drives (or at least 1x500gb drive) and have someone just copy the images over to the drive directly, than it would be to try to tar them up and have people retrieve them over http.
I don't know how long it would take to fetch a 300+ gigabyte file over http, but I certainly wouldn't want to do it... not without some distributed method of sharing the packets and bandwidth.
OK, my bad wording. The important thing is the backup. I would really like someone to do that manual backup. I don't care whether it exists on the web; I just care that it exists, somewhere (or some few places), for sure. So, you know. If all that's needed is for someone to rent a car, then let's do that. :)
A tool has been created called Wikix; it downloads all images from a wiki. I believe it has been run once or twice on Commons, but that is not often enough. Direct dumps would be great, but in the meantime this could be used as a stop-gap measure. I would run it regularly and distribute the files, but 1) I can't compile it for some reason, and 2) my hard drive isn't big enough.
cheers
Brianna
-- They've just been waiting in a mountain for the right moment: http://modernthings.org/
-UH.
On Fri, 2007-08-10 at 20:59 +0100, Minute Electron wrote:
Direct dumps would be great, but in the meantime this could be used as a stop-gap measure. I would run it regularly and distribute the files, but 1) I can't compile it for some reason, and 2) my hard drive isn't big enough.
I have plenty of space, and I have the CPU and bandwidth to do it, as well as the public torrent trackers to host it if necessary. Where is this Wikix tool?
David A. Desrosiers wrote:
On Fri, 2007-08-10 at 20:59 +0100, Minute Electron wrote:
Direct dumps would be great, but in the meantime this could be used as a stop-gap measure. I would run it regularly and distribute the files, but 1) I can't compile it for some reason, and 2) my hard drive isn't big enough.
I have plenty of space, and I have the CPU and bandwidth to do it, as well as the public torrent trackers to host it if necessary. Where is this Wikix tool?
It's probably better to use rsync, that's what we usually use for image backups. It's quite robust, and efficient if used properly. The only trouble with it is the server load -- it would quickly overload the backend if we made it available for public use. Maybe we could set up an rsyncd instance limited by client IP.
-- Tim Starling
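For illustration only, an IP-limited rsyncd module might look roughly like this -- the module name, path and addresses below are invented, not the real server layout:

# /etc/rsyncd.conf -- illustrative sketch only
uid = nobody
gid = nobody
read only = yes
max connections = 4

[commons-images]
    path = /export/upload/wikimedia/commons
    comment = Commons originals, mirror access by arrangement
    hosts allow = 192.0.2.10 198.51.100.0/24
    hosts deny = *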
On Sat, 2007-08-11 at 07:44 +0100, Tim Starling wrote:
It's probably better to use rsync, that's what we usually use for image backups. It's quite robust, and efficient if used properly. The only trouble with it is the server load -- it would quickly overload the backend if we made it available for public use. Maybe we could set up an rsyncd instance limited by client IP.
robchurch/brion/avar/JeLuF/etc. and I talked about this on IRC a few years ago in the context of pulling/sharing the database dumps, and the determination at the time was that it was more efficient to use lighttpd on the front end and serve over HTTP than to set up rsyncd and use that.
We even discussed how to add zsync support (rsync over HTTP) to try to bridge those two pieces, and I don't think it was ever pursued beyond that. gzip even has a flag, --rsyncable, so you can optimize the compressed dumps for this.
I'd love to see an rsyncd set up for mirroring the images on Commons and the other projects. Not only would that let the mirrors stay in lockstep with the current public wiki (as images are changed, renamed, or removed, the mirrors would pick up those changes, versus a one-off wget or torrent of images), but it would only refetch what has changed.
I'm a big fan of rsync, and use it here for mirroring Gutenberg, CPAN, etc. so adding Wikimedia images (or even db dumps again?) would be great.
Perhaps a good start would be designating a few "1st Tier" mirrors, through which the torrents, HTTP mirrors, etc. can be served. You have rsyncd on the w.m.o. servers locked to a specific, known set of IPs.
Those IPs fetch the data over rsync and set up their own public http/rsync/zsync/etc. mirrors, and you point "2nd through Nth Tier" mirrors at those first-level mirrors instead of your own public servers.
Definitely worth exploring rsync though, in this case.
Tim Starling wrote:
It's probably better to use rsync, that's what we usually use for image backups. It's quite robust, and efficient if used properly. The only trouble with it is the server load -- it would quickly overload the backend if we made it available for public use. Maybe we could set up an rsyncd instance limited by client IP.
Note that rsync takes too long to build the file list for the full set of files; you have to break it up by smaller directories to get a reliable transfer.
-- brion vibber (brion @ wikimedia.org)
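One way to do that splitting, sketched in Python (the rsync source URL and destination path are invented): since MediaWiki hashes uploads into two-level directories, syncing one second-level directory per rsync run keeps each file list small.

import itertools
import os
import subprocess

SRC = "rsync://upload.example.org/commons-images"   # invented rsyncd module
DST = "/backup/commons"
HEX = "0123456789abcdef"

# One rsync run per second-level hash directory (0/00 ... f/ff), so each
# invocation only has to build a file list for a small slice of the archive.
for a, b in itertools.product(HEX, HEX):
    subdir = f"{a}/{a}{b}"
    os.makedirs(f"{DST}/{subdir}", exist_ok=True)
    subprocess.run(
        ["rsync", "-a", "--delete", "--timeout=600",
         f"{SRC}/{subdir}/", f"{DST}/{subdir}/"],
        check=False,   # note failures and move on; retry failed dirs later
    )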
Brion Vibber wrote:
Tim Starling wrote:
It's probably better to use rsync, that's what we usually use for image backups. It's quite robust, and efficient if used properly. The only trouble with it is the server load -- it would quickly overload the backend if we made it available for public use. Maybe we could set up an rsyncd instance limited by client IP.
Note that rsync takes too long to build the file list for the full set of files; you have to break it up by smaller directories to get a reliable transfer.
I'm doing some testing now with rsync 3.0 (built from CVS) for updating our internal file upload backup. The new incremental recursion seems to handle the huge file set a lot better.
To give you an idea -- with rsync 2, I tried starting a sync job yesterday, and killed it this morning when I found it using 2.6 GIGABYTES OF MEMORY without having yet transferred ANY files!
3.0 starts transferring directories as they come along in smaller chunks instead of building a complete list of all (several million) files to transfer first.
Kinda nice. ;)
It might well be feasible to have a public rsync mirror if we can limit it to 3.0 clients.
-- brion vibber (brion @ wikimedia.org)
On 8/10/07, Minute Electron minuteelectron@googlemail.com wrote:
A tool has been created called Wikix; it downloads all images from a wiki. I
I'm not aware of any method to check the validity of non-deleted* files after downloading them via HTTP, beyond checking the size and hoping the file isn't corrupted or downloading them more than once.
When you're talking about very nearly 1 TB of images fetched via 1.75 million HTTP requests (the current Commons size), corruption is a real issue if you care about getting a good copy. Errors that leave the size intact are quite possible, and fetching every file twice isn't really a sane option for that much data.
I'm not aware of any efforts to download Commons via HTTP. Previously Jeff Merkey downloaded those files that English Wikipedia uses, but that's only part of a much larger collection.
I don't believe that moving that much data is really a major issue in itself, at least for the sort of people who have the storage around to handle it; back when I downloaded the old Commons image dump (about 300 GB) that we had posted, the transfer took 4 days, which I don't consider a big deal at all.
*deleted files are renamed to the SHA1 of their content, so it's easy to check their transfer validity. I wish non-deleted images behaved in the same way.
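A small sketch (Python; the local path, and the assumption that the names are hex SHA-1 digests, are mine) of the check that such naming makes possible: recompute each file's SHA-1 and compare it to the name, so a bad transfer can be spotted without fetching anything twice.

import hashlib
from pathlib import Path

def sha1_of(path, chunk_size=1 << 20):
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def corrupt_files(download_dir):
    """Files whose content does not match the SHA-1 recorded in their name."""
    return [p for p in Path(download_dir).rglob("*")
            if p.is_file() and sha1_of(p) != p.stem.lower()]

bad = corrupt_files("/backup/commons-deleted")   # invented local path
print(f"{len(bad)} file(s) need re-fetching")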
On 8/10/07, Brianna Laugher brianna.laugher@gmail.com wrote:
- Backup.
No image dumps, still? This is quite worrying. To say the least.
It's not like the lack of public dumps means they haven't been backed up. I should hope all the clusters are periodically being backed up offsite (say, at one of the other clusters).
Brianna Laugher wrote:
- Improved search.
Dumping Mayflower in there would be a great start: http://tools.wikimedia.de/~tangotango/mayflower/ I suggest replacing the current search box with a Mayflower search box, and moving the current text search box below the toolbox.
Worth taking a look at, though I wonder why the people working on Mayflower aren't:
1) Active in the development list or IRC channel
2) Committing their code to SVN
Takes two to tango, guys. :)
-- brion vibber (brion @ wikimedia.org)
(Apologies for not getting to this sooner)
On 8/10/07, Brianna Laugher brianna.laugher@gmail.com wrote:
- Improved search.
Dumping Mayflower in there would be a great start: http://tools.wikimedia.de/~tangotango/mayflower/ I suggest replacing the current search box with a Mayflower search box, and moving the current text search box below the toolbox.
I will try to write a JavaScript-based search as a stop-gap measure once the API actually returns search results :-) I have some ideas for improving search results, and for mixed text/image search results.
- Rating system
I would like a simple YouTube-style five-star rating widget on image pages, for logged-in users. More importantly, it should be possible to sort search results (and maybe categories?) according to rating. This is important because, if you think about Wikipedia, any search query likely has only half a dozen highly relevant results at most; most, I would wager, have only one. But on Commons, querying over individual images, a reasonable search query can and often does have dozens or even hundreds of results. The best way to sift through them would be by rating. There is only so much manual rating we can do (and we are doing it), and as Commons grows this is only going to get worse. This may not seem like a priority simply because we don't have the functionality now, but I have a strong feeling this is the right direction.
I could invest some time in this.
- Improved category handling
If search were improved enough, it might make categories (and galleries) more or less irrelevant. I can only pray. For the moment, however, categories are often used for navigation. The bug of not showing all subcategories on the first page is a frequent problem, as is the inability to see the total number of items in a category. The lack of ability to sort a category by, e.g., the date an item was added to it or the date it was uploaded is a problem, and so is the lack of ability to 'auto-flatten' a category.
If by "auto-flatten" you mean display images in subcategories in the parent category, this might be doable with JavaScript. If you mean moving iamges to a parent category, there's my semi-automatic [[MediaWiki:Cat-a-lot.js]].
- Global native MW Checkusage, CommonsTicker functionality.
Checkusage is an utterly vital tool for us.
There is my JavaScript-based Check-usage: http://commons.wikimedia.org/w/index.php?title=Image:Valeriana_officinalis_2... which works through query.php (yay multi-site query capability!), but it can be quite slow.
- User upload gallery function
I think I wrote this some time ago (last year?). It was deactivated or reverted because there was (is?) no index on the upload user.
- Bulk upload function
This and moving images from another MediaWiki site could quite easily be handled by external tools, except that there are heavy restrictions on the file upload field. I once wrote an "upload from http site" function, but it is, of course, deactivated...
- Auto-resize galleries according to window size.
http://commons.wikimedia.org/wiki/MediaWiki_talk:ResizeGalleries.js
This will have to stay JavaScript, I guess; even if you can see the user's window size when they request the page, returning a different page for each request would mess up caching. Also, resizing the gallery when the window is resized after the page has loaded requires JavaScript anyway. I could try to have it execute sooner than it currently does, so you won't see the "jump" when resizing.
- Category and gallery 'previews'
Example: http://commons.wikimedia.org/wiki/Image:SWB9470_Puetzstrasse.jpg?withJS=Medi... http://commons.wikimedia.org/wiki/MediaWiki_talk:Gallerypreview.js This has improved usability so much, even for me, an experienced user. It just makes you so much more likely to explore, and that is fantastic for how people should ideally use our site.
I rewrote my JavaScript as an extension some months ago, as "MiniPreview". As there is, as usual, no interest from the people who *could* make it live in getting it working on the live site, I have not wasted^W spent as much time on it as on the JavaScript one.
- "Fotonotes".
http://commons.wikimedia.org/wiki/MediaWiki:ImageBoxes.js (a start, but I think not totally user-friendly yet). The ability to put user annotations directly on an image. Another one of those things that, once we start using it, I'm sure we won't be able to do without.
I updated this a few days ago to actually show the box you draw while drawing. I can add "auto-edit" to this. However, parts of this will always stay JavaScript, and unless we get some new tagging mode, using edits/templates/categories for this seems as good a way as any.
Cheers, Magnus
I've added a user index to the image/oldimage columns (which Brion is running now; it's almost done), so it should be smooth again.
-Aaron Schulz
From: "Magnus Manske" magnusmanske@googlemail.com Reply-To: Wikimedia developers wikitech-l@lists.wikimedia.org To: "Wikimedia developers" wikitech-l@lists.wikimedia.org Subject: Re: [Wikitech-l] Commons tech wishlist Date: Mon, 13 Aug 2007 10:55:36 +0100
- User upload gallery function
I think I wrote this some time ago (last year?). It was deactivated or reverted because there was (is?) no index on the upload user.