On 01/04/2011 01:12 PM, Neil Kandalgaonkar wrote:
We've narrowed it down to two systems that are being tested right now, MogileFS and OpenStack. OpenStack has more built-in stuff to support authentication. MogileFS is used in many systems that have an authentication layer, but it seems you have to build more of it from scratch.
Authentication is really a nice-to-have for Commons or Wikipedia right now. I anticipate it being useful for a handful of cases, which are both more anticipated than actual right now:
- images uploaded but not published (a la UploadWizard)
- forum avatars (which can be viewed by anyone, but can only be edited
by the user they belong to)
Hmm. I think it would (obviously?) be best to handle media authentication at the MediaWiki level, with just a simple private / public classification for the backend storage system. Things that are "private" have to go through the MediaWiki API, where you can leverage all the existing extensible credential management.
It is also important to keep things simple for 3rd parties that are not using a clustered filesystem stack: it is easier to map a web-accessible directory vs. a non-accessible one than to deal with any authentication managed within the storage system.
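A minimal sketch of that public / private split, in Python just for illustration. Everything in it is hypothetical: the host names, the `fetchprivate` action, and the `resolve_url` helper are made up and are not existing MediaWiki code or configuration. The only point is that the storage layer knows two classes of file, and anything private is handed off to the API, which does the real permission check.

```python
from enum import Enum
from urllib.parse import quote


class Visibility(Enum):
    PUBLIC = "public"    # served straight from a web-accessible directory
    PRIVATE = "private"  # only reachable through the wiki's API


# Hypothetical locations, for illustration only.
PUBLIC_BASE_URL = "http://upload.example.org"
API_ENTRY_POINT = "http://wiki.example.org/api.php"


def resolve_url(path: str, visibility: Visibility) -> str:
    """Return the URL a client should use to fetch a stored file."""
    if visibility is Visibility.PUBLIC:
        # Public files map directly onto the web-accessible directory.
        return f"{PUBLIC_BASE_URL}/{path.lstrip('/')}"
    # Private files are never exposed directly; the (hypothetical) API
    # action is where credentials would be checked.
    return f"{API_ENTRY_POINT}?action=fetchprivate&path={quote(path, safe='')}"
```

A 3rd party without a clustered filesystem would only need the first branch plus a directory that the web server does not expose.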
Image 'editing' / uploading already includes basic authentication, i.e.: http://www.mediawiki.org/wiki/Manual:Configuring_file_uploads#Upload_permiss... User avatars would be a special case of
I think thumbnail and transformation servers (they should also do stuff like rotating things on demand) are separate from how we store things, and will just be acting on behalf of the user anyway. So they don't introduce new requirements to image storage. Anybody see anything problematic about that?
I think managing storage of procedural derivative assets differently than original files is pretty important. Probably one of the core features of a Wikimedia Storage system.
Assuming finite storage, it would be nice to specify that we don't care as much about losing thumbnails as about losing original assets. For example, when doing 3rd party backups or "dumps" we don't need all the derivatives to be included.
We don't need to keep random-resolution derivatives of old revisions of assets around forever. Likewise, improvements to SVG rasterization or to transcoding software would mean "expiring" derivatives.
When MediaWiki is dealing with file maintenance, it should have to authenticate differently when removing, moving, or overwriting originals vs. derivatives, i.e. independent of DB revision numbers or what MediaWiki *thinks* it should be doing.
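To make the "expiring" idea concrete, here is a rough Python sketch. The record fields, the renderer names, and the 30-day default are all assumptions for the sake of the example, not anything that exists in MediaWiki today. A derivative is treated as disposable if the software that produced it has since been upgraded, or if it belongs to an old revision and has been sitting around for a while.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivative:
    path: str
    renderer: str            # hypothetical name, e.g. "svg-rasterizer"
    renderer_version: int    # version of the renderer that produced the file
    source_is_current: bool  # False if derived from an old revision
    age_days: int            # days since the derivative was generated


def is_expired(d: Derivative, current_versions: dict,
               max_old_revision_age_days: int = 30) -> bool:
    """Decide whether a derivative can be dropped and regenerated on demand."""
    # A newer renderer means the stored output is stale.
    if d.renderer_version < current_versions.get(d.renderer, 0):
        return True
    # Derivatives of old revisions are only kept for a limited time.
    return (not d.source_is_current) and d.age_days > max_old_revision_age_days
```

Originals never pass through a check like this; only derivatives are candidates for removal.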
For example only upload ingestion nodes or "modes" should have write access to the archive store. Transcoding or thumbnailing or maintenance nodes or "modes" should only have read-only access to archive originals and write access to derivatives.
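The node / "mode" split above could be written down as a small permission table. This is only a sketch in Python with made-up role and store names. Whether ingestion nodes also need read access to originals, or any access to derivatives, is not settled above, so those entries are assumptions.

```python
# Hypothetical roles ("modes") and the two stores they may touch.
PERMISSIONS = {
    # Only ingestion may write to the archive of originals.
    # Read access for ingestion is an assumption.
    "ingest":      {"originals": {"read", "write"}, "derivatives": set()},
    # Everything else reads originals and writes derivatives only.
    "transcode":   {"originals": {"read"}, "derivatives": {"read", "write"}},
    "thumbnail":   {"originals": {"read"}, "derivatives": {"read", "write"}},
    "maintenance": {"originals": {"read"}, "derivatives": {"read", "write"}},
}


def is_allowed(role: str, store: str, action: str) -> bool:
    """Return True if the given role may perform the action on the store."""
    # Unknown roles or stores get no access at all.
    return action in PERMISSIONS.get(role, {}).get(store, set())
```

With a table like this a buggy maintenance job could at worst destroy derivatives, which can be regenerated, and never an original.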
As for things like SVG translation, I'm going to say that's out of scope and probably impractical. Our experience with the Upload Wizard Licensing Tutorial shows that it's pretty rare to be able to simply plug in new strings into an SVG and have an acceptable translation. It usually needs some layout adjustment, and for RTL languages it needs pretty radical changes.
That said, it's an interesting frontier and it would be awesome to have a tool which made it easier to create translated SVGs or indicate that translations were related to each other. One thing at a time though.
I don't think it's that impractical ;) SVG includes some conventions for layout. With some procedural sugar it could be improved, e.g. container sizes dictating relative character size. It may not be perfectly beautiful, but people translating content should certainly not have to know how to edit SVG files; likewise, software can make it easy for a separate SVG layout expert to come in later and improve on the automated derivative.
But you're correct, it's not really part of storage considerations. It is, however, part of thinking about the future of access to media streams via the API.
Maybe the base thing for the storage platform to consider in this thread is whether access to media streams goes via the API, or whether it's going to try to manage a separate entry point outside of MediaWiki. I think public assets going over the existing squid -> HTTP file server path and non-public assets going through an API entry point would make sense.
--michael