On 01/04/2011 01:12 PM, Neil Kandalgaonkar wrote:
We've narrowed it down to two systems that are
being tested right now,
MogileFS and OpenStack. OpenStack has more built-in stuff to support
authentication. MogileFS is used in many systems that have an
authentication layer, but it seems you have to build more of it from
Authentication is really a nice-to-have for Commons or Wikipedia right
now. I anticipate it being useful for a handful of cases, which are
both more anticipated than actual right now:
- images uploaded but not published (a la UploadWizard)
- forum avatars (which can be viewed by anyone, but can only be edited
by the user they belong to)
Hmm. I think it would (obviously?) be best to handle media
authentication at the MediaWiki level, with just a simple private /
public classification for the backend storage system. Things
that are "private" have to go through the MediaWiki API, where you can
leverage all the existing extensible credential management.
It is also important to keep things simple for 3rd parties that are not
using a clustered filesystem stack: it is easier to map a web-accessible
dir vs. a non-accessible one than to deal with any authentication managed
within the storage system.
Image 'editing' / uploading already includes basic authentication ie:
User avatars would be a special case of
I think thumbnail and transformation servers (they should also do
stuff like rotating things on demand) are separate from how we store
things, and will just be acting on behalf of the user anyway. So they
don't introduce new requirements to image storage. Anybody see
anything problematic about that?
I think managing storage of procedural derivative assets differently
than original files is pretty important. Probably one of the core
features of a Wikimedia Storage system.
Assuming finite storage, it would be nice to specify that we don't care as
much about losing thumbnails as about losing original assets. For example, when
doing 3rd party backups or "dumps" we don't need all the derivatives to
We don't need to keep random-resolution derivatives of old
revisions of assets around forever; likewise, improvements to SVG
rasterization or improvements to transcoding software would mean
When MediaWiki is dealing with file maintenance, it should have to
authenticate differently when removing, moving, or overwriting originals
vs. derivatives, i.e. independent of DB revision numbers or what MediaWiki
*thinks* it should be doing.
For example only upload ingestion nodes or "modes" should have write
access to the archive store. Transcoding or thumbnailing or maintenance
nodes or "modes" should only have read-only access to archive originals
and write access to derivatives.
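
To make the split above concrete, here is a rough sketch of it as a permission table. The role names, store names, and the is_allowed helper are all made up for illustration; nothing here is an existing MediaWiki, MogileFS, or OpenStack interface. It also assumes ingestion nodes do not need to touch derivatives at all, which is not something decided in this thread.

```python
from enum import Enum


class Store(Enum):
    ARCHIVE = "archive"          # original uploads
    DERIVATIVES = "derivatives"  # thumbnails, transcodes, rasterized SVGs


class Op(Enum):
    READ = "read"
    WRITE = "write"


# Hypothetical role names; the mapping follows the split described above.
PERMISSIONS = {
    "upload_ingestion": {Store.ARCHIVE: {Op.READ, Op.WRITE}},
    "transcoding": {
        Store.ARCHIVE: {Op.READ},
        Store.DERIVATIVES: {Op.READ, Op.WRITE},
    },
    "thumbnailing": {
        Store.ARCHIVE: {Op.READ},
        Store.DERIVATIVES: {Op.READ, Op.WRITE},
    },
    "maintenance": {
        Store.ARCHIVE: {Op.READ},
        Store.DERIVATIVES: {Op.READ, Op.WRITE},
    },
}


def is_allowed(role: str, store: Store, op: Op) -> bool:
    """Return True if the given role may perform op on store.

    Unknown roles and unlisted stores are denied by default.
    """
    return op in PERMISSIONS.get(role, {}).get(store, set())
```

The point of the default-deny lookup is that a thumbnailing node can never overwrite an original, regardless of what MediaWiki *thinks* it should be doing.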
As for things like SVG translation, I'm going to say that's out of
scope and probably impractical. Our experience with the Upload Wizard
Licensing Tutorial shows that it's pretty rare to be able to simply
plug in new strings into an SVG and have an acceptable translation. It
usually needs some layout adjustment, and for RTL languages it needs
pretty radical changes.
That said, it's an interesting frontier and it would be awesome to
have a tool which made it easier to create translated SVGs or indicate
that translations were related to each other. One thing at a time though.
I don't think it's that impractical ;) SVG includes some conventions for
layout. With some procedural sugar it could be improved, i.e. container sizes
dictating relative character size. It may not be perfectly beautiful, but
certainly people translating content should not have to know how to
edit SVG files; likewise, software can make it possible for a separate SVG
layout expert to come in later and improve on the automated derivative.
But you're correct, it's not really part of storage considerations. But
it is part of thinking about the future of access to media streams via the
Maybe the base thing for the storage platform to consider in this thread
is whether access to media streams goes via the API, or whether it is going
to try and manage a separate entry point outside of MediaWiki. I think public
assets going over the existing squid -> http file server path and
non-public assets going through an API entry point would make sense.
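
As a minimal sketch of that routing idea: the Asset type, the is_public flag, and the entry_point function below are hypothetical names for illustration only, not anything that exists in MediaWiki or in either storage system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    is_public: bool  # the simple private / public classification


def entry_point(asset: Asset) -> str:
    """Pick which path should serve the asset.

    Public assets stay on the existing cache + file server path;
    everything else is forced through the API, where the usual
    credential checks can be applied.
    """
    if asset.is_public:
        return "squid -> http file server"
    return "api entry point"
```

The storage system then only needs to know one bit per asset, and all real authentication decisions stay in one place.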