I did a little more hacking on the SVGEdit extension this weekend:
http://www.mediawiki.org/wiki/Extension:SVGEdit
The extension now uses SVG-edit's iframe embedding API, which lets us host the actual editor widget on a separate domain from MediaWiki. This also means it's a short step to packaging the smaller MediaWiki-side JS/CSS code as a gadget, which wiki admins could deploy without needing system-level access to install the extension:
http://code.google.com/p/svg-edit/issues/detail?id=747
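For context, here is roughly what driving the editor through that embedding API looks like from the host page. This is a sketch from memory of SVG-edit's embedapi.js; treat the exact file, class, and helper names as assumptions:

// Host-page glue, assuming SVG-edit's embedapi.js is loaded and the
// editor <iframe> is served from a separate domain.
var frame = document.getElementById( 'svgedit-frame' );
var svgCanvas = new embedded_svg_edit( frame );

// embedapi proxies svgCanvas methods across the frame via postMessage;
// each call returns a function that takes a result callback.
svgCanvas.setSvgString( currentFileSource )( function ( result, error ) {
    if ( error ) {
        alert( 'Could not load the SVG into the editor: ' + error );
    }
} );

// Later, pull the edited document back out for saving:
svgCanvas.getSvgString()( function ( svgText ) {
    uploadToWiki( svgText ); // hypothetical helper posting to the MediaWiki API
} );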
The primary holdup to being able to deploy it to Wikimedia sites is that scripts running in the MediaWiki context won't have direct access to the contents of files on upload.wikimedia.org. That means we can't load the current version of the file into the editor, which brings things to a nice halt. :(
My SVGEdit wrapper code is currently using the ApiSVGProxy extension to read SVG files via the local MediaWiki API. This seems to work fine locally, but it's not enabled on Wikimedia sites, and likely won't be generally around; it looks like Roan threw it together as a test, and I'm not sure if anybody's got plans on keeping it up or merging to core.
Since ApiSVGProxy serves SVG files directly out on the local domain as their regular content type, it potentially has some of the same safety concerns as img_auth.php and local hosting of upload files. If that's a concern preventing rollout, would alternatives such as wrapping the file data & metadata into a JSON structure be acceptable?
Alternately, we could look at using HTTP access control headers on upload.wikimedia.org, to allow XMLHttpRequest in newer browsers to make unauthenticated requests to upload.wikimedia.org and return data directly:
https://developer.mozilla.org/En/HTTP_Access_Control
That would allow the front-end code to just pull the destination URLs from imageinfo and fetch the image data directly. It also has the advantage that it would work for non-SVG files; advanced HTML5 image editing tools using canvas could benefit from being able to load and save PNG and JPEG images as well.
https://bugzilla.wikimedia.org/show_bug.cgi?id=25886 requests this for bits.wikimedia.org (which carries the stylesheets and such).
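For what it's worth, the server side of that could be as small as the following Apache fragment (a sketch assuming mod_headers is available; the actual origin policy would need ops review):

# Hypothetical fragment for the upload.wikimedia.org vhost: allow scripts
# on other domains to read public files via XMLHttpRequest.
<IfModule mod_headers.c>
    Header set Access-Control-Allow-Origin "*"
</IfModule>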
In the meantime I'll probably work around it with an SVG-to-JSONP proxy on toolserver for the gadget, which should get things working while we sort it out.
-- brion vibber (brion @ pobox.com)
On Mon, Jan 3, 2011 at 3:22 PM, Brion Vibber brion@pobox.com wrote:
Since ApiSVGProxy serves SVG files directly out on the local domain as their regular content type, it potentially has some of the same safety concerns as img_auth.php and local hosting of upload files. If that's a concern preventing rollout, would alternatives such as wrapping the file data & metadata into a JSON structure be acceptable?
Would it be enough to serve it with Content-Disposition: attachment? I'd think that should block all direct use but still allow XHR to work (although I'm not totally sure).
On 1/3/11 12:22 PM, Brion Vibber wrote:
Alternately, we could look at using HTTP access control headers on upload.wikimedia.org, to allow XMLHttpRequest in newer browsers to make unauthenticated requests to upload.wikimedia.org and return data directly:
https://developer.mozilla.org/En/HTTP_Access_Control
That would allow the front-end code to just pull the destination URLs from imageinfo and fetch the image data directly.
Yes. I have no trouble enabling this only for modern browsers, with just Apache config. SVG isn't even available on any version of IE in general use, including IE8.
This doesn't seem to be terribly hard to configure in Apache. It looks like something Commons should be doing generally for its image servers.
Michael Dale is the expert on proxying though, and it has a more legit use case for simpler uploads and searching. Any thoughts, Michael?
On Mon, Jan 3, 2011 at 1:24 PM, Neil Kandalgaonkar neilk@wikimedia.org wrote:
On 1/3/11 12:22 PM, Brion Vibber wrote:
Alternately, we could look at using HTTP access control headers on
upload.wikimedia.org, to allow XMLHttpRequest in newer browsers to make unauthenticated requests to upload.wikimedia.org and return data directly:
https://developer.mozilla.org/En/HTTP_Access_Control
That would allow the front-end code to just pull the destination URLs from imageinfo and fetch the image data directly.
Yes. I have no trouble enabling this only for modern browsers, with just Apache config. SVG isn't even available on any version of IE in general use, including IE8.
Note that SVGEdit does work on IE9 preview, if using the latest editor code!
-- brion
2011/1/3 Brion Vibber brion@pobox.com:
My SVGEdit wrapper code is currently using the ApiSVGProxy extension to read SVG files via the local MediaWiki API. This seems to work fine locally, but it's not enabled on Wikimedia sites, and likely won't be generally around; it looks like Roan threw it together as a test, and I'm not sure if anybody's got plans on keeping it up or merging to core.
I threw it together real quick about a year ago, because of a request from Brad Neuberg from Google, who needed it so he could use SVGWeb (a Flash thingy that provides SVG support for IE versions that don't support SVG natively). Tim was supposed to review it but I don't remember whether he ever did. Also, Mark had some concerns (he looked into rewriting URLs in Squid first, but I think his conclusion was it was tricky and an API proxy would be easier), and there were concerns about caching, both from Mark who didn't seem to want these SVGs to end up in the text Squids, and from Aryeh who *did* want them to be cached. I told Aryeh I'd implement Squid support in ApiSVGProxy, but I don't think I ever did that.
For more background, see the conversation that took place in #mediawiki on Dec 29, 2009 starting around 00:30 UTC.
Since ApiSVGProxy serves SVG files directly out on the local domain as their regular content type, it potentially has some of the same safety concerns as img_auth.php and local hosting of upload files. If that's a concern preventing rollout, would alternatives such as wrapping the file data & metadata into a JSON structure be acceptable?
I think we should ask Mark and Tim to revisit this whole thing and have them work out what the best way would be to make SVGs available on the same domain. There are too many things I don't know, so I can't even guess what would be best.
Alternately, we could look at using HTTP access control headers on upload.wikimedia.org, to allow XMLHttpRequest in newer browsers to make unauthenticated requests to upload.wikimedia.org and return data directly:
https://developer.mozilla.org/En/HTTP_Access_Control
That would allow the front-end code to just pull the destination URLs from imageinfo and fetch the image data directly. It also has the advantage that it would work for non-SVG files; advanced HTML5 image editing tools using canvas could benefit from being able to load and save PNG and JPEG images as well.
https://bugzilla.wikimedia.org/show_bug.cgi?id=25886 requests this for bits.wikimedia.org (which carries the stylesheets and such).
This should be enabled either way. You could then try the cross-domain request, and use the proxy if it fails.
But which browsers need the proxy anyway? Just IE8 and below? Do any of the proxy-needing browsers support CORS?
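A sketch of that try-CORS-then-proxy fallback; the proxy URL and parameters here are hypothetical, not ApiSVGProxy's actual interface:

// Try a direct cross-origin fetch from upload.wikimedia.org first, and
// fall back to a same-domain proxy if the browser or server refuses it.
function fetchSvg( url, onDone ) {
    function viaProxy() {
        // Hypothetical same-domain proxy endpoint and parameters.
        var proxyUrl = '/w/api.php?action=svgproxy&format=raw&url=' +
            encodeURIComponent( url );
        var pxhr = new XMLHttpRequest();
        pxhr.open( 'GET', proxyUrl, true );
        pxhr.onreadystatechange = function () {
            if ( pxhr.readyState === 4 ) {
                onDone( pxhr.status === 200 ? pxhr.responseText : null );
            }
        };
        pxhr.send( null );
    }
    try {
        var xhr = new XMLHttpRequest();
        xhr.open( 'GET', url, true ); // cross-origin; needs CORS headers
        xhr.onreadystatechange = function () {
            if ( xhr.readyState !== 4 ) {
                return;
            }
            if ( xhr.status === 200 ) {
                onDone( xhr.responseText );
            } else {
                viaProxy(); // CORS refused or request failed (status 0)
            }
        };
        xhr.send( null );
    } catch ( e ) {
        viaProxy(); // older browsers throw on cross-origin requests
    }
}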
Roan Kattouw (Catrope)
On 01/03/2011 02:22 PM, Brion Vibber wrote:
Since ApiSVGProxy serves SVG files directly out on the local domain as their regular content type, it potentially has some of the same safety concerns as img_auth.php and local hosting of upload files. If that's a concern preventing rollout, would alternatives such as wrapping the file data & metadata into a JSON structure be acceptable?
hmm... Is img_auth widely used? Can we just disable SVG API data access if $wgUploadPath includes img_auth ... or add a configuration variable that states whether img_auth is an active entry point? Why don't we think about the problem differently and support serving images through the API instead of maintaining a separate img_auth entry point?
Is the idea that our asset scrubbing for malicious scripts or embedded image HTML tags (to protect against IE's lovely 'auto mime' content-type sniffing) is buggy? I think the majority of MediaWiki installations are serving assets on the same domain as the content, so we would do well to address that security concern as our own (afaik we already address this pretty well). Furthermore, we don't want people to have to re-scrub once they do access that SVG data on the local domain...
It would be nice to serve up different content types ("data") over the API in a number of use cases. For example, we could have a more structured thumb.php entry point, or serve up video thumbnails at requested times and resolutions. This could also clean up Neil's upload wizard per-user temporary image store by requesting these assets through the API instead of relying on obfuscation of the URL. Likewise, the add media wizard presently does two requests once it opens the larger version of the image.
Eventually it would be nice to make more services available, like SVG localisation / variable substitution and rasterization (i.e. give me engine_figure2.svg in Spanish at 600px wide as a PNG).
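Purely as a strawman for what such a request could look like (every parameter here is hypothetical):

/w/api.php?action=render&target=File:Engine_figure2.svg&lang=es&width=600&format=png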
It may hurt caching to serve everything over JSONP, since we can't set smaxage with callback=randomString URLs. If it's just for editing it's not a big deal, until some IE SVG viewer hack starts getting all SVG over JSONP ;) ... It would be best if we could access this data without varying URLs.
Alternately, we could look at using HTTP access control headers on upload.wikimedia.org, to allow XMLHttpRequest in newer browsers to make unauthenticated requests to upload.wikimedia.org and return data directly:
I vote yes! ... This would also untaint video canvas data, which I am making more and more use of in the sequencer ... Likewise, we could add a crossdomain.xml file so IE Flash SVG viewers can access the data.
In the meantime I'll probably work around it with an SVG-to-JSONP proxy on toolserver for the gadget, which should get things working while we sort it out.
Sounds reasonable :)
We should be able to "upload" the result via the API on the same domain as the editor, so it would be very fun to enable this for quick SVG edits :)
peace, --michael
2011/1/4 Michael Dale mdale@wikimedia.org:
hmm... Is img_auth widely used? Can we just disable SVG API data access if $wgUploadPath includes img_auth ... or add a configuration variable that states whether img_auth is an active entry point? Why don't we think about the problem differently and support serving images through the API instead of maintaining a separate img_auth entry point?
The separate img_auth.php entry point is needed on wikis where reading is restricted (private wikis), and img_auth.php will check for read permissions before it outputs the file. The difference between the proxy I wrote and img_auth.php is that img_auth.php just streams the file from the filesystem (which, on WMF, will hit NFS every time, which is bad) whereas ApiSVGProxy uses an HTTP request (which will hit the image Squids, which is good).
Is the idea that our asset scrubbing for malicious scripts or embedded image HTML tags (to protect against IE's lovely 'auto mime' content-type sniffing) is buggy?
No, IEContentAnalyzer will reject anything that would "confuse" IE.
I think the majority of MediaWiki installations are serving assets on the same domain as the content, so we would do well to address that security concern as our own (afaik we already address this pretty well). Furthermore, we don't want people to have to re-scrub once they do access that SVG data on the local domain...
MW was written with this same-domain setup in mind, and WMF is one of the very few setups out there that uses a separate domain for files. So I'm fairly sure we don't rely on files being on a different or cookieless domain for security.
It would be nice to serve up different content types ("data") over the API in a number of use cases. For example, we could have a more structured thumb.php entry point, or serve up video thumbnails at requested times and resolutions. This could also clean up Neil's upload wizard per-user temporary image store by requesting these assets through the API instead of relying on obfuscation of the URL. Likewise, the add media wizard presently does two requests once it opens the larger version of the image.
Eventually it would be nice to make more services available, like SVG localisation / variable substitution and rasterization (i.e. give me engine_figure2.svg in Spanish at 600px wide as a PNG).
You should talk to Russ Nelson, Ariel Glenn and the other people currently involved in redesigning WMF's file storage architecture.
It may hurt caching to serve everything over JSONP, since we can't set smaxage with callback=randomString URLs. If it's just for editing it's not a big deal, until some IE SVG viewer hack starts getting all SVG over JSONP ;) ... It would be best if we could access this data without varying URLs.
Yes, JSONP is bad for caching.
Roan Kattouw (Catrope)
On 01/04/2011 09:57 AM, Roan Kattouw wrote:
The separate img_auth.php entry point is needed on wikis where reading is restricted (private wikis), and img_auth.php will check for read permissions before it outputs the file. The difference between the proxy I wrote and img_auth.php is that img_auth.php just streams the file from the filesystem (which, on WMF, will hit NFS every time, which is bad) whereas ApiSVGProxy uses an HTTP request (which will hit the image Squids, which is good).
So ... it would be good to think about moving things like img_auth.php and thumb.php over to a general-purpose API media-serving module, no?
This would help standardise how media serving is "extended", reduce extra entry points, and, as you point out above, let us more uniformly proxy our back-end data access over HTTP to hit the Squids instead of NFS where possible.
And as a shout-out to Trevor's MediaWiki 2.0 vision, it would eventually enable more REST-like interfaces within MediaWiki media handling.
--michael
On Tue, Jan 4, 2011 at 5:37 AM, Roan Kattouw roan.kattouw@gmail.com wrote:
Alternately, we could look at using HTTP access control headers on upload.wikimedia.org, to allow XMLHttpRequest in newer browsers to make unauthenticated requests to upload.wikimedia.org and return data
directly:
This should be enabled either way. You could then try the cross-domain request, and use the proxy if it fails.
Sensible, yes.
But which browsers need the proxy anyway? Just IE8 and below? Do any of the proxy-needing browsers support CORS?
I think for straight viewing only the Flash compat widget needs cross-domain permissions (browsers with native support use <object> for viewing), and a Flash cross-domain settings file would probably take care of that.
For editing, or other tools that need to directly access the file data, either a proxy or CORS should do the job. I _think_ current versions of all major browsers support CORS for XHR fetches, but I haven't done compat tests yet. (IE8 requires using an alternate XDR class instead of XHR but since it doesn't do native SVG I don't care too much; I haven't checked IE9 yet, but since the editor works in it I want to make sure we can load the files!)
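For reference, the detection dance looks something like this; a sketch, with XDomainRequest being IE8/9's separate cross-domain object:

// Pick a cross-origin transport, or report that a proxy is needed.
function makeCrossOriginRequest( url ) {
    if ( 'withCredentials' in new XMLHttpRequest() ) {
        var xhr = new XMLHttpRequest(); // native CORS support
        xhr.open( 'GET', url, true );
        return xhr;
    }
    if ( typeof XDomainRequest !== 'undefined' ) {
        var xdr = new XDomainRequest(); // IE8/9 cross-domain path
        xdr.open( 'GET', url );
        return xdr;
    }
    return null; // no cross-origin support at all; fall back to a proxy
}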
-- brion
On 01/04/2011 05:57 PM, Roan Kattouw wrote:
2011/1/4 Michael Dale mdale@wikimedia.org:
It may hurt caching to serve everything over JSONP, since we can't set smaxage with callback=randomString URLs. If it's just for editing it's not a big deal, until some IE SVG viewer hack starts getting all SVG over JSONP ;) ... It would be best if we could access this data without varying URLs.
Yes, JSONP is bad for caching.
Well, if the response is informative enough, you can often use constant callback names. A lot of my scripts which use the API do that. Of course, it may mean a bit more work if you're using a framework like jQuery which defaults to random callback names, but it's not that much.
A couple of examples off the top of my head: http://commons.wikimedia.org/wiki/MediaWiki:MainPages.js http://commons.wikimedia.org/wiki/MediaWiki:Gadget-PrettyLog.js
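For example, with jQuery the callback can be pinned like this (a sketch reusing the imageinfo query mentioned earlier in the thread):

// A constant jsonpCallback plus cache:true keeps the request URL stable,
// so smaxage and the Squids can actually do their job.
$.ajax( {
    url: 'http://commons.wikimedia.org/w/api.php',
    dataType: 'jsonp',
    jsonpCallback: 'svgEditImageInfo', // fixed, not random
    cache: true, // stop jQuery appending a cache-busting _=timestamp
    data: {
        action: 'query',
        titles: 'File:Example.svg',
        prop: 'imageinfo',
        iiprop: 'url',
        format: 'json'
    },
    success: function ( data ) {
        // e.g. read the file URL out of data.query.pages
    }
} );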
On 1/4/11 9:24 AM, Michael Dale wrote:
So ... it would be good to think about moving things like img_auth.php and thumb.php over to a general-purpose API media-serving module, no?
It's related, but we're just laying the foundations now. I think we haven't really talked about this on wikitech, so this might be a good time to mention it...
We're just evaluating systems to store things at scale. Or rather Russ Nelson (__nelson on IRC) is primarily doing that -- he is a contractor whom some of you met at the DC meetup. The rest of us (me, Ariel Glenn, Mark Bergsma, and the new ops manager CT Woo) are helping now and then or trying to evolve the requirements as new info comes in.
Most of the info is here:
http://wikitech.wikimedia.org/view/Media_server/Distributed_File_Storage_cho...
We've narrowed it down to two systems that are being tested right now, MogileFS and OpenStack. OpenStack has more built-in stuff to support authentication. MogileFS is used in many systems that have an authentication layer, but it seems you have to build more of it from scratch.
Authentication is really a nice-to-have for Commons or Wikipedia right now. I anticipate it being useful for a handful of cases, which are both more anticipated than actual right now:
- images uploaded but not published (a la UploadWizard)
- forum avatars (which can be viewed by anyone, but can only be edited by the user they belong to)
I think thumbnail and transformation servers (they should also do stuff like rotating things on demand) are separate from how we store things, and will just be acting on behalf of the user anyway. So they don't introduce new requirements to image storage. Anybody see anything problematic about that?
As for things like SVG translation, I'm going to say that's out of scope and probably impractical. Our experience with the Upload Wizard Licensing Tutorial shows that it's pretty rare to be able to simply plug new strings into an SVG and have an acceptable translation. It usually needs some layout adjustment, and for RTL languages it needs pretty radical changes.
That said, it's an interesting frontier and it would be awesome to have a tool which made it easier to create translated SVGs or indicate that translations were related to each other. One thing at a time though.
On 01/04/2011 01:12 PM, Neil Kandalgaonkar wrote:
We've narrowed it down to two systems that are being tested right now, MogileFS and OpenStack. OpenStack has more built-in stuff to support authentication. MogileFS is used in many systems that have an authentication layer, but it seems you have to build more of it from scratch.
Authentication is really a nice-to-have for Commons or Wikipedia right now. I anticipate it being useful for a handful of cases, which are both more anticipated than actual right now:
- images uploaded but not published (a la UploadWizard)
- forum avatars (which can be viewed by anyone, but can only be edited by the user they belong to)
hmm. I think it would (obviously?) be best to handle media authentication at the MediaWiki level, with just a simple private / public accessibility classification for the back-end storage system. Things that are "private" have to go through the MediaWiki API, where you can leverage all the existing extensible credential management.
It's also important to keep things simple for 3rd parties that are not using a clustered filesystem stack; it's easier to map a web-accessible dir vs. not than to manage any authentication within the storage system.
Image 'editing' / uploading already includes basic authentication, i.e.: http://www.mediawiki.org/wiki/Manual:Configuring_file_uploads#Upload_permiss... User avatars would be a special case of that.
I think thumbnail and transformation servers (they should also do stuff like rotating things on demand) are separate from how we store things, and will just be acting on behalf of the user anyway. So they don't introduce new requirements to image storage. Anybody see anything problematic about that?
I think managing storage of procedural derivative assets differently than original files is pretty important. Probably one of the core features of a Wikimedia Storage system.
Assuming finite storage, it would be nice to specify that we don't care as much about losing thumbnails as about losing original assets. For example, when doing 3rd-party backups or "dumps" we don't need all the derivatives to be included.
We don't need to keep random-resolution derivatives of old revisions of assets around forever; likewise, improvements to SVG rasterization or to transcoding software would mean "expiring" derivatives.
When MediaWiki is dealing with file maintenance, it should have to authenticate differently when removing, moving, or overwriting originals vs. derivatives, i.e. independently of DB revision numbers or what MediaWiki *thinks* it should be doing.
For example, only upload ingestion nodes or "modes" should have write access to the archive store. Transcoding, thumbnailing, or maintenance nodes or "modes" should have read-only access to archive originals and write access to derivatives.
As for things like SVG translation, I'm going to say that's out of scope and probably impractical. Our experience with the Upload Wizard Licensing Tutorial shows that it's pretty rare to be able to simply plug new strings into an SVG and have an acceptable translation. It usually needs some layout adjustment, and for RTL languages it needs pretty radical changes.
That said, it's an interesting frontier and it would be awesome to have a tool which made it easier to create translated SVGs or indicate that translations were related to each other. One thing at a time though.
I don't think it's that impractical ;) SVG includes some conventions for layout, which could be improved with some procedural sugar, i.e. container sizes dictating relative character size. It may not be perfectly beautiful, but certainly everyone translating content should not have to know how to edit SVG files; likewise, software can make it possible for a separate SVG layout expert to come in later and improve on the automated derivative.
But you're correct, it's not really part of the storage considerations. It is part of thinking about the future of access to media streams via the API, though.
Maybe the basic thing for the storage platform to consider in this thread is whether access to media streams goes via the API, or whether it tries to manage a separate entry point outside of MediaWiki. I think public assets going over the existing Squid -> HTTP file server path, and non-public assets going through an API entry point, would make sense.
--michael
Neil Kandalgaonkar wrote:
We've narrowed it down to two systems that are being tested right now, MogileFS and OpenStack. OpenStack has more built-in stuff to support authentication. MogileFS is used in many systems that have an authentication layer, but it seems you have to build more of it from scratch.
Authentication is really a nice-to-have for Commons or Wikipedia right now. I anticipate it being useful for a handful of cases, which are both more anticipated than actual right now:
- images uploaded but not published (a la UploadWizard)
- forum avatars (which can be viewed by anyone, but can only be edited by the user they belong to)
I don't see how FS authentication is useful there. All authentication would be performed by MediaWiki, with a master credential such as $wgDBpassword. MediaWiki shouldn't need to send the media server a user password! (NB sysops should be able to remove goatses from forum avatars...) Authentication as understood by OpenStack is of little use for us now. Things like adding a uid column in MySQL would be more useful than native authentication for accessing the resource.
As for things like SVG translation, I'm going to say that's out of scope and probably impractical. Our experience with the Upload Wizard Licensing Tutorial shows that it's pretty rare to be able to simply plug new strings into an SVG and have an acceptable translation. It usually needs some layout adjustment, and for RTL languages it needs pretty radical changes.
You can provide the same SVG, changing just the legend box.
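SVG's own <switch>/systemLanguage mechanism is one existing hook for that kind of reuse; a toy example, not taken from the thread:

<svg xmlns="http://www.w3.org/2000/svg" width="240" height="40">
  <switch>
    <text x="10" y="25" systemLanguage="es">Figura 2: motor</text>
    <text x="10" y="25" systemLanguage="de">Abbildung 2: Motor</text>
    <!-- fallback when no systemLanguage matches the viewer's locale -->
    <text x="10" y="25">Figure 2: engine</text>
  </switch>
</svg>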
Michael Dale wrote:
I think thumbnail and transformation servers (they should also do stuff like rotating things on demand) are separate from how we store things, and will just be acting on behalf of the user anyway. So they don't introduce new requirements to image storage. Anybody see anything problematic about that?
I think managing storage of procedural derivative assets differently than original files is pretty important. Probably one of the core features of a Wikimedia Storage system.
Yes, I think we should treat them as different "image clusters", optionally sharing servers (unless there's a better equivalent available in the DFS).
Assuming finite storage, it would be nice to specify that we don't care as much about losing thumbnails as about losing original assets. For example, when doing 3rd-party backups or "dumps" we don't need all the derivatives to be included.
We don't need to keep random-resolution derivatives of old revisions of assets around forever; likewise, improvements to SVG rasterization or to transcoding software would mean "expiring" derivatives.
A good point.
On 1/4/11 12:51 PM, Platonides wrote:
I don't see how FS authentication is useful there. All authentication would be performed by MediaWiki, with a master credential such as $wgDBpassword. MediaWiki shouldn't need to send the media server a user password!
Nobody said we'd be sending user passwords over to a media server. Most of the time, even regular MediaWiki servers don't need to see passwords. They just need some means to authenticate the session cookie.
But like I said we don't have very firm plans about how we would do authentication.
(NB sysops should be able to remove goatses from forum avatars...)
Yeah. Avatars can be tricky. Also, to be pedantically correct, you want to have some guard against impersonation (using the same icon, and maybe adding Unicode space characters or other trivial changes to the username).
On 1/4/11 12:21 PM, Neil Kandalgaonkar wrote:
On 1/4/11 12:51 PM, Platonides wrote:
I don't see how FS authentication is useful there. All authentication would be performed by MediaWiki, with a master credential such as $wgDBpassword. MediaWiki shouldn't need to send the media server a user password!
Nobody said we'd be sending user passwords over to a media server. Most of the time, even regular MediaWiki servers don't need to see passwords. They just need some means to authenticate the session cookie.
But like I said we don't have very firm plans about how we would do authentication.
This was just a counter-point to the statement "authentication is really a nice-to-have for Commons or Wikipedia right now".
(NB sysops should be able to remove goatses from forum avatars...)
Yeah. Avatars can be tricky. Also, to be pedantically correct, you want to have some guard against impersonation (using the same icon, and maybe adding Unicode space characters or other trivial changes to the username).
The last one _should_ already be handled by AntiSpoof. (Although there is, for instance, an open bug about ZWJ, any takers?)
On 05/01/11 00:37, Roan Kattouw wrote:
2011/1/3 Brion Vibber brion@pobox.com:
My SVGEdit wrapper code is currently using the ApiSVGProxy extension to read SVG files via the local MediaWiki API. This seems to work fine locally, but it's not enabled on Wikimedia sites, and likely won't be generally around; it looks like Roan threw it together as a test, and I'm not sure if anybody's got plans on keeping it up or merging to core.
I threw it together real quick about a year ago, because of a request from Brad Neuberg from Google, who needed it so he could use SVGWeb (a Flash thingy that provides SVG support for IE versions that don't support SVG natively). Tim was supposed to review it but I don't remember whether he ever did.
I reviewed the JavaScript side, and asked for two changes:
* Make it possible to disable client-side scripting in configuration
* Fix the interface between JS and Flash, which was using __SVG__DELIMIT as a delimiter without checking for that string in the input. User input containing this string could thus pass arbitrary parameters to Flash, with possible security consequences.
Three weeks after my review, Brad opened a ticket:
http://code.google.com/p/svgweb/issues/detail?id=446
I haven't heard anything back from them since, and I see the ticket is still open. I haven't reviewed the Flash side.
-- Tim Starling