Hello dear developer community,
Today I would like to draw your attention to a very old bug, number 189: https://bugzilla.wikimedia.org/show_bug.cgi?id=189, and the related Bug 29630: https://bugzilla.wikimedia.org/show_bug.cgi?id=29630
It concerns several possible MediaWiki plug-ins that would enable our projects, mostly Wikisource but also Wikipedia, to enter and display musical notation easily. You can have a real impact on our projects by addressing this bug, and make a lot of Wikisource community members happy.
Any takers on this?
Greetings Ting
If you know any musician/developers, please feel free to send them my way! :-) And I know Mark Hershberger recently committed the LilyPond extension for MediaWiki[0] into Subversion[1], and it is getting reviewed there. That's a good way current MediaWiki developers can contribute to improving the state of music scholarship on Wikimedia projects.
Take a look at the music markup discussion on meta.[2] Basically, the Wikimedia community needs someone to write a MediaWiki extension that will let users insert musical notation and have it show up as sheet music within a Wikipedia page, or improve the existing LilyPond extension to a deployable state.
For those who don't know, Ting Chen is the chair of the Wikimedia Board of Trustees.[3] Thanks for the suggestion, Ting.
[0] http://www.mediawiki.org/wiki/Extension:LilyPond
[1] https://www.mediawiki.org/wiki/Special:Code/MediaWiki/98424
[2] https://meta.wikimedia.org/wiki/Music_markup
[3] https://wikimediafoundation.org/wiki/Board_of_Trustees
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
On Oct 18, 2011 8:51 PM, "Sumana Harihareswara" sumanah@wikimedia.org wrote: […]
For those who don't know, Ting Chen is the chair of the Wikimedia Board of Trustees.[3] Thanks for the suggestion, Ting.
FWIW, when I first saw Ting's mail I assumed that his using an address I'd never seen before, instead of his Foundation address, was intentional and an indication that he was not acting in his official capacity.
But maybe I was reading too much into it; he can certainly clarify explicitly or make any other comments if he likes.
-Jeremy
On 19.10.2011 03:06, Jeremy Baron wrote:
FWIW, when I first saw Ting's mail I assumed that his using an address I'd never seen before, instead of his Foundation address, was intentional and an indication that he was not acting in his official capacity. But maybe I was reading too much into it; he can certainly clarify explicitly or make any other comments if he likes.
That's right. Though I still think that this is a good thing to do :-)
Greetings Ting
On Sun, 2011-10-16 at 12:00 +0200, Ting Chen wrote:
Today I would like to draw your attention to a very old bug, number 189: https://bugzilla.wikimedia.org/show_bug.cgi?id=189, and the related Bug 29630: https://bugzilla.wikimedia.org/show_bug.cgi?id=29630
It concerns several possible MediaWiki plug-ins that would enable our projects, mostly Wikisource but also Wikipedia, to enter and display musical notation easily. You can have a real impact on our projects by addressing this bug, and make a lot of Wikisource community members happy.
Any takers on this?
It is my understanding that there are three roadblocks on this one:
1) ABC vs Lilypond, and which exact implementation to use. At this point I assume everyone is so sick of waiting that no one will care what is used so long as something is used.
2) IIRC, Brion wanted this to be built around a universal system for handling automatically generated images, which would also be useful for math and similar future extensions. But since this is such an old request, and such a system is not in sight, perhaps he could look the other way just one more time :)
3) And the big one, security. It has not been shown that any of the proposed implementations is secure. I was thinking that perhaps a way to overcome this would be to have a dedicated system just for handling music rendering. It would work something like this:
a) A dedicated server used only for music rendering. The server runs several virtual machines with the music rendering software. It only accepts the notes and returns the images.
b) When the parent server receives a text with the notes, it only passes it to a free virtual machine. When it receives the images from the virtual machine, it passes them back to the client.
c) If it doesn't receive the images within a certain time, it shuts down the virtual machine, starts a new one and returns an error image to the client.
Is there a hole in this system that would make it possible to hack the parent server by means of a malicious file?
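Steps a) to c) boil down to: hand only the notation to a disposable worker, wait a bounded time, and if nothing comes back, kill the worker and return an error image. A minimal Python sketch of the parent's side, assuming a local subprocess stands in for the virtual machine; `render_with_timeout` and `ERROR_IMAGE` are hypothetical names, not part of any existing extension:

```python
import subprocess
import sys

# Hypothetical placeholder for the "error image" sent back to the client.
ERROR_IMAGE = b"ERROR"


def render_with_timeout(worker_cmd, notation, timeout_s):
    """Pass the notation to a fresh worker and return its image bytes.

    worker_cmd stands in for a free virtual machine: it receives only the
    notation on stdin and is expected to write only image bytes to stdout.
    """
    try:
        result = subprocess.run(
            worker_cmd,
            input=notation.encode("utf-8"),
            capture_output=True,
            timeout=timeout_s,  # on expiry run() kills the worker, then raises
        )
    except subprocess.TimeoutExpired:
        # The stuck worker has already been killed; the next call starts a fresh one.
        return ERROR_IMAGE
    if result.returncode != 0:
        return ERROR_IMAGE
    return result.stdout


if __name__ == "__main__":
    # A toy "renderer" that upper-cases its input, and one that never finishes.
    upcase = [sys.executable, "-c",
              "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    hang = [sys.executable, "-c", "while True: pass"]
    print(render_with_timeout(upcase, "c d e", timeout_s=10))  # b'C D E'
    print(render_with_timeout(hang, "c d e", timeout_s=0.5))   # b'ERROR'
```

Whether a full virtual machine per worker is worth the overhead, compared with plain process-level limits, is a separate question from the dispatch logic itself.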
Using virtual machines is too much overhead compared to just coding it right, and it still would not protect against, e.g., JavaScript injection.
Looking into the LilyPond extension, I don't see any big problem:
- It relies on Math variables to store its files in the same folder (it was written before the Math extension was split out).
- $wgMathPath isn't properly escaped, but that's minor.
- It uses hardcoded text (math_failure, <b>, etc.) in error messages.
- It uses escapeshellarg instead of wfEscapeShellArg, but the filename is safe anyway (and our servers don't run Windows).
- Maybe of greater concern, it assumes it owns everything in $wgTmpDirectory, when those files could have been created:
  a) by another extension
  b) by another instance of LilyPond
I don't know why it needs to trim the images generated by LilyPond, but there's probably a reason for that. Assuming that LilyPond code doesn't allow opening files or executing programs, the current version of LilyPond is apparently safe.
I have to admit, though, that it is not pretty, and its "store files without tracking" approach is something we shouldn't repeat in new extensions.
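Two of the points above, shell escaping and the assumption that the extension owns everything in $wgTmpDirectory, can both be avoided by giving each render its own private working directory and passing the command as an argument list rather than a shell string. A minimal Python sketch under those assumptions; `render_in_private_dir`, the `score.ly`/`score.png` file names and the renderer command are hypothetical:

```python
import os
import subprocess
import sys
import tempfile


def render_in_private_dir(renderer_cmd, source, output_name="score.png"):
    """Render `source` in a directory created for this call alone.

    renderer_cmd is the renderer invocation as a list; the path of the input
    file is appended as its last argument. The renderer is expected to write
    `output_name` into its working directory.
    """
    # TemporaryDirectory uses mkdtemp(): a unique name, accessible only to
    # the current user. Leaving the `with` block (normally or via an
    # exception) deletes this directory and nothing else, so files that other
    # extensions or other renders left in the shared temp dir are untouched.
    with tempfile.TemporaryDirectory(prefix="music-render-") as workdir:
        source_path = os.path.join(workdir, "score.ly")
        with open(source_path, "w", encoding="utf-8") as f:
            f.write(source)
        # An argument list goes straight to exec(), so no shell ever parses
        # the path or the notation and no escaping is needed.
        subprocess.run(renderer_cmd + [source_path], cwd=workdir, check=True)
        with open(os.path.join(workdir, output_name), "rb") as f:
            return f.read()
```

A renderer that exits with a non-zero status raises `subprocess.CalledProcessError`, which the caller can turn into an error image.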
On Sun, Oct 23, 2011 at 6:55 PM, Platonides Platonides@gmail.com wrote:
I don't know why it needs to trim the images generated by LilyPond, but there's probably a reason for that. Assuming that LilyPond code doesn't allow opening files or executing programs, the current version of LilyPond is apparently safe.
From my memory of reading bug 189 a few years ago, the biggest concern Tim had at the time was that LilyPond does not (or, at least, at the time did not; I haven't kept up with LilyPond development news at all) limit memory or time usage. For certain pathological (or well-crafted, if you're an attacker) inputs, the process of converting LilyPond syntax to an image with sheet music may consume a very large amount of time and/or memory. From my casual re-reading it seems that it's quite easy to write an infinite loop in LilyPond. LilyPond does have a safe mode, but it does not protect against infinite loops, nor does it claim to. Clearly, this presents a resource exhaustion / denial-of-service vulnerability, given that certain inputs can cause the LilyPond interpreter to run forever and any user who can edit or even preview pages can inject such inputs.
So in order to run LilyPond on WMF servers and be able to feed it arbitrary user input while protecting against resource exhaustion, we need to be able to limit the amount of time and memory that each LilyPond process can use, either in LilyPond itself or in the MediaWiki extension that spawns LilyPond processes. To my knowledge, no one has volunteered for this task so far.
Roan
On 23/10/11 19:36, Roan Kattouw wrote:
On Sun, Oct 23, 2011 at 6:55 PM, Platonides Platonides@gmail.com wrote:
I don't know why it needs to trim the images generated by LilyPond, but there's probably a reason for that. Assuming that LilyPond code doesn't allow opening files or executing programs, the current version of LilyPond is apparently safe.
From my memory of reading bug 189 a few years ago, the biggest concern Tim had at the time was that LilyPond does not (or, at least, at the time did not; I haven't kept up with LilyPond development news at all) limit memory or time usage. For certain pathological (or well-crafted, if you're an attacker) inputs, the process of converting LilyPond syntax to an image with sheet music may consume a very large amount of time and/or memory. From my casual re-reading it seems that it's quite easy to write an infinite loop in LilyPond. LilyPond does have a safe mode, but it does not protect against infinite loops, nor does it claim to. Clearly, this presents a resource exhaustion / denial-of-service vulnerability, given that certain inputs can cause the LilyPond interpreter to run forever and any user who can edit or even preview pages can inject such inputs.
So in order to run LilyPond on WMF servers and be able to feed it arbitrary user input while protecting against resource exhaustion, we need to be able to limit the amount of time and memory that each LilyPond process can use, either in LilyPond itself or in the MediaWiki extension that spawns LilyPond processes. To my knowledge, no one has volunteered for this task so far.
Roan
Linux provides the setrlimit() system call for this purpose -- you could either call it as a wrapper around lilypond, or hack it into a de-fanged version of Lilypond.
If you're going to be running an auxiliary rendering process or special-use server anyway, a few moments of Googling finds the "softlimit" program, provided as part of the daemontools package, which looks like it is intended to provide the sort of limited sandboxing required here.
- Neil
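The setrlimit() wrapper approach can be sketched in Python, assuming a Linux host: the limits are set in the child process just before it execs the renderer, so the parent (the web server process) is unaffected. `limited_run` and the limit values are illustrative assumptions. RLIMIT_CPU caps CPU seconds and RLIMIT_AS caps the address space, which covers both the infinite-loop and the memory-exhaustion cases Roan describes:

```python
import resource
import subprocess
import sys


def limited_run(cmd, cpu_seconds, mem_bytes):
    """Run cmd under kernel-enforced CPU-time and memory limits (Unix only)."""

    def set_limits():
        # Runs in the child after fork() and before exec(), so only the
        # renderer process (and anything it spawns) inherits these limits.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    return subprocess.run(cmd, preexec_fn=set_limits)


if __name__ == "__main__":
    # An infinite loop is stopped by a signal once it has used 1 s of CPU
    # (a negative returncode means "killed by signal").
    loop = limited_run([sys.executable, "-c", "while True: pass"], 1, 512 * 2**20)
    print(loop.returncode < 0)  # True
    # A 1 GiB allocation fails under a 256 MiB address-space limit.
    hog = limited_run([sys.executable, "-c", "b = bytearray(2**30)"], 10, 256 * 2**20)
    print(hog.returncode != 0)  # True
```

Note that RLIMIT_CPU counts CPU time, not wall-clock time, so a process that is blocked rather than computing is not stopped by it; a wall-clock timeout on the parent side covers that case.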
On Sun, Oct 23, 2011 at 9:14 PM, Neil Harris neil@tonal.clara.co.uk wrote:
Linux provides the setrlimit() system call for this purpose -- you could either call it as a wrapper around lilypond, or hack it into a de-fanged version of Lilypond.
If you're going to be running an auxiliary rendering process or special-use server anyway, a few moments of Googling finds the "softlimit" program, provided as part of the daemontools package, which looks like it is intended to provide the sort of limited sandboxing required here.
Yes, that sort of thing is what I meant by limiting CPU/mem usage on the MW extension side. However, someone would have to actually do that, as well as cleaning up the other coding style issues that I seem to recall were floating around Extension:LilyPond in its current state. So far, I don't believe anyone has actually shown themselves willing to do that work.
Roan
Neil Harris wrote:
Linux provides the setrlimit() system call for this purpose -- you could either call it as a wrapper around lilypond, or hack it into a de-fanged version of Lilypond.
If you're going to be running an auxiliary rendering process or special-use server anyway, a few moments of Googling finds the "softlimit" program, provided as part of the daemontools package, which looks like it is intended to provide the sort of limited sandboxing required here.
- Neil
We already have several ulimit.sh scripts in phase3/bin for that. If the LilyPond extension used wfShellExec instead of exec, it would automatically be limited by $wgMaxShellTime, $wgMaxShellMemory and $wgMaxShellFileSize (except on Windows).
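The three settings map directly onto setrlimit() resources. A hypothetical Python helper, assuming that $wgMaxShellTime is in CPU seconds and that $wgMaxShellMemory and $wgMaxShellFileSize are in kilobytes (check DefaultSettings.php for the MediaWiki version in use); treating a value of 0 or less as "no limit" is this sketch's own convention:

```python
import resource


def mw_shell_limits(max_shell_time, max_shell_memory_kb, max_shell_file_size_kb):
    """Translate wfShellExec-style settings into {resource: (soft, hard)}.

    Hypothetical helper; values <= 0 leave that resource unlimited.
    """
    limits = {}
    if max_shell_time > 0:
        # CPU seconds; exceeding this gets the process SIGXCPU/SIGKILL.
        limits[resource.RLIMIT_CPU] = (max_shell_time, max_shell_time)
    if max_shell_memory_kb > 0:
        # Virtual memory (address space), converted from KB to bytes.
        mem = max_shell_memory_kb * 1024
        limits[resource.RLIMIT_AS] = (mem, mem)
    if max_shell_file_size_kb > 0:
        # Largest file the process may write, converted from KB to bytes.
        size = max_shell_file_size_kb * 1024
        limits[resource.RLIMIT_FSIZE] = (size, size)
    return limits


def apply_limits(limits):
    """Install the limits in the current process, e.g. from a preexec_fn."""
    for res, soft_hard in limits.items():
        resource.setrlimit(res, soft_hard)


if __name__ == "__main__":
    print(mw_shell_limits(180, 102400, 102400)[resource.RLIMIT_AS])  # (104857600, 104857600)
```

Passed as `preexec_fn=lambda: apply_limits(limits)` to `subprocess.run`, this has the same effect as wrapping the command in a ulimit script. The limits only apply to commands actually started through such a wrapper, which is why calling exec directly bypasses them.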
On Sun, 23 Oct 2011 14:42:24 -0700, Platonides Platonides@gmail.com wrote:
Neil Harris wrote:
Linux provides the setrlimit() system call for this purpose -- you could either call it as a wrapper around lilypond, or hack it into a de-fanged version of Lilypond.
If you're going to be running an auxiliary rendering process or special-use server anyway, a few moments of Googling finds the "softlimit" program, provided as part of the daemontools package, which looks like it is intended to provide the sort of limited sandboxing required here.
- Neil
We already have several ulimit.sh scripts in phase3/bin for that. If the LilyPond extension used wfShellExec instead of exec, it would automatically be limited by $wgMaxShellTime, $wgMaxShellMemory and $wgMaxShellFileSize (except on Windows).
Does that even work? I've regularly had issues with convert commands that can't get enough memory to finish and are never killed, so they sit in the process list hoarding it.
Nikola Smolenski smolensk@eunet.rs writes:
- And the big one, security. It has not been shown that any of the proposed implementations is secure. I was thinking that perhaps a way to overcome this would be to have a dedicated system just for handling music rendering.
We don't want to "overcome" any security problems. This is open source software: if we know of a security problem, we want to eliminate it.
But yes, your idea of "only accepting notes" is a good one. From what I've seen, the Lilypond extension seems to accept arbitrary LaTeX, but I haven't looked too closely.
The main problem I see is that developer interest in bug 189 needs to be bootstrapped. Bug 189 has 115 comments, over 50 of them from before 2009. But other than a burst of activity in 2007 for LilyPond and a second burst in 2008 for ABC, actual development effort to get music onto WMF projects has lain largely dormant.
So, how can we change this?
I think one way would be to provide the Wikisource community with a wiki on which to try out the music modules and give us feedback about how they work, while allowing us to develop them and fix the security problems.
I'm planning on setting up a MW instance with Lilypond and/or ABC using Wikipedia Labs. I think we could use the FileRepo to point to source pages like http://de.wikisource.org/wiki/Datei:De_Schauenburg_Allgemeines_Deutsches_Kom... and editors could start providing a transcription of the pages.
What do you think?
Mark.
Mark A. Hershberger wrote:
Nikola Smolenski smolensk@eunet.rs writes:
- And the big one, security. It has not been shown that any of the proposed implementations is secure. I was thinking that perhaps a way to overcome this would be to have a dedicated system just for handling music rendering.
We don't want to "overcome" any security problems. This is open source software: if we know of a security problem, we want to eliminate it.
But yes, your idea of "only accepting notes" is a good one. From what I've seen, the Lilypond extension seems to accept arbitrary LaTeX, but I haven't looked too closely.
The main problem I see is that developer interest in bug 189 needs to be bootstrapped. Bug 189 has 115 comments, over 50 of them from before 2009. But other than a burst of activity in 2007 for LilyPond and a second burst in 2008 for ABC, actual development effort to get music onto WMF projects has lain largely dormant.
So, how can we change this?
I think one way would be to provide the Wikisource community with a wiki on which to try out the music modules and give us feedback about how they work, while allowing us to develop them and fix the security problems.
I'm planning on setting up a MW instance with Lilypond and/or ABC using Wikipedia Labs. I think we could use the FileRepo to point to source pages like http://de.wikisource.org/wiki/Datei:De_Schauenburg_Allgemeines_Deutsches_Kom... rsbuch_138.jpg and editors could start providing a transcription of the pages.
What do you think?
Am I missing something? The extension has serious vulnerabilities and your answer is to install it on a public wiki? I don't see how this is even remotely helpful.
The issue isn't finding people at Wikisource to try out a music module; the issue is that the extension has technical problems that need to be addressed. If people want to play around with LaTeX markup (music-related or not), there are surely a million existing venues on the Web. If, at some point in the future, after the vulnerabilities have largely been resolved, user interface/experience testing is needed, then sure, setting up a demo on a labs wiki seems like a great idea. But I don't see how we could be at that point now.
MZMcBride
MZMcBride z@mzmcbride.com writes:
What do you think?
Am I missing something? The extension has serious vulnerabilities and your answer is to install it on a public wiki? I don't see how this is even remotely helpful.
The issue isn't finding people at Wikisource to try out a music module; the issue is that the extension has technical problems that need to be addressed. If people want to play around with LaTeX markup (music-related or not), there are surely a million existing venues on the Web.
The issue is not to find people who want to try out the music module. You're right: if people just want to play with laying out music, then that is a solved problem.
Bug #189 is, as far as I can tell, a request to make machine-readable musical notation available on the WMF cluster. To do this, we need developers to get interested enough in LilyPond or ABC to help us deploy the extensions.
By providing a sandbox that is being actively used, I hope to get some developers interested -- to show a community that is using their work. Hopefully we could transfer that work to the WMF cluster after development on the extension is completed.
If you have a better idea for taking something that has sat dead for almost 3 years and injecting new life into it -- something more than back-and-forth on the bug or this mailing list -- I'd love to hear it.
Mark.
Mark A. Hershberger wrote:
Bug #189 is, as far as I can tell, a request to make machine-readable musical notation available on the WMF cluster. To do this, we need developers to get interested enough in LilyPond or ABC to help us deploy the extensions.
By providing a sandbox that is being actively used, I hope to get some developers interested -- to show a community that is using their work. Hopefully we could transfer that work to the WMF cluster after development on the extension is completed.
If you have a better idea for taking something that has sat dead for almost 3 years and injecting new life into it -- something more than back-and-forth on the bug or this mailing list -- I'd love to hear it.
Sure. I think we agree on the underlying issue: a lack of interest by outside groups (for free). But that doesn't always need to be met by a PR campaign (though perhaps making this a coding challenge is an avenue to explore). There's plenty of money in Wikimedia's operating budget to hire someone to write a solution to this problem. It simply may require eliminating some other positions (Storyteller, Bugmeister, etc.). I think the cost is worth the benefit. And for the price of one or two positions, you could reasonably get three or four big features per year.
Alternately, I think eliminating projects such as Wikisource from the Wikimedia umbrella would resolve (or largely mitigate) this issue. No Wikisource, mostly no problem. I don't think it's Wikipedia that's really clamoring for this ability, though it'd be nice there as well. More direct focus on Wikimedia's part would make underlying bugs easier to triage, surely.
One option that seems to be a complete non-starter to me is enabling possibly dangerous code, even on a test environment, when the dangers are known and unresolved. I still haven't seen anything to suggest this is a good idea. Once someone has worked on the code for a while and addressed CPU and memory issues, I think a labs wiki is a great idea. But before then? Makes no sense to me.
I also can't help but imagine, given only your past actions on Bugzilla, that you would take the interpretation ("a request to make machine-readable musical notation available on the WMF cluster") to mean that if such functionality became available on a labs wiki, the bug could be marked resolved. This is, of course, completely outlandish and ridiculous, but it's what I've come to expect.
MZMcBride
On English Wikisource, we have started marking pages needing sheet music transcription. There are currently 171 tagged.
http://en.wikisource.org/wiki/Category:Pages_containing_sheet_music
But there are thousands more that are not tagged like http://en.wikisource.org/wiki/Jana_Gana_Mana
There are at least 125 files with lilypond transcriptions on the page
http://commons.wikimedia.org/w/index.php?title=Special:WhatLinksHere/Templat... http://commons.wikimedia.org/wiki/Category:GNU_LilyPond_images
And as MZMcBride says, there are many repositories of lilypond and ABC sheet music online.
Should Wikisource hold a poll to prove we want this, and would kiss the feet of the developer who fixes the problems? :P ;-)
Does the lilypond '-safe' mode not resolve the security problems?
-- John Vandenberg
On Sun, Oct 23, 2011 at 5:34 PM, John Vandenberg jayvdb@gmail.com wrote:
Does the lilypond '-safe' mode not resolve the security problems?
According to the other thread, nope.
-- John
On 23/10/11 18:07, Nikola Smolenski wrote:
It is my understanding that there are three road blockers on this one:
- ABC vs Lilypond, and which exact implementation to use. At this point I assume everyone is so sick of waiting that no one will care what is used so long as something is used.
Lilypond would provide more flexibility for editors. I originally thought that ABC would be better for security, but it turns out that it is full of buffer overflow vulnerabilities, so I'm now recommending Lilypond without reservations.
[...]
- And the big one, security. It has not been shown that any of the proposed implementations is secure. I was thinking that perhaps a way to overcome this would be to have a dedicated system just for handling music rendering. It would work something like this:
a) A dedicated server used only for music rendering. The server runs several virtual machines with the music rendering software. It only accepts the notes and returns the images.
b) When the parent server receives a text with the notes, it only passes it to a free virtual machine. When it receives the images from the virtual machine, it passes them back to the client.
c) If it doesn't receive the images within a certain time, it shuts down the virtual machine, starts a new one and returns an error image to the client.
Is there a hole in this system that would make it possible to hack the parent server by means of a malicious file?
I think that would be overkill. All we really need is basic resource limiting (say ulimit.sh plus PoolCounter), and LilyPond should be run with --jail.
LilyPond has two secure modes: --safe and --jail. Running with --safe is easy to support but has restricted functionality, which will impact users. Running with --jail is more complex to set up, but allows commonly-used macros to be imported.
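The difference between the two modes shows up mainly in how LilyPond is invoked. A sketch that builds the argument list for either mode; the option spellings (`--safe`, `--jail=user,group,jail,dir`, `--png`, `--output=`) are assumed to follow the LilyPond command-line documentation of that era and should be checked against `lilypond --help` for the installed version, and `lilypond_args` and all paths and account names are hypothetical:

```python
def lilypond_args(input_name, output_base, mode="safe", jail=None):
    """Build a LilyPond argument list for either secure mode.

    mode="safe": restricted input, simple to run, but some input is rejected.
    mode="jail": jail is (user, group, jail_dir, work_dir); LilyPond chroots
    to jail_dir, switches to user/group and changes to work_dir, so
    input_name and output_base must be valid paths inside the jail.
    """
    args = ["lilypond", "--png", "--output=" + output_base]
    if mode == "safe":
        args.append("--safe")
    elif mode == "jail":
        if jail is None or len(jail) != 4:
            raise ValueError("jail mode needs (user, group, jail_dir, work_dir)")
        args.append("--jail=" + ",".join(jail))
    else:
        raise ValueError("unknown mode: %r" % (mode,))
    args.append(input_name)
    return args
```

The extra setup for the jail mode is outside the code: a chroot directory containing everything LilyPond needs at run time, plus an unprivileged user and group for it to switch to.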
Ideally music rendering would be split out on to different servers, using internal api.php requests. This allows cluster-wide LilyPond resource usage to be limited by limiting the number of servers in the LilyPond rendering pool. It also simplifies the operations task by reducing the number of servers that run LilyPond, making both configuration and monitoring easier.
-- Tim Starling
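The "basic resource limiting" above has two parts: per-process limits (the ulimit.sh side) and a cap on how many renders run at the same time, which is what PoolCounter and a dedicated rendering pool provide. The concurrency cap can be sketched in Python with a bounded semaphore. This is a simplified, single-process stand-in, not PoolCounter's actual protocol or API (PoolCounter coordinates through a shared daemon so the cap holds across servers); `RenderPool`, `workers` and `wait_timeout` are hypothetical names:

```python
import threading


class RenderPool:
    """Cap the number of renders running at once (simplified PoolCounter stand-in).

    At most `workers` renders run concurrently. A caller that cannot get a
    slot within `wait_timeout` seconds receives None ("busy") instead of
    starting yet another renderer process.
    """

    def __init__(self, workers, wait_timeout):
        self._slots = threading.BoundedSemaphore(workers)
        self._wait_timeout = wait_timeout

    def run(self, render, *args):
        if not self._slots.acquire(timeout=self._wait_timeout):
            return None  # pool is full: report "busy" rather than queue forever
        try:
            return render(*args)
        finally:
            self._slots.release()  # free the slot even if render() raised
```

A caller that gets None would show an error or "try again later" image. With a dedicated rendering pool, the same cap comes from the hardware: cluster-wide capacity is the number of rendering servers times the slots each one allows.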
Or we do without any servers, at the expense of older browsers: http://0xfe.blogspot.com/2010/05/music-notation-with-html5-canvas.html
Magnus
On 24/10/11 09:49, Magnus Manske wrote:
Or we do without any servers, at the expense of older browsers: http://0xfe.blogspot.com/2010/05/music-notation-with-html5-canvas.html
A problem with that is that it is not a notation standard. It can't be imported or edited into specialized notation programs, for example.
On Mon, Oct 24, 2011 at 9:22 AM, Nikola Smolenski smolensk@eunet.rs wrote:
On 24/10/11 09:49, Magnus Manske wrote:
Or we do without any servers, at the expense of older browsers: http://0xfe.blogspot.com/2010/05/music-notation-with-html5-canvas.html
A problem with that is that it is not a notation standard. It can't be imported or edited into specialized notation programs, for example.
It was just a general notion, not recommending a specific implementation or notation.
Abc notation rendering (untested): https://code.google.com/p/abcjs/
Magnus Manske <magnusmanske <at> googlemail.com> writes:
Or we do without any servers, at the expense of older browsers: http://0xfe.blogspot.com/2010/05/music-notation-with-html5-canvas.html
After reading this thread, I was going to suggest this library as well. :) I have played around with it and I think it is quite nice for tablature, but it is probably not as good for traditional musicians? It's true that it has a non-standard notation but it is oriented towards editing as simple text, and it is entirely client side, which solves a lot of the other problems you have discussed.
The official site of that javascript library is http://vexflow.com/ (there are more examples and demos there). It uses an MIT license. As part of a Wikia hackathon last year I wrote a simple extension which adds one parser hook, loads that javascript library and renders notation from a wiki page. The downside is the "flash of unstyled text" effect that you get as the page loads, but it does work quite well and the extension is only ~100 lines of code, including the code for the editing box that I created so that you can do ajax/live edits. If anyone wants to check out a demo of that, I can send a link to my devbox...
Owen@wikia
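The server side of such a client-side extension is small: the parser hook only has to emit the tag's contents, escaped, inside a container that the JavaScript library later finds and replaces with a drawn score. Escaping matters because this is exactly where the JavaScript-injection concern raised earlier in the thread would apply. A Python sketch of that step; `render_music_tag`, the `<music>` tag and the `music-notation` class name are hypothetical, and a real MediaWiki extension would do this in PHP:

```python
import html


def render_music_tag(notation):
    """Return the HTML a parser hook could emit for <music>...</music>.

    The notation is HTML-escaped so user input can never become markup or
    script; client-side JavaScript reads the text back out (e.g. via
    textContent, which decodes the entities) and draws the score.
    """
    return '<div class="music-notation">' + html.escape(notation, quote=True) + "</div>"


if __name__ == "__main__":
    print(render_music_tag("notes C-D-E/4"))
    # <div class="music-notation">notes C-D-E/4</div>
```

Because the escaped notation sits in the page as plain text until the script runs, this is also where the "flash of unstyled text" comes from.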
On 24/10/11 18:23, Owen Davis wrote:
After reading this thread, I was going to suggest this library as well. :) I have played around with it and I think it is quite nice for tablature, but it is probably not as good for traditional musicians? It's true that it has a non-standard notation but it is oriented towards editing as simple text, and it is entirely client side, which solves a lot of the other problems you have discussed.
The official site of that javascript library is http://vexflow.com/ (there are more examples and demos there). It uses an MIT license. As part of a Wikia hackathon last year I wrote a simple extension which adds one parser hook, loads that javascript library and renders notation from a wiki page. The downside is the "flash of unstyled text" effect that you get as the page loads, but it does work quite well and the extension is only ~100 lines of code, including the code for the editing box that I created so that you can do ajax/live edits. If anyone wants to check out a demo of that, I can send a link to my devbox...
Owen@wikia
If vexflow is simple, capable and safe, could it also be converted to run server-side as an image renderer, using something like node-canvas? If so, then that's potentially a solution for everyone.
- Neil
Neil Harris <neil <at> tonal.clara.co.uk> writes:
If vexflow is simple, capable and safe, could it also be converted to run server-side as an image renderer, using something like node-canvas? If so, then that's potentially a solution for everyone.
- Neil
Oh, that's an interesting idea. I haven't tried but I would expect that it can run with node-canvas. Shouldn't take too much effort to test it out. I'll try to do that and report back.
Owen@wikia
wikitech-l@lists.wikimedia.org